Blue Gene 2002

 IBM and NeSC workshop on Protein Science

    National e-Science Centre, Edinburgh,  March 15-16 2002

   
   


Data Mining of the Arabidopsis thaliana Genome using GeneAtlas™
Sunil Patel
Accelrys, 230-250 The Quorum, 23 Barnwell Road,
Cambridge CB5 8RE.

Recent advances in genome sequencing have created an immense opportunity to understand, describe and model whole living organisms. Complete genome sequences are now available for several organisms, including the human genome. However, functional and structural characterization of newly sequenced proteins remains problematic. It is estimated that the function of a protein can be identified only about 50% of the time using sequence-based comparison methods alone (BLAST, FASTA, PROSITE). Currently it is easier to determine the function of a protein from its structure than from its sequence alone. Thus knowledge of a protein's structure plays a crucial role in the identification and characterization of its function.
Comparative protein structure modeling has been used to generate structural data for all publicly available sequenced genomes through an automated pipeline referred to as GeneAtlas™. The pipeline allows the creation of databases that include 3D structure predictions as well as functional annotation of the genomes.
Analysis of such vast amounts of structural data can then be used to predict the function of other novel protein targets, thereby providing a completely new approach to the identification of function. Here we present some unique examples of the functional annotation of the Arabidopsis thaliana genome using the GeneAtlas™ pipeline.
Clearly, integrating three-dimensional protein information with genomic data can lead to a much clearer understanding of a protein's function and further focus the mining of chemical databases for structure-based drug discovery. These data may be readily integrated into a drug discovery program.
It is therefore clear that in many cases knowledge of a protein's structure plays a crucial role in the identification and characterization of its function.



SPONSORS
National e-Science Centre (NeSC)
The University of Edinburgh