Data Mining of the Arabidopsis thaliana Genome using GeneAtlas™
Sunil Patel
Accelrys, 230-250 The Quorum, 23 Barnwell Road,
Cambridge CB5 8RE.
Recent advances in genome sequencing have created an immense opportunity
to understand, describe and model whole living organisms. The genomes of
several organisms, including the human genome, have now been completely sequenced.
However, functional and structural characterization of newly sequenced
proteins is still problematic. It is estimated that the function of a protein
can be identified only about 50% of the time using sequence-based comparison
methods alone (BLAST, FASTA, PROSITE). Currently it is easier to determine
the function of a protein from its structure than from its sequence alone. Thus
knowledge of a protein's structure plays a crucial role in the identification
and characterization of its function.
Initiatives using comparative protein structure modeling to generate structural
data for all publicly available sequenced genomes have been carried out using
an automated pipeline referred to as GeneAtlas™. The automated
pipeline allows the creation of databases that include 3D structure predictions
as well as functional annotation of the genomes.
Analysis of such vast amounts of structural data can then be used to predict
the function of other novel protein targets and thereby provide a completely
new approach to the identification of function. Here we present
some unique examples of the functional annotation of the Arabidopsis thaliana
genome using the GeneAtlas™ pipeline.
Clearly, integration of three-dimensional protein information with genomic
data can lead to a much clearer understanding of a protein's function and
further focus the mining of chemical databases for structure-based drug
discovery. These data may be readily integrated into a drug discovery program.
It is therefore clear that in many cases knowledge of a protein's structure
plays a crucial role in the identification and characterization of its
function.