Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools.

Kyrpides NC, Ouzounis CA, Iliopoulos I, Vonstein V, Overbeek R.

The proliferation of genome sequence data has led to the development of a number of tools and strategies for computational analysis. These methods include the identification of motif patterns, the assignment of query sequences to protein family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima using these methods in combination. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that combining existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence.

Nucleic Acids Res. 2000 Nov 15;28(22):4573-6.