Growth of Escherichia coli MG1655 on LB medium: determining metabolic strategy with transcriptional microarrays.

Baev MV, Baev D, Radek AJ, Campbell JW.

Expression profiles of genes related to stress responses, substrate assimilation, acetate metabolism, and biosynthesis were obtained by monitoring growth of Escherichia coli MG1655 in Luria-Bertani (LB) medium with transcriptional microarrays. Superimposing gene expression profiles on a plot of specific growth rate demonstrates that the cells pass through four distinct physiological states during fermentation before entering stationary phase. Each of these states can be characterized by specific patterns of substrate utilization and cellular biosynthesis corresponding to the nutrient status of the medium. These data allow the growth phases of the classical microbial growth curve to be redefined in terms of the physiological states and environmental changes commonly occurring during bacterial growth in batch culture on LB medium.

Appl Microbiol Biotechnol. 2006 Jul;71(3):323-8. Epub 2006 Apr 28.
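The specific growth rate onto which the expression profiles are superimposed can be estimated from optical density readings as mu = d ln(OD)/dt. The following is a minimal sketch under stated assumptions: the function name, the finite-difference estimate between consecutive samples, and the readings themselves are illustrative and are not taken from the paper.

```python
import math

def specific_growth_rates(times_h, od_values):
    """Return (midpoint time, mu) pairs, with mu = d ln(OD)/dt
    approximated between each pair of consecutive samples."""
    rates = []
    samples = list(zip(times_h, od_values))
    for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
        mu = (math.log(x1) - math.log(x0)) / (t1 - t0)  # units: 1/h
        rates.append(((t0 + t1) / 2, mu))
    return rates

# Hypothetical OD600 readings (not data from the paper).
times_h = [0.0, 1.0, 2.0, 3.0]
od600 = [0.05, 0.10, 0.20, 0.25]
rates = specific_growth_rates(times_h, od600)
for t_mid, mu in rates:
    print(f"t = {t_mid:.1f} h, mu = {mu:.3f} 1/h")
```

A drop in mu between intervals, as in the last hypothetical interval here, is the kind of transition against which changes in expression profiles would be compared.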

Growth of Escherichia coli MG1655 on LB medium: monitoring utilization of sugars, alcohols, and organic acids with transcriptional microarrays.

Baev MV, Baev D, Radek AJ, Campbell JW.

Microorganisms respond to environmental changes by reprogramming their metabolism primarily through altered patterns of gene expression. DNA microarrays provide a tool for exploiting microorganisms as living sensors of their environment. The potential of DNA microarrays to reflect availability of nutrient components during fermentations on complex media was examined by monitoring global gene expression throughout batch cultivation of Escherichia coli MG1655 on Luria-Bertani (LB) medium. Gene expression profiles group into pathways that clearly demonstrate the metabolic changes occurring in the course of fermentation. Functional analysis of the gene expression related to metabolism of sugars, alcohols, and organic acids revealed that E. coli growing on LB medium switches from sequential to simultaneous substrate utilization in the course of growth. Maltose and maltodextrins are the first of these substrates to support growth. Utilization of these nutrients, which was associated with the highest growth rate of the culture, was followed by simultaneous induction of enzymes involved in assimilation of a large group of other carbon sources including D-mannose, melibiose, D-galactose, L-fucose, L-rhamnose, D-mannitol, amino sugars, trehalose, L-arabinose, glycerol, and lactate. Availability of these nutrients to the cells was monitored by induction of corresponding transport and/or catabolic systems specific for each of the compounds.

Appl Microbiol Biotechnol. 2006 Jul;71(3):310-6. Epub 2006 Apr 21.

Identification of open reading frames unique to a select agent: Ralstonia solanacearum race 3 biovar 2.

Gabriel DW, Allen C, Schell M, Denny TP, Greenberg JT, Duan YP, Flores-Cruz Z, Huang Q, Clifford JM, Presting G, González ET, Reddy J, Elphinstone J, Swanson J, Yao J, Mulholland V, Liu L, Farmerie W, Patnaikuni M, Balogh B, Norman D, Alvarez A, Castillo JA, Jones J, Saddler G, Walunas T, Zhukov A, Mikhailova N.

An 8x draft genome was obtained and annotated for Ralstonia solanacearum race 3 biovar 2 (R3B2) strain UW551, a United States Department of Agriculture Select Agent isolated from geranium. The draft UW551 genome consisted of 80,169 reads resulting in 582 contigs containing 5,925,491 base pairs, with an average 64.5% GC content. Annotation revealed a predicted 4,454 protein coding open reading frames (ORFs), 43 tRNAs, and 5 rRNAs; 2,793 (or 62%) of the ORFs had a functional assignment. The UW551 genome was compared with the published genome of R. solanacearum race 1 biovar 3 tropical tomato strain GMI1000. The two phylogenetically distinct strains were at least 71% syntenic in gene organization. Most genes encoding known pathogenicity determinants, including predicted type III secreted effectors, appeared to be common to both strains. A total of 402 unique UW551 ORFs were identified, none of which had a best hit or >45% amino acid sequence identity with any R. solanacearum predicted protein; 16 had strong (E < 10(-13)) best hits to ORFs found in other bacterial plant pathogens. Many of the 402 unique genes were clustered, including 5 found in the hrp region and 38 contiguous, potential prophage genes. Conservation of some UW551 unique genes among R3B2 strains was examined by polymerase chain reaction among a group of 58 strains from different races and biovars, resulting in the identification of genes that may be potentially useful for diagnostic detection and identification of R3B2 strains. One 22-kb region that appears to be present in GMI1000 as a result of horizontal gene transfer is absent from UW551 and encodes enzymes that likely are essential for utilization of the three sugar alcohols that distinguish biovars 3 and 4 from biovars 1 and 2.

Mol Plant Microbe Interact. 2006 Jan;19(1):69-79.
http://dx.doi.org/10.1094/MPMI-19-0069

Gene array analysis of Yersinia enterocolitica FlhD and FlhC: regulation of enzymes affecting synthesis and degradation of carbamoylphosphate.

Kapatral V, Campbell JW, Minnich SA, Thomson NR, Matsumura P, Prüss BM.

This paper focuses on global gene regulation by FlhD/FlhC in enteric bacteria. Even though Yersinia enterocolitica FlhD/FlhC can complement an Escherichia coli flhDC mutant for motility, it is not known if the Y. enterocolitica FlhD/FlhC complex has an effect on metabolism similar to E. coli. To study metabolic gene regulation, a partial Yersinia enterocolitica 8081c microarray was constructed and the expression patterns of wild-type cells were compared to an flhDC mutant strain at 25 and 37 degrees C. The overlap between the E. coli and Y. enterocolitica FlhD/FlhC regulated genes was 25%. Genes that were regulated at least fivefold by FlhD/FlhC in Y. enterocolitica are genes encoding urocanate hydratase (hutU), imidazolone propionase (hutI), carbamoylphosphate synthetase (carAB) and aspartate carbamoyltransferase (pyrBI). These enzymes are part of a pathway that is involved in the degradation of L-histidine to L-glutamate and eventually leads into purine/pyrimidine biosynthesis via carbamoylphosphate and carbamoylaspartate. A number of other genes were regulated at a lower rate. In two additional experiments, the expression of wild-type cells grown at 4 or 25 degrees C was compared to the same strain grown at 37 degrees C. The expression of the flagella master operon flhD was not affected by temperature, whereas the flagella-specific sigma factor fliA was highly expressed at 25 degrees C and reduced at 4 and 37 degrees C. Several other flagella genes, all of which are under the control of FliA, exhibited a similar temperature profile. These data are consistent with the hypothesis that temperature regulation of flagella genes might be mediated by the flagella-specific sigma factor FliA and not the flagella master regulator FlhD/FlhC.

Microbiology. 2004 Jul;150(Pt 7):2289-300.

Aerobic tryptophan degradation pathway in bacteria: novel kynurenine formamidase.

Kurnasov O, Jablonski L, Polanuyer B, Dorrestein P, Begley T, Osterman A.

While a variety of chemical transformations related to the aerobic degradation of L-tryptophan (kynurenine pathway), and most of the genes and corresponding enzymes involved therein have been predominantly characterized in eukaryotes, relatively little was known about this pathway in bacteria. Using genome comparative analysis techniques we have predicted the existence of the three-step pathway of aerobic L-tryptophan degradation to anthranilate (anthranilate pathway) in several bacteria. Based on the chromosomal gene clustering analysis, we have identified a previously unknown gene encoding for kynurenine formamidase (EC 3.5.1.19) involved with the second step of the anthranilate pathway. This functional prediction was experimentally verified by cloning, expression and enzymatic characterization of recombinant kynurenine formamidase orthologs from Bacillus cereus, Pseudomonas aeruginosa and Ralstonia metallidurans. Experimental verification of the inferred anthranilate pathway was achieved by functional expression in Escherichia coli of the R. metallidurans putative kynBAU operon encoding three required enzymes: tryptophan 2,3-dioxygenase (gene kynA), kynurenine formamidase (gene kynB), and kynureninase (gene kynU). Our data provide the first experimental evidence of the connection between these genes (only one of which, kynU, was previously characterized) and L-tryptophan aerobic degradation pathway in bacteria.

FEMS Microbiol Lett. 2003 Oct 24;227(2):219-27.

Experimental determination and system level analysis of essential genes in Escherichia coli MG1655.

Gerdes SY, Scholle MD, Campbell JW, Balázsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, Bhattacharya A, Kapatral V, D'Souza M, Baev MV, Grechkin Y, Mseeh F, Fonstein MY, Overbeek R, Barabási AL, Oltvai ZN, Osterman AL.

Defining the gene products that play an essential role in an organism's functional repertoire is vital to understanding the system level organization of living cells. We used a genetic footprinting technique for a genome-wide assessment of genes required for robust aerobic growth of Escherichia coli in rich media. We identified 620 genes as essential and 3,126 genes as dispensable for growth under these conditions. Functional context analysis of these data allows individual functional assignments to be refined. Evolutionary context analysis demonstrates a significant tendency of essential E. coli genes to be preserved throughout the bacterial kingdom. Projection of these data over metabolic subsystems reveals topologic modules with essential and evolutionarily preserved enzymes with reduced capacity for error tolerance.

J Bacteriol. 2003 Oct;185(19):5673-84.

Missing genes in metabolic pathways: a comparative genomics approach.

Osterman A, Overbeek R.

The new techniques of genome context analysis--chromosomal gene clustering, protein fusions, occurrence profiles and shared regulatory sites--infer functional coupling between genes. In combination with metabolic reconstructions, these techniques can dramatically accelerate the pace of gene discovery.

Curr Opin Chem Biol. 2003 Apr;7(2):238-51.
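Of the genome context techniques listed above, occurrence profiles lend themselves to a short illustration: genes that are present and absent in the same set of genomes are candidates for functional coupling. The following is a minimal sketch, assuming a Jaccard similarity over presence/absence vectors; the scoring choice, function name, gene names, and profiles are hypothetical and are not taken from the review.

```python
def profile_similarity(p, q):
    """Jaccard similarity of two presence/absence profiles.

    Each profile is an equal-length sequence of 0/1 flags, one per genome.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0

# Hypothetical profiles across six genomes (1 = gene present).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

sim_ab = profile_similarity(profiles["geneA"], profiles["geneB"])
sim_ac = profile_similarity(profiles["geneA"], profiles["geneC"])
print(f"geneA vs geneB: {sim_ab:.2f}")
print(f"geneA vs geneC: {sim_ac:.2f}")
```

In this toy example geneA and geneB co-occur in most genomes and score high, while geneA and geneC never co-occur and score zero.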

The ERGO genome analysis and discovery system.

Overbeek R, Larsen N, Walunas T, D'Souza M, Pusch G, Selkov E Jr, Liolios K, Joukov V, Kaznadzey D, Anderson I, Bhattacharyya A, Burd H, Gardner W, Hanke P, Kapatral V, Mikhailova N, Vasieva O, Osterman A, Vonstein V, Fonstein M, Ivanova N, Kyrpides N.

The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.

Nucleic Acids Res. 2003 Jan 1;31(1):164-71.

Ribosylnicotinamide kinase domain of NadR protein: identification and implications in NAD biosynthesis.

Kurnasov OV, Polanuyer BM, Ananta S, Sloutsky R, Tam A, Gerdes SY, Osterman AL.

NAD is an indispensable redox cofactor in all organisms. Most of the genes required for NAD biosynthesis in various species are known. Ribosylnicotinamide kinase (RNK) was among the few unknown (missing) genes involved with NAD salvage and recycling pathways. Using a comparative genome analysis involving reconstruction of NAD metabolism from genomic data, we predicted and experimentally verified that bacterial RNK is encoded within the 3' region of the nadR gene. Based on these results and previous data, the full-size multifunctional NadR protein (as in Escherichia coli) is composed of (i) an N-terminal DNA-binding domain involved in the transcriptional regulation of NAD biosynthesis, (ii) a central nicotinamide mononucleotide adenylyltransferase (NMNAT) domain, and (iii) a C-terminal RNK domain. The RNK and NMNAT enzymatic activities of recombinant NadR proteins from Salmonella enterica serovar Typhimurium and Haemophilus influenzae were quantitatively characterized. We propose a model for the complete salvage pathway from exogenous N-ribosylnicotinamide to NAD which involves the concerted action of the PnuC transporter and RNK, followed by the NMNAT activity of the NadR protein. Both the pnuC and nadR genes were proven to be essential for the growth and survival of H. influenzae, thus implicating them as potential narrow-spectrum drug targets.

J Bacteriol. 2002 Dec;184(24):6906-17.

Bioinformatics classification and functional analysis of PhoH homologs.

Kazakov AE, Vassieva O, Gelfand MS, Osterman A, Overbeek R.

PhoH protein is a putative ATPase belonging to the phosphate regulon in Escherichia coli. EC-PhoH homologs are present in different organisms, but it is not clear whether they are functionally related, and nothing is known about their regulation. To distinguish true functional orthologs of EC-PhoH in different classes of bacteria and to identify their functional role in the bacterial metabolic network, we performed a phylogenetic analysis of these proteins and a comparative study of the position and regulation of the related genes. Three groups of proteins were identified. Proteins of the first group (BS-PhoH orthologs) are present in most bacteria and are proposed to be functionally linked to phospholipid metabolism and RNA modification. Proteins of the second group (BS-YlaK orthologs) are present in most aerobes, and actinobacterial YlaK orthologs are shown to be members of fatty acid beta-oxidation regulons. EC-PhoH orthologs are classified in a third group, specific to Enterobacteria. The functional role of PhoH homologs in lipid and RNA metabolism and the proposed interrelation of PhoH paralogs in one organism are discussed.

In Silico Biol. 2003;3(1-2):3-15. Epub 2002 Dec 30.

Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains.

Bhattacharyya A, Stilwagen S, Ivanova N, D'Souza M, Bernal A, Lykidis A, Kapatral V, Anderson I, Larsen N, Los T, Reznik G, Selkov E Jr, Walunas TL, Feil H, Feil WS, Purcell A, Lassez JL, Hawkins TL, Haselkorn R, Overbeek R, Predki PF, Kyrpides NC.

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.

Proc Natl Acad Sci U S A. 2002 Sep 17;99(19):12403-8. Epub 2002 Aug 30.

From genetic footprinting to antimicrobial drug targets: examples in cofactor biosynthetic pathways.

Gerdes SY, Scholle MD, D'Souza M, Bernal A, Baev MV, Farrell M, Kurnasov OV, Daugherty MD, Mseeh F, Polanuyer BM, Campbell JW, Anantha S, Shatalin KY, Chowdhury SA, Fonstein MY, Osterman AL.

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.

J Bacteriol. 2002 Aug;184(16):4555-72.

Microarray analysis of gene expression during bacteriophage T4 infection.

Luke K, Radek A, Liu X, Campbell J, Uzan M, Haselkorn R, Kogan Y.

Genomic microarrays were used to examine the complex temporal program of gene expression exhibited by bacteriophage T4 during the course of development. The microarray data confirm the existence of distinct early, middle, and late transcriptional classes during the bacteriophage replicative cycle. This approach allows assignment of previously uncharacterized genes to specific temporal classes. The genomic expression data verify many promoter assignments and predict the existence of previously unidentified promoters.

Virology. 2002 Aug 1;299(2):182-91.

Exact mapping of prokaryotic gene starts.

Baytaluk MV, Gelfand MS, Mironov AA.

It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems of the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary changes in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.

Brief Bioinform. 2002 Jun;3(2):181-94.
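The dynamic programming alignment mentioned in the abstract above can be illustrated with a generic global alignment (Needleman-Wunsch). This is a minimal sketch and not the Orthologator-GeneCorrector algorithm: it does not model frame-shifts or gene starts, and the function name, the scoring values, and the example sequences are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical short peptide fragments, for illustration only.
result = global_align("MKLV", "MKV")
print(result)
```

A method that handles frame-shifts would need additional transitions between reading frames in the recurrence; that extension is outside this sketch.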

Complete reconstitution of the human coenzyme A biosynthetic pathway via comparative genomics.

Daugherty M, Polanuyer B, Farrell M, Scholle M, Lykidis A, de Crécy-Lagard V, Osterman A.


The biosynthesis of CoA from pantothenic acid (vitamin B5) is an essential universal pathway in prokaryotes and eukaryotes. The CoA biosynthetic genes in bacteria have all recently been identified, but their counterparts in humans and other eukaryotes remained mostly unknown. Using comparative genomics, we have identified human genes encoding the last four enzymatic steps in CoA biosynthesis: phosphopantothenoylcysteine synthetase (EC ), phosphopantothenoylcysteine decarboxylase (EC ), phosphopantetheine adenylyltransferase (EC ), and dephospho-CoA kinase (EC ). Biological functions of these human genes were verified using a complementation system in Escherichia coli based on transposon mutagenesis. The individual human enzymes were overexpressed in E. coli and purified, and the corresponding activities were experimentally verified. In addition, the entire pathway from phosphopantothenate to CoA was successfully reconstituted in vitro using a mixture of purified recombinant enzymes. Human recombinant bifunctional phosphopantetheine adenylyltransferase/dephospho-CoA kinase was kinetically characterized. This enzyme was previously suggested as a point of CoA biosynthesis regulation, and we have observed significant differences in mRNA levels of the corresponding human gene in normal and tumor cells by Northern blot analysis.

J Biol Chem. 2002 Jun 14;277(24):21431-9. Epub 2002 Mar 28.

Genome sequence and analysis of the oral bacterium Fusobacterium nucleatum strain ATCC 25586.

Kapatral V, Anderson I, Ivanova N, Reznik G, Los T, Lykidis A, Bhattacharyya A, Bartman A, Gardner W, Grechkin G, Zhu L, Vasieva O, Chu L, Kogan Y, Chaga O, Goltsman E, Bernal A, Larsen N, D'Souza M, Walunas T, Pusch G, Haselkorn R, Fonstein M, Kyrpides N, Overbeek R.

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H(2)S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.

J Bacteriol. 2002 Apr;184(7):2005-18.

Genomes OnLine Database (GOLD): a monitor of genome projects world-wide.

Bernal A, Ear U, Kyrpides N.

GOLD is a comprehensive resource for accessing information related to completed and ongoing genome projects world-wide. The database currently provides information on 350 genome projects, of which 48 have been completely sequenced and their analysis published. GOLD was created in 1997 and since April 2000 it has been licensed to Integrated Genomics. The database is freely available through the URL: http://igweb.integratedgenomics.com/GOLD/.

Nucleic Acids Res. 2001 Jan 1;29(1):126-7.

Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools.

Kyrpides NC, Ouzounis CA, Iliopoulos I, Vonstein V, Overbeek R.

The proliferation of genome sequence data has led to the development of a number of tools and strategies that facilitate computational analysis. These methods include the identification of motif patterns, membership of the query sequences in family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima by employing the combined use of the above methods. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that the combined use of existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence.

Nucleic Acids Res. 2000 Nov 15;28(22):4573-6.

WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction.

Overbeek R, Larsen N, Pusch GD, D'Souza M, Selkov E Jr, Kyrpides N, Fonstein M, Maltsev N, Selkov E.

The WIT (What Is There) (http://wit.mcs.anl.gov/WIT2/) system has been designed to support comparative analysis of sequenced genomes and to generate metabolic reconstructions based on chromosomal sequences and metabolic modules from the EMP/MPW family of databases. This system contains data derived from about 40 completed or nearly completed genomes. Sequence homologies, various ORF-clustering algorithms, relative gene positions on the chromosome and placement of gene products in metabolic pathways (metabolic reconstruction) can be used for the assignment of gene functions and for development of overviews of genomes within WIT. The integration of a large number of phylogenetically diverse genomes in WIT facilitates the understanding of the physiology of different organisms.

Nucleic Acids Res. 2000 Jan 1;28(1):123-5.