Community annotation of
The complete genome sequence of the cyanobacterium Synechocystis sp. PCC 6803 was determined in 1996 by the Kazusa DNA Research Institute. It was the first genome completely sequenced in Japan and the fourth in the world, following three genomes (Haemophilus influenzae, Mycoplasma genitalium, and Methanococcus jannaschii) sequenced by TIGR. Compared with other bacterial species such as Escherichia coli and Bacillus subtilis, the Synechocystis genome contained a larger proportion of genes of unknown function, because cyanobacteria and their genes had not been well studied despite their importance in the evolution of life and the maintenance of the biosphere. The availability of the genome sequence and a complete set of genes was therefore a great boon to cyanobacteriologists, accelerating their research and resulting in a large number of publications.
In 1999 Japanese cyanobacteriologists gained access to the emerging technology of DNA microarrays through the Genome Frontier Project, supported by the then Science and Technology Agency (now part of the Ministry of Education, Culture, Sports, Science and Technology). The Cyanobacteria DNA Chip Consortium was formed by Satoshi Tabata (Kazusa), Norio Murata (Okazaki), and Minoru Kanehisa (Kyoto) to analyze gene expression profiles using Synechocystis DNA microarrays. The Consortium, with about 25 members, was instrumental in promoting collaboration between wet-lab biologists and bioinformaticians. The availability of DNA microarrays once again greatly accelerated research on cyanobacteria.
However, accelerated research poses a database problem. When new gene functions are identified, the results are published and indexed in PubMed, but they are not reflected in any sequence database. The lack of an up-to-date, well-annotated database is not limited to Synechocystis; it is a problem for the majority of prokaryotic genomes sequenced so far. The primary repositories, GenBank, EMBL, and DDBJ, are not well maintained because a genome entry can only be updated by the sequencing team that submitted the original data, and such teams are usually too busy sequencing the next genomes. The providers of annotated databases such as SWISS-PROT and KEGG are simply unable to keep up with the rapidly increasing amount of data.
Perhaps the only solution to the current database problem is to get the research community actively involved in the annotation process. The Bioinformatics Center of Kyoto University, the home of KEGG, developed a community database system named CYORF (formerly SYORF). The members of the Cyanobacteria DNA Chip Consortium organized a network of cyanobacteriologists willing to contribute annotations. Meanwhile, Kazusa provided an updated list of ORFs for the Synechocystis genome. CYORF database release 1.0 is the product of this three-way collaboration among genome scientists at Kazusa, biologists in the DNA Chip Consortium, and bioinformaticians in Kyoto.
The first release of CYORF was made available by the Japanese community (*), which reannotated about one thousand genes. However, we hope to maintain this community database as a research infrastructure for the international community. We invite and welcome anyone interested to join in annotating cyanobacterial genes, not only for Synechocystis but also for Anabaena and other species. If you are interested in joining, please complete the registration form. Please also refer to the guidelines for details of the annotation work involved.
(*) Y. Fujita (Nagoya), Y. Hihara (Saitama), M. Ikeuchi (Tokyo), M. Ishiura (Nagoya), Y. Koike (Himeji), S. Maeda (Nagoya), K. Masamoto (Kumamoto), H. Nakamoto (Saitama), T. Ogawa (Nagoya), M. Ohmori (Tokyo), T. Omata (Nagoya), N. Sato (Saitama), M. Sugita (Nagoya), I. Suzuki (Okazaki), M. Tamoi (Kinki), A. Tanaka (Hokkaido), K. Tanaka (Tokyo)