Many new sequences are emerging from genomics projects, and many new protein structures are now being determined using X-ray crystallography, nuclear magnetic resonance spectroscopy and cryo-electron microscopy. Without direct experimental evidence there is considerable difficulty in assigning function to proteins from their sequences or even from their structures. This applies even to homologues of well-characterised proteins, because of the recruitment of similar proteins for divergent functions. Furthermore, correct classification of sequences, structures and functions often requires sensitivity to very delicate features. Computer programs can help to some extent but cannot do the whole job reliably; again, manual curation is essential. Proteomics studies on spatial and temporal protein expression patterns provide additional streams of data that require human interpretation to resolve fine details.
With the recognition of the importance of accurate database annotation and the requirement for individuals with particular constellations of skills to carry it out, annotators are emerging as specialists within the profession of bioinformatics. This book compiles information about annotation – its current status, what is required to improve it, what skills must be brought to bear on database curation and hence what is the proper training for annotators.
This book should be essential reading for all people working on biological databases, both biologists and computer scientists. It will also be of interest to all users of such databases, including molecular biologists, geneticists, protein chemists, clinicians and drug developers.
List of Contributors.
1. Annotation and Databases: Status and Prospects (M. Hoebeke, H. Chiapello, J.-F. Gibrat, Ph. Bessieres and J. Garnier).
I: THE DATABANKS.
2. Survey of Sequence Databases: Archival Projects (M. Magrane, M. Garcia-Pastor and R. Apweiler).
3. Survey of Sequence Databases: Derived Databases (M. Pruess, N. Mulder and R. Apweiler).
4. Databanks of Macromolecular Structure (H.J. Bernstein and F.C. Bernstein).
5. Gene Expression Databases (H. Parkinson).
II: THE BASIS OF ANNOTATION.
6. Taxonomy: a Moving Target for Sequence Data (M.I. Krichevsky).
7. Genomics and Proteomics: Design and Sources of Annotation (K. Mayer and G. Mannhaupt).
8. Annotation of Protein Sequences (W.C. Barker and C.H. Wu).
9. Issues in the Annotation of Protein Structures (G.J. Swaminathan, J. Tate, R. Newman, A. Hussain, J. Ionides, K. Henrick and S. Velankar).
10. Classification of Protein Function (A.M. Lesk, H. Parkinson and J.C. Whisstock).
III: DATABASE DESIGN AND INTEGRATION.
11. Information Flow and Data Integration of Databanks (C.H. Wu and W.C. Barker).
12. Models of Database Interconnectivity (G.J.L. Kemp).
13. The European Bioinformatics Institute Macromolecular Structure Relational Database Technology (H. Boutselakis, D. Dimitriopoulos, K. Henrick, J. Ionides, M. John, P.A. Keller, P. McNeil, J. Pineda and A. Suarez-Uruena).
IV: CONCLUSIONS AND PROSPECTS.
14. Looking Around, Looking Ahead (A.M. Lesk).