Data Management in Structural Genomics: an Overview
S. Haquin, E. Oeuillet, A. Pajon, M. Harris, A.T. Jones, H. van Tilbeurgh, J.L. Markley, Z. Zolnai, and A. Poupon
Methods in Molecular Biology, in press.
Data management has been identified as a crucial issue in all large-scale experimental projects. In this type of project, many different people manipulate multiple objects in different locations; thus, unless complete and accurate records are maintained, it is extremely difficult to understand exactly what has been done, when it was done, who did it, and what exact protocol was used. All of this information is essential for use in publications, for reusing successful protocols, for determining why a target has failed, and for validating and optimizing protocols. Although data management solutions have been in place for certain focused activities, for example genome sequencing and microarray experiments, they are just emerging for more widespread projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single centre or across various centres, have important consequences for the design of information management systems. Because procedures are carried out both by machines and by hand, the system must be capable of handling data entry both from robotic systems and by means of a user-friendly interface. The information management system needs to be flexible so that it can handle both changes to existing protocols and newly added protocols. Because no commercial information management system has had the needed features, most structural genomics groups have developed their own solutions. In this chapter we discuss the advantages of using a LIMS (Laboratory Information Management System) for day-to-day management of structural genomics projects and for data mining.
We review different solutions currently in place or under development, with emphasis on three systems developed by us: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the Yeast Structural Genomics Laboratory, in collaboration with the European SPINE project).