Life Sciences Division
1997-98
Progress Report

Functional Genomics and Proteomics

Recently announced plans for accelerating the sequencing of the human genome call for a first draft of the reference human genome to be completed within the next year. This will provide the biomedical research community with a computerized catalog of the names, locations, and nucleotide sequences of the 80,000 to 100,000 genes on the human chromosomes. Given the rate at which sequence data are being produced, hundreds of new human genes could be discovered every day. In addition, intensive efforts are under way to sequence the genomes of important pathogens and of microorganisms with environmental or commercial significance. Increasing attention is also being given to sequencing plant genomes, with obvious implications for agricultural crops. Several of these smaller genomes have already been completely sequenced.

As these various genomes are completed, significant advances in the ability to determine the function of genes, within and across genomes, will be required to unlock the information contained in the output from sequencing and gene searches. Biologists have been studying gene function for many years, but most of their research has been slow, costly, and directed at single genes. Access to the powerful reagents from the genome program is changing this situation. In the new era of biomedical research that has just begun, it will be possible to perform experiments in functional genomics--that is, to determine the function of genes and systems of genes on a genome-wide scale.

Gene function is determined (1) by analyzing the effects of gene mutations on normal development and health in the whole organism; (2) by analyzing the various signals encoded in the DNA sequence; and (3) by studying the proteins produced by a gene or a system of related genes. Researchers can study human gene function by drawing on genome information from model organisms that offer rich opportunities for experimental research. The mouse, with its genetic and physiological similarities to the human and its extensive comparative genetic linkage map, is one of the leading model organisms for determining human gene function. A wide variety of genetic and molecular manipulations are possible in the mouse, making it a powerful organism for studies of functional genomics. In addition, the availability of complete DNA sequences for plants and microbes creates opportunities to study gene networks and gene interactions in systems in which all the genes are known. Work on other model organisms also opens related research areas that are important to DOE, such as the identification of organisms in the environment and the genetic manipulation of organisms to help mitigate environmental problems.

The availability of complete DNA sequences for many organisms in the near future will enable entirely new lines of scientific inquiry into the nature of the proteome, the set of proteins encoded by the genome. An important aspect of determining gene function is therefore the characterization of the vast number of proteins expressed by the genome, including establishing both the structure of a particular protein and its role in the organism. Proteomics research programs are being planned by DOE and other agencies. Efforts are under way to perform high-throughput assays to determine protein structures and to study protein complexes using x-ray crystallography, NMR spectroscopy, mass spectrometry, neutron scattering, and computational tools.

At ORNL, we have initiated a program that combines the unique strengths of existing research programs in mouse mutagenesis with analytical technologies and computational biology/bioinformatics to address critical issues in Functional Genomics and Proteomics. Our approach is based on the conviction that genetics and protein studies should be viewed as integral components of an overall strategy to understand protein function in the context of the whole organism and to define this function at the molecular level. The ORNL effort builds on the synergy among the existing Mouse Genetics and Mutagenesis Core effort, the Computational Biology/Bioinformatics Program, and the proposed Center for Structural Molecular Biology. This approach will maximize our ability (1) to assign both biochemical and organismal function to genes and proteins, (2) to define interacting protein pathways at the molecular level, and (3) to establish the role of proteins in the whole organism. The investment of Laboratory Directed R&D (LDRD) funds in this program from FY 1997 through FY 1999 has catalyzed cross-disciplinary research projects involving biologists, chemists, engineers, physicists, and computer scientists to develop the tools required for a highly efficient, comprehensive system for functional genomics and proteomics. This team brings new approaches to the problem of enriching raw DNA sequence generated at the DOE Joint Genome Institute (JGI) and elsewhere with functional information derived from gene expression and protein studies of mouse mutations. Specifically, we are combining advanced methods in mouse mutagenesis with the development of new functional genomics and proteomics technologies, both analytical and computational, to dissect and understand how gene expression affects specific biological systems. The program also applies these new technologies to the study of important bacterial and plant systems.

A key component of our Functional Genomics and Proteomics Program is the MGRF, a DOE national user facility and an unparalleled resource for functional genomics. The MGRF is one of the largest facilities in the world for experimental research in functional genomics using the mouse as a model organism. Mouse geneticists can "target" a specific gene to eliminate or alter its function in the whole animal or only in a specific cell population, or they can add normal genes back to a mutant mouse to correct an abnormality. They can engineer rearrangements in large regions of the genome and then use the chemical mutagen ethylnitrosourea (ENU), which makes single-base changes in DNA, to create gene-by-gene mutations in these regions. ENU is useful for producing multiple different mutant forms of a single gene, thereby providing more faithful models of human disease that mimic the subtle genetic variation characteristic of human populations. These strategies for creating mutations in mice can readily be expanded to a genome-wide scale, generating genetic reagents essential for the entire research community.
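Because ENU acts through single-base changes, a single codon can produce several distinct alleles, some silent, some missense, and some nonsense. The Python sketch below lists every single-base substitution of one codon under the standard genetic code and classifies each outcome. The function name is hypothetical, and the sketch treats all nine substitutions as equally possible. That is a simplifying assumption, because the real ENU mutation spectrum is not uniform across base changes. This is an illustration only and does not represent ORNL software.

```python
"""Enumerate every single-base substitution in one codon and classify the outcome.

Illustrative sketch: uses the standard genetic code and treats all substitutions
as equally possible, which real ENU mutagenesis does not.
"""

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutations(codon: str) -> dict[str, str]:
    """Map each single-base variant of `codon` to silent, missense, or nonsense."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    outcomes = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            mutant = codon[:pos] + base + codon[pos + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa == original:
                kind = "silent"
            elif new_aa == "*":
                kind = "nonsense"
            else:
                kind = f"missense ({original}->{new_aa})"
            outcomes[mutant] = kind
    return outcomes


if __name__ == "__main__":
    for mutant, kind in classify_point_mutations("TGG").items():  # tryptophan codon
        print(mutant, kind)
```

Take the tryptophan codon TGG as an example. Two of its nine possible substitutions (TAG and TGA) create stop codons, and the other seven are missense changes. A single codon can therefore contribute several alleles of differing severity.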

Outlined below are examples of how we are melding our advanced technologies and computational biology capabilities with our mouse mutation resource to establish highly efficient techniques for defining the specific functions of proteins in mutant models.

  1. Gene expression and protein analysis techniques to augment primary phenotype screening and secondary/tertiary analyses of mutant phenotypes. Phenotype screening is a huge undertaking and a commitment well suited to a national laboratory resource. The Mouse Genetics and Mutagenesis Core effort is focused on recovering phenotype-defined mutations in selected regions of the genome. Using chemical mutagens (e.g., ENU), we will produce enough potentially mutant animals (and carrier siblings) to identify mutations affecting a wide variety of biological pathways. These mutants will be identified with our in-house phenotype-screening protocols (the "Screenotype" system). Additional screens will be provided by distributing these animals to our collaborators in the Tennessee Mouse Genome Consortium, which at present includes Vanderbilt University Medical Center, Meharry Medical College, St. Jude Children's Research Hospital, and the University of Tennessee (Knoxville and Memphis). As part of this effort, we will incorporate automated, high-throughput phenotype-screening techniques to widen our screening net and detect subtle physiological phenotypes. For example, we have developed a micro CAT (computer-aided tomography) device for rapidly screening mice for organ and skeletal changes. We have also developed microfluidic devices ("lab-on-a-chip") for a number of analytical processes used in the molecular biology laboratory, such as sample purification, PCR amplification, and electrophoretic separation, among others. We are developing mass spectrometry-based techniques for monitoring targeted proteins, such as cytokines, whose expression can serve as a sentinel for mutations leading to chronic inflammatory disorders.
     These new technologies, along with others being developed at ORNL and in other laboratories, are a vital component of our strategy for meeting the requirements of a large-scale proteomics effort.
  2. Technologies that facilitate genetic linkage analyses in large-scale mouse crosses for the genetic dissection of complex biological pathways. We can currently induce and discover modifiable mutations, as well as their modifiers. However, whole-genome linkage screening with simple-sequence repeat (SSR) markers, which is needed to identify the genes and proteins involved, requires new and efficient analytical tools. We will apply our established mass spectrometry methods (based on matrix-assisted laser desorption) to perform rapid, gel-free analyses of SSR PCR products. These analyses support the high-throughput linkage studies required for modifier mutations and meiotic recombination. Complementary technologies are also being developed, including novel hybridization chips and "lab-on-a-chip" microfluidic devices that combine PCR, electrophoresis, and detection of DNA fragments.
  3. In vitro and whole-animal approaches for complex pathway analysis. In a current LDRD project, we are investigating how best to approach the global analysis of a complex system in a way that supports organ-directed, whole-organism phenotyping of genes, proteins, and mutations. We are developing a new means of screening for recessive mutations in vitro in embryonic stem (ES) cells. This screen will let us focus on genes that affect complex pathways and are likely to affect the phenotypes of target organs. As an initial test system, we are examining skin differentiation in ES cells. Skin was selected as a target organ for two reasons: it is relevant to developmental, cancer, aging, and exposure biology, and ES cells can be induced to differentiate into skin in vitro. As part of this endeavor, we will incorporate chip-based mRNA expression-profiling technologies. These can screen for expression of genes involved in skin development and in other cellular processes (e.g., cell cycle and apoptosis), and we will use them to define skin pathway perturbations induced by both existing and newly induced mutations. Technologies developed in this work will be readily applicable to other test systems.
  4. New techniques for high-throughput detection of variants of specific proteins that can then be funneled into both genetic (organismal) and structural (biophysical) analyses. We will develop high-throughput, mass spectrometry-based procedures to detect partially functional protein variants in one-generation chemical mutagenesis screens. A mutant protein identified by this "Protein Variant Screen" (PVS) would then direct matings of the variant mice to reveal both dominant and recessive whole-organism functional phenotypes. Such technologies should be sensitive enough to detect heritable mutations affecting protein processing, as well as abnormalities in protein primary and secondary structure. A simple one-generation test of this kind could easily be piggybacked onto ongoing ENU experiments with little additional cost in mice. By contrast, modifying protein structure blindly, one residue at a time, by altering loci in vitro in ES cells could incur substantial costs. Furthermore, we would assay specifically for protein polymorphisms rather than knockouts, because a knockout mutation leaves the structural biologist no material to study. For example, the MLH1 protein, which is involved in DNA repair, is required for recombination and the successful completion of meiosis. Male animals homozygous for knockout mutations in this gene are sterile because meiosis is arrested. An allelic series of mutations would therefore be highly desirable for determining whether variants (not nulls) of this protein affect sterility, nondisjunction, DNA repair, and mutation rate. The technologies developed would be broadly applicable to other biological questions. For instance, it may be possible to induce a series of slightly malformed estrogen receptors in whole animals. These receptors could be tested in vitro for interaction with xenobiotic estrogens, while the same animals are tested for cancer susceptibility.
     We believe the uses for this particular marriage of classical germline chemical mutagenesis and mass spectrometry are far-ranging.
  5. Structural characterization of mutant proteins with phenotypic consequences, and comparison with the wild type, to determine structure/function relationships. Using the techniques of structural biology, we will analyze protein variants from a phenotypically characterized allelic series of chemically induced mutations. An important issue in functional genomics/proteomics is establishing the structure and effects of the many types of post-translational modifications that can occur in expressed proteins. State-of-the-art structural techniques will be used to characterize modified proteins at the molecular level and to identify differences in their interactions with other key biomolecules. We will draw on existing ORNL resources in mass spectrometry and on the small-angle neutron scattering capabilities of the proposed Center for Structural Molecular Biology (CSMB). We will also use structural techniques available at our collaborating national laboratories, including synchrotron-based protein crystallography and nuclear magnetic resonance. In addition, we will apply ORNL's expertise in computational biology to help establish protein structure via threading and other techniques.
  6. Validating predictions of protein structure and function through mouse genetics and mutagenesis. Computational techniques in genomics and molecular biology generate predictions of gene and protein structure and function that may, in turn, generate hypotheses that can be tested both in vitro and in vivo. For example, bioinformatics techniques might suggest a function for a newly identified protein by assigning it to a family of proteins that share similar motifs and functional domains. Starting from this prediction, one would like to understand the function(s) of this protein both at the level of molecular mechanisms and in terms of its role(s) in the organism. As more genomic DNA sequence becomes available from the JGI and other sequencing centers, mouse genetics and mutagenesis, combined with computational analysis, will become a powerful tool for understanding the bases of gene and protein function. This can happen at several levels. In regions targeted for mutagenesis, computational analysis of the region's DNA sequence will suggest possible correlations between the genes identified there and the phenotypes observed. Computational analysis can also help identify candidate genes in regions with known mutations by making structural and functional predictions, which can then be tested by targeted mutagenesis. Another role for computational sequence analysis will be to determine the gene content of regions covered by "draft" sequencing, to aid in selecting regions as targets for various mutagenesis approaches. The results of mouse mutagenesis will in turn provide useful feedback for refining the computational techniques used to predict gene and protein structure and function.
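Item 1 calls for automated screening to catch subtle physiological phenotypes. One simple way to triage a quantitative readout is to score each mutagenized animal against a wild-type reference cohort using a robust z-score, which is built from the median and the median absolute deviation (MAD). The Python sketch below assumes a single numeric measurement per animal. The cutoff of 3.5 is a common rule of thumb for modified z-scores, not a value taken from this program. The function names are hypothetical, and the sketch does not describe the "Screenotype" protocols.

```python
import statistics

MAD_TO_SD = 1.4826  # scales the MAD to estimate the SD of normally distributed data


def robust_z(value: float, reference: list[float]) -> float:
    """Distance of `value` from the reference median, in robust SD units."""
    center = statistics.median(reference)
    mad = statistics.median(abs(x - center) for x in reference)
    if mad == 0:
        raise ValueError("reference values have zero spread; cannot scale")
    return (value - center) / (MAD_TO_SD * mad)


def flag_candidates(measurements: dict[str, float],
                    reference: list[float],
                    threshold: float = 3.5) -> dict[str, float]:
    """Return animals whose robust z-score exceeds the threshold in either direction."""
    scores = {animal: robust_z(v, reference) for animal, v in measurements.items()}
    return {animal: z for animal, z in scores.items() if abs(z) > threshold}


if __name__ == "__main__":
    wild_type = [10, 11, 9, 10, 12, 10, 9, 11]          # hypothetical control values
    screened = {"m1": 10.5, "m2": 16.0, "m3": 4.0}      # hypothetical mutagenized mice
    print(flag_candidates(screened, wild_type))
```

The median and MAD are used instead of the mean and SD so that a few extreme animals in the reference set do not mask the outliers being sought. Flagged animals would still need confirmation in secondary screens.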
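The high-throughput linkage analyses in item 2 reduce, for each SSR marker, to a statistical question: do the marker and the mutation co-segregate more often than expected by chance? The Python sketch below computes the classic LOD score. It assumes a backcross design in which each scored progeny contributes one meiosis, and each meiosis is classified as recombinant or non-recombinant between the marker and the mutation. A LOD of 3 (1000:1 odds) is the traditional threshold for declaring linkage. Genome-wide scans in practice use more careful thresholds and intercross models, which are not shown. The function names and counts are hypothetical.

```python
import math


def _log10_power(count: int, prob: float) -> float:
    """Return log10(prob ** count), treating 0 ** 0 as 1."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")
    return count * math.log10(prob)


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """LOD for linkage at recombination fraction theta versus free recombination (0.5).

    Assumes a backcross: each progeny is one meiosis scored as recombinant or not.
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    log_l_linked = (_log10_power(recombinants, theta)
                    + _log10_power(total - recombinants, 1.0 - theta))
    log_l_unlinked = total * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, total: int) -> tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and the LOD score at that theta."""
    if total <= 0:
        raise ValueError("total must be positive")
    theta_hat = min(recombinants / total, 0.5)
    return theta_hat, lod_score(recombinants, total, theta_hat)


if __name__ == "__main__":
    # Hypothetical counts: 10 recombinants among 100 backcross progeny.
    theta, lod = max_lod(10, 100)
    print(f"theta = {theta:.2f}, LOD = {lod:.2f}")
```

With these hypothetical counts, the script prints `theta = 0.10, LOD = 15.98`, which is strong evidence that the marker lies near the mutation. The scoring itself is cheap, so the bottleneck the text describes is generating the SSR genotypes. Gel-free mass spectrometry targets exactly that step.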
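The Protein Variant Screen in item 4 depends on a basic fact: replacing one amino acid changes a peptide's mass by the difference between the two residue masses. The Python sketch below computes that shift from standard monoisotopic residue masses. It also lists substitutions that a given mass tolerance cannot distinguish from the wild type. The peptide "LAK", the tolerances, and the function names are hypothetical. The sketch ignores post-translational and chemical modifications, and it is not the procedure proposed in the text.

```python
# Monoisotopic residue masses in daltons (residue = free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # charge carrier for the singly protonated [M+H]+ ion


def peptide_mh(sequence: str) -> float:
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def substitution_shift(sequence: str, position: int, new_residue: str) -> float:
    """Mass change (Da) from replacing sequence[position] with new_residue."""
    return RESIDUE_MASS[new_residue] - RESIDUE_MASS[sequence[position]]


def undetectable_substitutions(sequence: str,
                               tolerance_da: float) -> list[tuple[int, str, str]]:
    """Single-residue substitutions whose mass shift falls within the tolerance."""
    hits = []
    for pos, old in enumerate(sequence):
        for new in RESIDUE_MASS:
            if new != old and abs(substitution_shift(sequence, pos, new)) <= tolerance_da:
                hits.append((pos, old, new))
    return hits


if __name__ == "__main__":
    print(round(peptide_mh("LAK"), 4))
    print(undetectable_substitutions("LAK", tolerance_da=0.05))
```

The blind spots matter for a mass-based screen. Leucine and isoleucine are isobaric and invisible at any tolerance, and a K-to-Q change shifts mass by only about 0.036 Da, so it needs high mass accuracy to detect. A sequence-level follow-up would be required in those cases.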
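Item 6 mentions assigning a new protein to a family by the motifs it shares. The simplest form of that idea is pattern matching against a PROSITE-style motif. The Python sketch below translates a small subset of PROSITE pattern syntax into a regular expression and scans a sequence for it. The example pattern is the Walker A (P-loop) nucleotide-binding motif, `[AG]-x(4)-G-K-[ST]`. The protein sequence is made up, and the function names are hypothetical. Real family assignment typically relies on profiles or hidden Markov models with significance estimates, which are not shown here.

```python
import re

# One PROSITE element: optional N-terminal anchor "<", a residue class, an optional
# repeat count "(n)" or "(n,m)", and an optional C-terminal anchor ">".
_ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (a subset of the syntax) into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = match.groups()
        if core == "x":
            token = "."                          # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            token = core                         # single residue or [..] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + token + ("$" if c_anchor else ""))
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every match, overlaps included."""
    # Wrapping the regex in a lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    walker_a = "[AG]-x(4)-G-K-[ST]."
    print(prosite_to_regex(walker_a))
    print(find_motifs(walker_a, "MAGPSGSGKSTLL"))  # hypothetical sequence
```

A motif hit like this is only a hypothesis of function, which is why the text pairs computational predictions with targeted mutagenesis in the mouse.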

In summary, our goal is to understand protein function in the context of the whole animal and to define this function at the molecular level by combining advanced methods in mouse mutagenesis with the development of new functional genomics and proteomics technologies. Although this effort is not focused on drug discovery, the tools developed would certainly be applicable to that area as well. We will work in particular with other national laboratories to complement their related Functional Genomics/Proteomics programs, making ORNL resources and technologies available to them. We will also form new collaborations, strengthen existing ones, and serve as a resource for other national laboratories, academia, and industry. An April 1998 conference, "Partnering for Functional Genomics Research," attracted representatives from 14 pharmaceutical and biotechnology companies. All participating companies expressed interest in further interactions, and several new projects are being discussed. The Merck Genome Research Institute initiated a research project through the Joint Institute for Biological Sciences. Other follow-on activities are under way to establish collaborative efforts and to develop an R&D consortium involving several industry partners. As with any actively growing research program, new directions will emerge that require change. However, the general models outlined above give an overview of the type of program we are building and the kinds of important and relevant information it will yield.