Life Sciences Division
1997-98
Progress Report
Functional Genomics and Proteomics
Recently announced plans for accelerating the sequencing of the human
genome call for a first draft of the reference human genome to be completed
within the next year. This will provide the biomedical research community
with a computerized catalog of the names, locations, and nucleotide sequences
of the 80,000 to 100,000 genes on the human chromosomes. Given the rate
at which sequence data are being produced, there is the potential for
discovering hundreds of new human genes every day. In addition, intensive
efforts are under way to sequence the genomes of important pathogenic
and environmentally and commercially significant microorganisms. Increasing
focus is also being placed on sequencing plant genomes, with obvious implications
for agricultural crops. Several of these smaller genomes have been completely
sequenced.
As these various genomes are completed, significant advances in the ability
to determine the function of genes, within and across genomes, will be
required to unlock the information contained in the output from sequencing
and gene searches. Biologists have been studying gene function for many
years, but most of their research has been slow, costly, and directed
at single genes. Access to the powerful reagents from the genome program
is changing this situation. In the new era of biomedical research that
has just begun, it will be possible to perform experiments in functional
genomics--that is, to determine the function of genes and systems of genes
on a genome-wide scale.
Gene function is determined (1) by analyzing the effects of DNA mutations
in genes on normal development and health in the whole organism; (2) by
analyzing a variety of signals encoded in the DNA sequence; and (3) by
studying the proteins produced by a gene or system of related genes. Researchers
are able to study functional genomics in humans by using genome information
from other model organisms that provide rich scenarios for experimental
research. The mouse, with its genetic and physiological similarities to
the human and its extensive comparative genetic linkage map, is one of
the leading model organisms for determining human gene function. A wide
variety of genetic and molecular manipulations are possible in the mouse,
making it a powerful research organism for studies of functional genomics.
In addition, the availability of completed DNA sequences for plants and
microbes opens opportunities to work on gene networks and gene interactions
in systems where all the genes are known. Work on other model organisms
also opens related research areas that are important to DOE, such as the
identification of organisms in the environment and the genetic manipulation
of organisms to help mitigate environmental problems.
The availability of complete DNA sequences for many organisms in the
near future will enable whole new lines of scientific inquiry into the
nature of the proteome, the proteins encoded by the genome. Thus, an important
aspect of determining gene function is the characterization of the vast
number of proteins expressed by the genome, including establishing both
the structure of a particular protein and its role in the organism. Proteomics
research programs are being planned by the DOE and other agencies. Efforts
are under way to perform high-throughput assays to determine the structures
of proteins and study protein complexes using x-ray crystallography, NMR
spectroscopy, mass spectrometry, neutron scattering techniques, and computational
tools.
At ORNL, we have initiated a program that combines unique strengths of
existing research programs in mouse mutagenesis with analytical technologies
and computational biology/bioinformatics to address critical issues in
Functional Genomics and Proteomics. Our approach is based on the conviction
that genetics and protein studies should be viewed as integral components
of an overall strategy to understand protein function in the context of
the whole organism and to define this function at the molecular level.
The ORNL effort builds on synergism between the existing Mouse Genetics
and Mutagenesis Core effort, the Computational Biology/Bioinformatics
Program, and the proposed Center for Structural Molecular Biology. This
approach will maximize our ability (1) to assign both biochemical and
organismal function to genes and proteins, (2) to define interacting protein
pathways at the molecular level, and (3) to establish the role of proteins
in the whole organism. The investment of Laboratory Directed R&D (LDRD)
funds in this program from FY 1997 through FY 1999 has served to catalyze
cross-disciplinary research projects involving biologists, chemists, engineers,
physicists, and computer scientists, to develop tools required for mounting
a highly efficient, comprehensive system for functional genomics and proteomics.
This team brings new approaches to bear on the scientific issues of enriching
raw DNA sequence generated at the DOE Joint Genome Institute (JGI) and
elsewhere, with functional information derived from gene expression and
protein studies of mutations in the mouse. Specifically, we are combining
advanced methods in mouse mutagenesis with the development of new functional
genomics and proteomics technologies, both analytical and computational,
to dissect and understand how gene expression impacts specific biological
systems. This program also applies these new technologies to study important
bacterial and plant systems.
A key component of our Functional Genomics and Proteomics Program is
the MGRF, a DOE national user facility, which is an unparalleled resource
for functional genomics. The MGRF represents one of the largest facilities
in the world for carrying out experimental research in functional genomics
using the mouse as a model organism. Mouse geneticists can "target" a
specific gene to eliminate or alter its function in the whole animal or
only in a specific cell population, or they can add normal genes back
to a mutant mouse to correct an abnormality. They can engineer rearrangements
in large regions of the genome and then create gene-by-gene mutations
in these regions using the chemical mutagen ethylnitrosourea (ENU) to
make single-base changes in DNA. ENU is useful for making multiple different
mutant forms of a single gene, thereby providing more exact human disease
models that mimic the subtle genetic variations characteristic of human
populations. These strategies for creating mutations in mice can easily
be expanded to a genome-wide scale, generating genetic reagents essential
for the entire research community.
Outlined below are examples of how we are melding our advanced technologies
and computational biology capabilities with our mouse mutation resource
for the purpose of establishing highly efficient techniques for defining
specific functions of proteins in mutation models.
- Gene expression and protein analysis techniques for augmenting
primary phenotype screening and secondary/tertiary analyses of mutant
phenotypes. Phenotype screening is both a huge undertaking and a
commitment that is well suited for a national laboratory resource. The
Mouse Genetics and Mutagenesis Core effort is focused on recovering
phenotype-defined mutations in selected regions of the genome. Using
chemically induced mutations (e.g., ENU), a sufficient number of potentially
mutant animals (and carrier siblings) will be produced, allowing us
to identify mutations affecting a wide variety of biological pathways.
These mutants will be identified with our in-house phenotype-screening
protocols (the "Screenotype" system). Additional screens will be provided
by distribution of these animals to our collaborators in the Tennessee
Mouse Genome Consortium, which at present includes Vanderbilt University
Medical Center, Meharry Medical College, St. Jude Children's Research
Hospital, and the University of Tennessee (Knoxville and Memphis). As
part of this effort, we will incorporate automated, high-throughput
phenotype-screening techniques to expand our phenotype-screening net
to identify subtle physiological phenotypes. For example, we have developed
a micro CAT (computer-aided tomography) device for rapidly screening
mice for organ and skeletal changes. We have also developed microfluidic
devices ("lab-on-a-chip") for a number of analytical processes employed
in the molecular biology laboratory, such as sample purification, PCR
amplification reactions, electrophoretic separations, and others. We
are developing mass spectrometry-based techniques for monitoring targeted
proteins, such as expression of cytokines as sentinels for mutations
leading to chronic inflammatory disorders. These new technologies, as
well as others being developed at ORNL and in other laboratories, represent
a vital component in our strategy to meet the requirements of a large-scale
proteomics effort.
- Technologies that facilitate genetic linkage analyses in large-scale
mouse crosses for genetic dissection of complex biological pathways.
Currently, we have the capability to induce and discover modifiable
mutations, as well as their modifiers; however, conducting whole-genome
screening for linkage with simple-sequence repeat (SSR) markers to
identify the genes and proteins involved requires new, efficient analytical
tools. We will apply our established methods in mass spectrometry (based
on matrix-assisted laser desorption techniques) to accomplish rapid,
gel-less analyses of SSR PCR products for high-throughput linkage analyses
required in studies of modifier mutations and of meiotic recombination.
Other complementary technologies are also being developed, including
novel hybridization chips and "lab-on-a-chip" microfluidic devices for
combined PCR/electrophoresis/detection of DNA fragments.
- In vitro and whole-animal approaches for complex pathway
analysis. In a current LDRD project, we are focusing on how to best
approach the global analysis of a complex system, which promotes organ-directed,
whole-organism phenotyping of genes/proteins/mutations. We are investigating
a new means for screening for recessive mutations in vitro in
embryonic stem (ES) cells that will allow us to focus on genes affecting
complex pathways and which are likely to affect phenotypes of target
organs. As an initial test system, we are examining skin differentiation
in ES cells. Skin has been selected as a target organ because it is
involved in developmental, cancer, aging, and exposure biology, and
ES cells can be induced to differentiate into skin in vitro.
As part of this endeavor, we will incorporate chip-based mRNA expression
profiling technologies (which can screen for expression of genes involved
in skin development, as well as in other cellular processes, e.g., cell
cycle and apoptosis) to define skin pathway perturbations induced
by both existing and newly induced mutations. Technologies developed
as part of this work will be readily applied to other test systems.
- New techniques for high-throughput detection of variants of specific
proteins that can then be funneled into both genetic (organismal) and
structural (biophysical) analyses. We will develop high-throughput
mass spectrometry-based procedures to detect, in one-generation chemical
mutagenesis screens, partially functional variants of proteins. The
mutant protein identified from this "Protein Variant Screen (PVS)" would
then direct the matings of variant mice to reveal both dominant and
recessive whole-organism, functional phenotype(s). Such new technologies
should have the sensitivity to detect heritable mutations in protein
processing, as well as abnormalities in protein primary and secondary
structure. Such a simple, one-generation test could easily be piggybacked
onto ongoing ethylnitrosourea experiments with little additional mouse
cost, in contrast to the rather substantial costs that might be incurred
if one were to modify protein structure blindly, one residue at a time,
by modifying loci in vitro in ES cells. Furthermore, we would
specifically assay for protein polymorphisms, not knockouts; indeed,
a knockout mutation leaves the structural biologist no material to
study. As one example, the MLH1 protein, involved in DNA repair, is
required for recombination and the successful completion of meiosis.
Male animals homozygous for knockout mutations in this gene are sterile
because meiosis is arrested; thus, an allelic series of mutations would
be highly desirable to ascertain whether variants (not nulls) of this
protein have an effect on sterility, nondisjunction, DNA repair, and
mutation rate. The developed technologies would be broadly applicable
to other biological issues. For instance, it may be possible to induce
a series of slightly malformed estrogen receptors in whole animals that
could then be tested for interaction with xenobiotic estrogens in
vitro while at the same time tested for cancer susceptibility in
the whole animal. We feel that the uses for this particular marriage
between classical germline chemical mutagenesis and mass spectrometry
are far-ranging.
- Structural characterization of mutant proteins with phenotypic
ramifications and comparison with wild type for determination of structure/function
relationship. Using techniques of structural biology, we will analyze
protein variants from a phenotypically characterized allelic series
of chemically induced mutations. An important issue in functional genomics/proteomics
is establishing the structure and effects of the many types of post-translational
modifications that can occur in expressed proteins. State-of-the-art
structural techniques will be used to characterize modified proteins
at the molecular level and identify differences in their interactions
with other key biomolecules. We will use existing ORNL resources
in mass spectrometry and the proposed Center for Structural Molecular
Biology (CSMB) facility for small-angle neutron scattering, as well as
structural techniques available at our collaborating national laboratories,
including synchrotron-based protein crystallography and nuclear magnetic
resonance. We will also
draw upon ORNL's expertise in computational biology to help establish
protein structure via threading and other techniques.
- Validating predictions of protein structure and function through
mouse genetics and mutagenesis. Computational techniques in genomics
and molecular biology generate predictions of gene and protein structure
and function that may, in turn, generate hypotheses that can be tested
both in vitro and in vivo. For example, bioinformatics
techniques might suggest a function for a newly identified protein by
assigning it to a family of proteins that share similar motifs and functional
domains. Starting from this prediction, one would like to understand
the function(s) of this protein both at the level of molecular mechanisms
and its role(s) in the organism. As more and more genomic DNA sequence
becomes available from the JGI and other sequencing centers, mouse genetics
and mutagenesis, combined with computational analysis, will become a
powerful tool for understanding the bases of gene and protein function.
There are a number of levels where this can occur. In regions being
targeted for mutagenesis, computational analysis of the DNA sequence
of the region will suggest possible correlations between genes identified
in the region and the phenotypes observed. Computational analysis can
also help in identifying candidate genes in regions with known mutations
by making structural and functional predictions. These predictions can
then be tested by targeted mutagenesis methods. Another role for computational
sequence analysis will be to determine the gene content of regions covered
by "draft" sequencing to aid in the selections of regions as targets
for various mutagenesis approaches. The results of mouse mutagenesis
will also provide useful feedback for refining the techniques and approaches
used in computational biology to predict gene and protein structure
and function.
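As an illustration of the mass spectrometry-based variant detection described
in the list above, the following Python sketch shows how a single-residue
substitution in a peptide could, in principle, be inferred from a shift in
measured mass. It is a minimal sketch under stated assumptions, not the
ORNL procedure: the function names, the 0.01 Da tolerance, and the example
peptide are hypothetical, and the only data used are the standard
monoisotopic residue masses of the 20 common amino acids.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def candidate_substitutions(wild_type, observed_mass, tol=0.01):
    """List (position, old, new) single-residue changes that explain the shift.

    Hypothetical helper: `tol` is an assumed mass tolerance in Da.
    Positions are 1-based. Leucine and isoleucine are isobaric, so a
    substitution to one of them is always reported for both.
    """
    shift = observed_mass - peptide_mass(wild_type)
    hits = []
    for i, old in enumerate(wild_type, start=1):
        for new, mass in RESIDUE_MASS.items():
            if new == old:
                continue
            if abs((mass - RESIDUE_MASS[old]) - shift) <= tol:
                hits.append((i, old, new))
    return hits


# Hypothetical example: wild-type peptide GASP, variant GAVP (Ser -> Val).
observed = peptide_mass("GAVP")
print(candidate_substitutions("GASP", observed))
```

Mass alone only narrows the candidates; a real screen would also need
fragmentation data or sequencing to confirm which residue changed, and
would have to distinguish substitutions from post-translational
modifications that produce similar shifts.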
In summary, our goal is to understand protein function in the context
of the whole animal and to define this function at the molecular level
by combining advanced methods in mouse mutagenesis with the development
of new concepts for functional genomics and proteomics technologies. Although
the focus of this effort is not drug discovery, the developed tools
would certainly be applicable to that area as well. In particular, we
will work with other national laboratories to complement their related
functional genomics and proteomics programs, making ORNL resources
and technologies available to them. We will also form new collaborations,
strengthen existing ones, and serve as a resource to laboratories at
other national laboratories and in academia and industry.
An April 1998 conference, "Partnering for Functional Genomics Research,"
attracted representatives from 14 pharmaceutical and biotechnology companies.
All participating companies expressed interest in further interactions,
and several new projects are being discussed. The Merck Genome Research
Institute initiated a research project through the Joint Institute for
Biological Sciences. Other follow-on activities are under way to establish
collaborative efforts and to pursue the development of an R&D consortium
involving several industry partners. As with all actively growing research
programs, new directions become apparent that require change; however,
the general models outlined above give an overview of the type of program
we are building and the types of important and relevant information that
will be obtained.