From DNA to RNA to protein – deciphering the information encoded by the genome

The flood of genome sequences from different organisms has revealed that organisms ranging from simple to complex do not differ greatly in their number of genes – the simple sea urchin has about 23,000 genes, roughly the same number as humans. It is now clear that much of the complexity of animals is created by extensive gene regulatory mechanisms that impact gene expression, isoform selection, and other post-transcriptional regulatory steps. Mapping these mechanisms remains one of the next frontiers of biology. An international team led by Tim Hughes at the Donnelly Centre has addressed this challenge and determined the DNA sequence binding preferences of more than a thousand transcription factors. These proteins bind DNA to regulate gene expression, and Hughes’ study provides the largest profile to date of these key proteins, enabling prediction of the binding specificities of over one third of all such regulators. Their work has broad implications for challenges as diverse as improving agriculture and interpreting mutations in human DNA.

Transcribing a messenger RNA (mRNA) is only the first step in gene expression – in humans and other eukaryotes, particular regions of a gene are selected for retention in the mRNA by a process called splicing. The Blencowe laboratory has developed and applied new experimental and computational approaches that have uncovered extraordinary regulatory and functional complexity at the level of alternative splicing in mammalian cells and tissues. These studies have enabled the inference of a predictive code for regulated alternative splicing, as well as the discovery and characterization of splicing regulatory networks with pivotal roles in the control of embryonic stem cell pluripotency and neurogenesis. Additional research in Blencowe’s group is focusing on the identification of key trans-acting factors and mechanisms that control splicing regulatory networks in normal development and human disease contexts. His group has also recently embarked on a project directed at the discovery of new roles for non-coding RNAs in the control of RNA processing.

Once a gene is transcribed into an mRNA, a host of mechanisms control whether, where, and when that mRNA is translated into a protein. This translational control is particularly important during embryonic development; mRNAs from the mother kick-start the initial steps of development after fertilization, and RNA binding proteins are used to pattern the embryo and control the stages of development. Howard Lipshitz led a team that studied an RNA binding protein called Staufen using fruit flies and human cells. They found that the RNA structures bound by Staufen were similar from flies to humans, allowing them to better predict which genes Staufen can regulate. Given their success with this approach, the research team is now working to understand the roles of the hundreds of other RNA binding proteins encoded in the genome. This will provide insight into which RNAs are likely to be regulated by specific proteins, revealing important regulatory relationships in the cell.

A particularly interesting group of mRNAs are those encoding the nuclear hormone receptors, which regulate development and sexual differentiation in organisms from fruit flies to humans.  Henry Krause’s group recently completed a study examining the localization patterns of all nuclear hormone receptor mRNAs encoded by the fruit fly genome during the course of development.  This comprehensive work revealed that nearly all of these mRNAs have specific locations within cells, suggesting that there are RNA binding proteins that control this localization.

In addition to localization of mRNAs, the position of proteins within the cell is critical for their function. Brenda Andrews has been a pioneer in the large-scale analysis of intracellular protein localization.  Her group takes advantage of one of the wonders of modern molecular genetics  –  the potential to repurpose a gene from one organism to another.   Fluorescent and luminescent proteins are now widely used tools in biology, allowing researchers to tag a protein of interest and monitor its localization and abundance over time under a microscope.  Her group uses high-throughput microscopy to trace the locations of proteins within cells, providing insight into how proteins are relocalized for specific tasks such as controlling cell division.

One of the major challenges for a cell is generating the energy required for maintenance and growth. Cells must take up nutrients in order to survive and grow, a process that requires thousands of distinct chemical reactions to convert nutrients into energy and necessary building blocks. Most of these chemical reactions are too slow to support life. Proteins capable of speeding up these reactions, called enzymes, evolved to permit life as we know it.  Recent advances in mass spectrometry allow the simultaneous measurement of thousands of the molecules of life (including chemicals and proteins), in an approach called metabolomics.  Amy Caudy’s group is using these methods to identify previously unknown enzymes that catalyze cellular reactions.  One recent discovery was of an enzyme present only in fungi and a few bacteria that provides a new route for synthesis of ribose, the building block of RNA and DNA.  Her group is now working to understand how this enzyme allows invading fungi to carry out RNA and DNA synthesis when oxygen is limiting.


Identifying the interactions among genes and proteins

Genes do not operate in isolation, but instead work together, interacting to collaboratively build life as we know it. Charlie Boone developed a technology to identify the interactions between genes in the model yeast, Saccharomyces cerevisiae, and in recent years he has used robotics to enable his group to study the interactions among thousands of genes. This work has revealed that genes with similar functions (for example, DNA replication) tend to interact. This observation holds true for many types of genes, enabling assignment of function to many uncharacterized genes. Insights from genetic interaction maps generated by the Boone lab in collaboration with the Andrews lab have guided the discoveries of diverse labs working in organisms from mice to humans. As the yeast genetic networks approach genome scale, this work provides the most comprehensive view of cellular function to date.

Andrew Emili’s laboratory has recently adapted these same methods to investigate genetic interactions in bacteria. The group developed an approach to screen genetic interactions in the model bacterium Escherichia coli, and compared these with the extensive information on the protein-protein interactions and gene expression patterns in this well-studied bacterium. They found that key principles of genetic interactions from yeast hold true in bacteria, such as the observation that it is more deleterious to inhibit two genes with different cellular functions than to inhibit two genes with the same function. This work predicted the function of many previously uncharacterized bacterial genes, and will facilitate molecular dissection of bacterial growth programs.

As with genetic interactions, few proteins act in isolation. The interactions between different proteins allow for events such as the transmission of signals within cells or the formation of complex structures. Philip Kim’s group combines computational and experimental approaches to study protein-protein interactions. Recently, they used a high-throughput approach based on phage display that combines computational design, oligonucleotide synthesis, and high-throughput screening to identify new proteins that interact with several important signaling proteins. The interactions identified were supported by more traditional approaches. With this proof of principle of their new technology, the group is now using this combination of techniques to address various problems such as drug discovery and protein design. Anne-Claude Gingras’ group applies state-of-the-art proteomic methods to understand how signals are relayed within cells to control growth and proliferation.

Proteins localized to the cell membrane are particularly challenging to study, as they require the fatty cell membrane for their proper folding and function.  Igor Stagljar has adapted a method originally applied in yeast cells for use in mammalian cells.  His group has used this new method to discover hundreds of proteins that interact with important cell surface proteins such as the epidermal growth factor receptor that is mutated in non-small cell lung carcinoma. 

After genes are transcribed and then translated into proteins, the activity and levels of these proteins are regulated. This post-translational regulation of proteins allows cells to respond rapidly to changing conditions, and to remove or inactivate proteins when necessary. Molecular Genetics is home to a number of leading labs that use mass spectrometry to identify proteins, quantify their levels, and measure post-translational modifications on proteins. Mike Moran’s laboratory has used such proteomic approaches to delineate signaling networks mediated by post-translational modifications, including phosphorylation and ubiquitination, and recently integrated multiple datasets spanning the continuum from DNA to RNA to proteome. This work has uncovered molecular signatures in lung tumors with prognostic impact.

One of the paradoxes of biology is how the same protein is often used to signal different events at different times and places during development. The transforming growth factor beta pathway can maintain the pluripotent potential of human stem cells, but it is also required during development to specify the fate of tissues such as muscle and liver. Jeff Wrana’s group has used proteomic methods to identify DNA binding proteins specific to stem cells, and then followed with genome sequencing methods to identify the binding sites of these stem cell proteins. They have observed that these proteins are replaced by other DNA binding factors in order to trigger cell differentiation.


Identifying disease genes

Faithfully replicating DNA is the most important task of a dividing cell. Many proteins work together to ensure that genetic information is faithfully transmitted to daughter cells. Mutation of genes that encode regulators of DNA replication fidelity can lead to cancer, as exemplified by the increased risk of breast cancer for women – and men – who inherit mutations in BRCA1, a gene encoding a DNA repair protein. The Durocher lab is harnessing powerful functional genomics tools such as RNA interference screening coupled to high-throughput imaging to discover new factors that promote genome integrity in human cells. This approach led to the discovery of RNF168, a gene mutated in a rare genetic disorder called RIDDLE syndrome. The Durocher lab is currently implementing screens using CRISPR/Cas9-based genome editing, a new method for manipulating genomes and studying gene function.

Once DNA is replicated, it must be segregated into daughter cells. Laurence Pelletier applies similar screening methods to identify proteins required for the pairwise separation of chromosomes into daughter cells. His group also contributes their expertise in time-lapse imaging to the work of many other laboratories in the Toronto area.

Less than 2% of the human genome codes for proteins, and the remainder, the so-called non-coding DNA, has been challenging to understand. Among this non-coding DNA are signals for gene regulation. Michael Wilson’s comparative genomics lab uses experimental and computational approaches to understand how genes are controlled by non-coding regions. The Wilson team used next-generation sequencing to discover that liver-specific transcription factor binding sites shared among humans, macaque monkeys, mice, rats, and dogs regulate critical tissue-specific pathways and coincide with disease-causing human regulatory DNA mutations. This work demonstrates the value of comparative genome analysis to identify crucial regulatory DNA sequences. The Wilson lab will expand upon these new findings to determine how cardiovascular disease genes are controlled.

Sick Kids hospital is among the leading centers in the world using DNA sequencing to assist in disease diagnosis. Large-scale DNA sequencing methods can now determine the sequence of many genes or even the entire genome from a patient in just a few days. This comprehensive information allows clinicians to find the cause of even previously unknown genetic diseases. In the course of pinpointing the cause of a child’s illness, these whole-genome methods can also reveal genetic risks for later onset diseases. These incidental findings of disease risk in the patient also affect the patient’s family because of their shared genetic material. Steve Meyn is leading the ongoing debate over how to appropriately counsel his patients and their parents in the face of unprecedented genetic information.


Uncovering the genes responsible for the growth of cancer cells  

Cancer researchers around the world are sequencing the genomes of cancer cells to identify the changes that permit uncontrolled cell growth.  Sean Egan’s group takes advantage of publicly available breast cancer genomics data to design and study new mouse models for specific breast cancer subtypes.  In turn, these mice are used in transposon-based screens to identify genetic events that promote progression of primary tumours, as well as metastatic dissemination.  In this way, tumor-specific signalling networks can be defined, with an eye towards targeted therapy.

One of the most prevalent childhood brain cancers is ependymoma, which is not treatable by current chemotherapy. Gary Bader’s lab collaborated with Michael Taylor at Sick Kids to computationally analyze cancer genomics data from ependymoma tumours.  This work identified histone and DNA methylation by PRC2 (Polycomb Repressive Complex 2) as the first rational therapeutic target for this disease. This DNA modification pathway is targetable by available drugs. Following promising experiments in primary cell lines and mouse models, these drugs have been employed for compassionate use in a patient. A clinical trial is planned to more generally assess this novel treatment.

As tumors grow, genetic mutations arise. Many of these mutations have no effect, but a few “driver” mutations allow the tumor to grow faster and to invade the body. When a tumor sample is prepared for DNA sequencing, a mixture of cells is present. Quaid Morris and colleagues have developed a method for analyzing DNA sequences to reconstruct the mutational history of a tumor and identify those mutations that drive cancer growth.

The Ontario Institute for Cancer Research is a major center for cancer treatment and research. As director, Tom Hudson oversees a large research portfolio while continuing to pursue his own research in uncovering the genetic variants that predispose some people to colorectal cancer.


Investigating infectious disease at genome scale

Legionnaires’ disease was discovered in 1976 (the same year as the discovery of Ebola), when it sickened and killed American Legion members at a convention in Philadelphia. This form of pneumonia is caused by a group of bacteria called Legionella that normally live in aquatic environments. Over the last 15 years, the clinical prevalence of Legionella pneumophila sequence type 222 (ST222) has mysteriously increased relative to other strains, culminating in a deadly Toronto outbreak of Legionnaires’ disease in 2005 that killed 23 people. In collaboration with Public Health Ontario and Genome Quebec, Alex Ensminger is using two modern whole genome sequencing technologies – Illumina deep-sequencing and Pacific Biosciences single-molecule sequencing – to decipher the genome of this emerging public health threat in Ontario. By comparing the genomes of different strains and testing the differing genes in laboratory experiments, the Ensminger lab is identifying specific genetic determinants that may have allowed ST222 to steadily outcompete other Legionella strains within the province.

The bacterium E. coli is one of the major gut denizens of humans, an occasional pathogen, and a workhorse model organism of the laboratory. William Navarre has discovered that EF-P, a protein associated with the translation machinery, is necessary for E. coli to respond to stress. Strikingly, EF-P has been conserved throughout evolution all the way to humans. While the Navarre group had previously used genetic and biochemical tools to identify a few proteins that require EF-P for proper expression, the full list of proteins requiring EF-P was not known. Using a quantitative proteomic method, the group compared protein levels between cells with normal levels of EF-P and cells with reduced EF-P activity. The proteins affected by EF-P contain kinked protein structures following difficult-to-translate sequences. This fundamental research revealed a previously unknown aspect of translational control by EF-P, and illuminated how translation rate depends on specific protein sequences.

Together with the Boone and Andrews labs, the Cowen lab has pioneered functional genomic approaches to dissect circuitry governing a reversible developmental transition between yeast and filamentous growth, a trait that enables the model yeast S. cerevisiae to forage for nutrients and the opportunistic pathogen Candida albicans to invade human tissues and evade the host immune system.  They have leveraged genome-scale mutant collections of C. albicans recently made available from Merck, and have utilized chemical genomic screening to identify the proteins that enable fungal pathogens to evade antifungal drugs.  By identifying genes required for drug resistance and virulence, the Cowen lab is uncovering much-needed potential targets for the development of new antifungal drugs.


Building world-renowned scientific resources

Antibodies are our bodies’ continually evolving defense against the viruses and bacteria that attack us.  The Sidhu lab replicates this amazingly sensitive and specific system to design custom antibodies for a range of targets. Recently, they developed antibodies targeting a strain of Ebola virus that were able to protect mice from infection. The group continues to develop a range of antibody reagents; some reagents support research questions, such as understanding the sites of DNA binding protein localization, while others support therapeutic applications, including inhibiting cancer cell proliferation by blocking growth signals. 

If some people call the quest to find new drug targets a fishing expedition, Aled Edwards is a scientist who sets out to build a bigger boat so he can bring more fishermen. The Toronto Structural Genomics Consortium is a public-private consortium sited at the University of Toronto and at Oxford University that is the most prolific producer of protein crystal structures in the world.  These 3-D models of proteins, accurate to atomic scale, are used by pharmaceutical companies to design new drugs. Recently, Edwards’ group appreciated the need for better tools to study proteins called chromatin modifiers, which make semipermanent modifications to the genome that set patterns of gene expression in cells.  They are producing a set of reagents freely available to academia and industry to facilitate study of these important regulators.  

Jason Moffat’s team has spearheaded the development of resources for functional genomics in mammalian cells. His group uses RNA interference by short hairpin RNAs, CRISPR/Cas9 genome editing, and other methods to manipulate gene expression in mammalian cells. These approaches enable researchers to understand gene function and explore processes from development to disease pathogenesis to cancer. Recently, Moffat’s group pooled data from over 70 whole-genome screens of gene function to identify genes essential for growth of mammalian cells. This provides a powerful resource for comparative analysis of different data sets, and highlights the core gene set important for cell proliferation.

The deluge of data from current biological research is challenging for researchers to manage and understand. Lincoln Stein leads multiple database efforts to gather and make accessible the fruits of modern genome-scale research so that researchers can compare their results with those of others.  These databases and applications support the efforts of thousands of researchers working to understand a range of organisms, from plants to fruit flies to humans.  The group recently developed a plugin application for analysis of functional interactions called Reactome FI Viz that allows researchers to view the functional roles of genes of interest.  For example, a user can upload a list of genes mutated in a set of tumor samples and quickly learn the relationships between these genes.

Clearly, the fields of functional genomics and proteomics are reaching their potential to provide answers for the many mysteries of disease and development. As world leaders in these exciting fields, these researchers in the Department of Molecular Genetics continue to pioneer new methods and applications of technologies.