Current and Future Directions in Genomics of Amyotrophic Lateral Sclerosis




New knowledge of the structure and function of the human genome and novel genomic technologies are being applied to the study of sporadic amyotrophic lateral sclerosis (ALS). These studies can examine tens to hundreds of thousands of items at once and depend on sophisticated computer processing. Current studies focus on genetic susceptibility and gene expression; future studies will likely focus on structural variation, gene regulation, and non-protein-coding regions. The hope is that they will lead to a deeper understanding of the molecular aspects of the disease and to rational therapeutic targets.


Mendelian and complex genetics of familial and sporadic amyotrophic lateral sclerosis


Advances in science in the past decade have accelerated the pace of discovery in amyotrophic lateral sclerosis (ALS), especially in its genomics. This article discusses the Human Genome Project, the International HapMap Project, and emerging microarray and microdissection technologies, which have set the stage for several studies seeking fundamental insights into mechanisms of disease and targets for therapy. Directly understanding the 90% to 95% of ALS that is sporadic (SALS), as opposed to the 5% to 10% that is familial (FALS), now seems possible.


FALS follows simple Mendelian autosomal dominant inheritance; 20% of cases are caused by mutations in the gene encoding superoxide dismutase 1 (SOD1) at chromosome 21q22.1 and the remaining 80% by unknown mutations. SALS, by contrast, is widely believed to be a complex genetic disease, which means that genetic factors are important, but the size of their contribution and the interplay of genetic and environmental factors are unknown. This article summarizes recent developments in genomics and efforts to apply them to understanding ALS, and highlights possible future directions of genomic research.




The Human Genome Project


The Human Genome Project, which published a draft sequence in 2001 and was declared complete in 2003, was the culmination of major advances in science, technology, and diplomacy. It was an international collaboration to sequence the approximately 3 billion nucleotides in the human genome (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide).


The human genome’s nucleotide sequence can be divided into genic and intergenic regions. The genic regions are a minority of the code, coding for approximately 25,000 different genes. Each gene is composed of a promoter region, which regulates the downstream gene; exons, which code for protein; and introns, which separate one exon from another. An average gene is 27,000 base pairs in length and consists of nine exons. In a complex assembly process in the nucleus, genes are transcribed into messenger RNA (mRNA), introns are removed, and exons are spliced together. The new assembly is then transported to the cytoplasm, where protein is synthesized. The exact sequence in which exons are spliced together generates exon splice variants, which produce a greater variety of proteins than the code alone could provide and likely play an important role in the pathogenesis of some diseases.


The intergenic regions, which account for 99% of the human genome and were once dismissed as “junk,” are now recognized as having critical regulatory functions.




The International HapMap Project


The International HapMap Project was the natural successor to the Human Genome Project. It was also conducted by an international collaboration, which officially began in 2002. The motivating force was to provide a practical navigation map for exploring the human genome. This map is possible because interspersed throughout the genome are 11 million or more sites of predictable variation called single nucleotide polymorphisms (SNPs). The variant that appears at one SNP is often correlated with, or “linked” to, the variants that appear at others, a phenomenon called linkage disequilibrium. The HapMap project catalogued approximately 3.5 million SNPs in four populations, namely north European Caucasians, Yorubans from Nigeria, Han Chinese, and Japanese. Exact numerical measurements of linkage disequilibrium between SNPs were included. Because SNPs in high linkage disequilibrium with each other can serve as proxies for each other, a representative selection of a few hundred thousand SNPs (called tagging SNPs) can adequately represent SNP variation in the entire genome. These tagging SNPs can then be used to search the genome.
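The idea that one SNP can act as a proxy for another can be made concrete with the standard pairwise measures of linkage disequilibrium, D and r². The following is a minimal sketch, not the HapMap project's own software; the function name and the haplotype counts in the usage lines are hypothetical and chosen only for illustration.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic SNPs.

    Inputs are counts of the four possible haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second SNP.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A
    p_B = (n_AB + n_aB) / total         # frequency of allele B

    # D is the excess of the observed A-B haplotype frequency over the
    # frequency expected if the two SNPs were independent.
    d = p_AB - p_A * p_B

    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denominator


# Hypothetical counts: the two SNPs always travel together (perfect proxy).
print(linkage_disequilibrium(50, 0, 0, 50))
# Hypothetical counts: the two SNPs are independent (useless as proxies).
print(linkage_disequilibrium(25, 25, 25, 25))
```

An r² near 1 means that genotyping one SNP tells nearly everything about the other, which is the property a tagging SNP relies on; an r² near 0 means both SNPs would have to be genotyped.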




Microarray technology


Simultaneous with the Human Genome and International HapMap Projects was the emergence of microarray technologies. These technologies simultaneously profile hundreds of thousands of DNA or RNA sequences through microchip and microbead technologies.


In this technology, a small nucleotide probe of known sequence is synthesized and attached using laser technology to a microchip at specific x- and y-coordinate locations. The resolution of the coordinates is in the range of 3 to 10 μm, and millions of probes can be placed on one microchip, which can thus represent the entire genome systematically. A labeled biologic test sample, derived from DNA or RNA, is applied to the microchip and, if sequences in the biologic test sample are complementary to those on the microchip, hybridization occurs. Signal recognition software registers which sequences hybridized and the degree of hybridization. A digital file is then generated that profiles the biologic test sample.
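The complementarity that drives hybridization can be illustrated with a toy model. The sketch below treats hybridization as an exact match between a sample fragment and the reverse complement of a probe; this is a deliberate simplification (real hybridization tolerates some mismatch and yields a graded signal), and the function names and sequences are hypothetical.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the strand that would pair with `sequence`, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe, sample_fragment):
    """Toy rule: the probe binds if the sample contains its exact
    reverse complement. Real arrays measure a continuous intensity."""
    return reverse_complement(probe) in sample_fragment.upper()


# Hypothetical probe and sample fragments.
print(hybridizes("ATGC", "TTGCATAA"))   # sample contains GCAT
print(hybridizes("ATGC", "TTTTTTTT"))   # no complementary stretch
```

On a real microarray the same test is carried out in parallel for every probe location, and the software records signal intensity rather than a yes-or-no answer.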


Many microarray platforms exist, depending on what is being sought. The most common platforms are (1) SNP arrays, which have probes designed to detect single nucleotide polymorphisms; (2) expression arrays, which have probes designed to detect expressed genes; (3) exon arrays, which have probes designed to detect the individual exons that make up genes; and (4) tiling arrays, which have probes that detect short sequences systematically throughout the genome.


Microarray technology has engendered a new investigational paradigm that is referred to as exploration or discovery because it does not depend on candidate biology or prior hypotheses. Instead, a specific test condition, such as SALS, is defined and comprehensively profiled, and significant patterns and hypotheses are sought. This paradigm generates enormous amounts of data, creating major challenges to computational biology for data mining (ie, methods for searching data for meaning), and has created a new field called bioinformatics. One major challenge is the huge number of false leads that are generated. For example, if one microarray platform measures 550,000 items and the statistical analysis defines significance as P < .02, about 11,000 findings (550,000 × 0.02) would be expected to reach significance by chance alone.


Several methods are emerging to better refine statistical analysis for multiple testing. One common approach is the Bonferroni correction, which sets the threshold for significance at the usual P value (eg, P = .02) divided by the number of hypotheses being tested (Fig. 1).
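The arithmetic behind the multiple-testing problem and the Bonferroni correction can be sketched in a few lines. The figures below (550,000 tests and a nominal P value of .02) are taken from the example in the text; the function names are illustrative only.

```python
import math


def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass the threshold by chance alone
    when none of the tested items is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test P value threshold after Bonferroni correction."""
    return alpha / n_tests


n_tests = 550_000   # items measured on the array (from the text)
alpha = 0.02        # nominal significance level (from the text)

threshold = bonferroni_threshold(n_tests, alpha)
print("expected chance findings:", expected_false_positives(n_tests, alpha))
print("Bonferroni threshold:", threshold)
# Association plots usually show -log10(P), so the threshold is drawn
# as a horizontal line at this height.
print("-log10(threshold):", -math.log10(threshold))
```

With these inputs the corrected threshold is roughly 3.6 × 10⁻⁸, which corresponds to a height of about 7.4 on a −log10 scale. The correction is conservative because it treats all tests as independent, which is not strictly true for SNPs in linkage disequilibrium.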




Fig. 1


Whole genome SNP association study. The plot displays the degree of association of particular SNPs with SALS: the x-axis displays each SNP's chromosomal position and the y-axis shows −log10 of the P value obtained by an allelic association test. The red line represents the threshold for significance after Bonferroni correction for multiple testing. (Data from Schymick J, Scholz SW, Fung H-C, et al. Genome-wide genotyping in amyotrophic lateral sclerosis and neurologically normal controls. Lancet Neurol 2007;6(4):322–8.)




Whole genome association studies


With the International HapMap and the ability to represent genetic variation through tagging SNPs, and with the advances in high-throughput microarray technologies, a single microchip became able to adequately profile genetic variation across the genome. Suddenly, studies of disease association could be genome-wide, comparing cases with a disease to controls without it (case-control studies).


Whole genome association (WGA) studies ascertain association as follows. The frequencies of alleles or genotypes (combinations of alleles) of each tagging SNP are ascertained in disease and nondisease populations and then statistically compared. A wide variety of statistical tests to assess for association have been described. The Cochran-Armitage trend test is a favorite because it (1) is robust to cryptic relatedness, wherein two samples are related to each other without the knowledge of the investigator (eg, second-degree cousins); (2) is robust to deviations from Hardy-Weinberg equilibrium, that is, departures from the genotype frequencies expected in a population at steady state; and (3) can factor risk for disease in an additive fashion, accounting for the possibility that a causative allele may have either dominant or recessive actions.
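For a single SNP, the trend test reduces to a short calculation on a 2 × 3 table of genotype counts. The sketch below uses the usual additive weights (0, 1, 2 copies of the risk allele) and the standard one-degree-of-freedom chi-square form of the statistic; it is an illustration rather than the analysis pipeline of any study discussed here, and the genotype counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP.

    `cases` and `controls` are genotype counts ordered by the number of
    copies of the risk allele (0, 1, 2). Returns (chi_square, p_value),
    where the statistic has one degree of freedom.
    """
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    columns = [r + s for r, s in zip(cases, controls)]

    sum_t_cases = sum(t * r for t, r in zip(weights, cases))
    sum_t_all = sum(t * n for t, n in zip(weights, columns))
    sum_t2_all = sum(t * t * n for t, n in zip(weights, columns))

    # Trend statistic and its variance under the null hypothesis.
    numerator = n_total * (n_total * sum_t_cases - n_cases * sum_t_all) ** 2
    denominator = n_cases * n_controls * (n_total * sum_t2_all - sum_t_all ** 2)
    chi_square = numerator / denominator

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the risk allele).
print(cochran_armitage_trend(cases=[10, 20, 70], controls=[40, 40, 20]))
```

In a genome-wide study this calculation is repeated for every tagging SNP, and the resulting P values are then compared with a multiple-testing threshold.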


The premise of WGA studies is that disease-predisposing alleles exist with relatively high frequency. This hypothesis is often referred to as the common disease/common variant hypothesis. The recent explosion in the number of published whole genome studies has also made it clear that causative genetic variants typically confer only minor to moderate risk for disease. Many reported associations have odds ratios between 1.1 and 1.4, meaning that the odds of disease are increased by 10% to 40%; for a rare disease this closely approximates the increase in an individual’s risk, which is small in absolute terms.
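An odds ratio of this kind is computed from a 2 × 2 table of allele counts in cases and controls. The following sketch also returns an approximate 95% confidence interval based on the standard error of the log odds ratio; the function name and the allele counts are hypothetical and are not taken from any of the studies cited.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Odds ratio and approximate confidence interval from allele counts.

    case_risk / case_other: risk and non-risk allele counts in cases.
    control_risk / control_other: the same counts in controls.
    z = 1.96 gives an approximate 95% interval.
    """
    estimate = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(odds ratio) from the four cell counts.
    standard_error = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical allele counts in 500 cases and 500 controls (1000 alleles each).
print(odds_ratio(case_risk=350, case_other=650,
                 control_risk=300, control_other=700))
```

When the interval includes 1, the data are compatible with no association, which is one reason modest odds ratios require large cohorts to confirm.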


Not all diseases are yielding clear results, either because the association with disease susceptibility is weak or because the association, if genetic at all, arises from many alleles of such low frequency that they elude detection, a hypothesis referred to as the multiple rare variants hypothesis.


Whole genome association studies of sporadic amyotrophic lateral sclerosis


Four WGA studies have been published for SALS. The first study consisted of a cohort of 276 American patients who had SALS and 275 neurologically normal American control samples. Although this study identified several loci that may be important in the pathogenesis of motor neuron degeneration, none remained significant after Bonferroni correction (see Fig. 1).


The second WGA study used a technique that pooled DNA to analyze a cohort of 386 patients who had SALS and 542 neurologically normal controls. This study implicated the FLJ10986 gene on chromosome 1 as being associated with increased risk for ALS (P value in replication series = 3.0 × 10⁻⁴; odds ratio = 1.35).


The third study, involving 461 Dutch patients who had SALS and 450 controls, identified several SNPs of interest (including one in the ITPR2 gene on chromosome 12), although none met the Bonferroni threshold. However, a separate but related study from this same group identified dipeptidylpeptidase 6 (DPP6) on chromosome 7 as a risk allele (P value = 3.28 × 10⁻⁶ and 5.04 × 10⁻⁸).


A fourth study, involving 221 Irish patients who had ALS and 211 Irish controls, identified several loci of possible importance, but none exceeded the Bonferroni threshold. However, pooled analysis also identified DPP6 (combined P value = 2.53 × 10⁻⁶), which seems to increase risk for ALS by 37%.


Limitations


The WGA studies have two main problems. First, with the possible exception of DPP6, no locus identified in one study has been replicated in the others. Second, the observed associations seem modest. Large cohorts of several thousand cases and controls are required to have sufficient power to detect moderate-effect alleles, and each study was therefore underpowered given the size of its initial cohort. Efforts are underway to pool data from the various WGA studies; this combined cohort of several thousand cases and controls may have sufficient power to identify moderate-effect alleles.
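The claim that several thousand cases and controls are needed can be checked with a rough power calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions (here, allele frequencies in cases and controls) and converts alleles to individuals by dividing by two. It assumes an allelic test, equal numbers of cases and controls, and a Bonferroni threshold of .02/550,000 borrowed from the earlier example; the control allele frequency and odds ratio in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_frequency(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency
    and a per-allele odds ratio."""
    return odds_ratio * p_control / (1 - p_control + odds_ratio * p_control)


def individuals_per_group(p_control, odds_ratio, alpha, power=0.8):
    """Approximate number of cases (and of controls) needed to detect the
    allele-frequency difference with a two-sided test at level `alpha`."""
    p_case = case_allele_frequency(p_control, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_case * (1 - p_case)
    alleles = (z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2
    # Each individual contributes two alleles.
    return math.ceil(alleles / 2)


genome_wide_alpha = 0.02 / 550_000   # Bonferroni threshold from the text's example
# Hypothetical common allele (30% in controls) with a modest odds ratio.
print(individuals_per_group(0.30, 1.3, genome_wide_alpha))
```

Under these assumptions the answer is on the order of a few thousand individuals per group, in line with the estimate in the text, and the requirement grows quickly as the odds ratio approaches 1.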




Whole genome expression profiling


Expression and expression arrays


In contrast to genetic association studies, which examine sites in DNA for genetic associations with disease, gene expression studies explore transcribed mRNA for gene expression in disease. These studies are therefore more rooted in biology than genetics and are sometimes referred to as functional genomics.


The basis of these studies is expression microarray technology. Expression microarrays capitalize on the fact that the 3′ end of mRNA is polyadenylated, so that the poly(A) tail can serve as a tag with which to identify expressed genes. Molecular techniques are used that prime with poly-dT sequences complementary to the poly(A) tails and are thus able to separate mRNA from non-mRNA (only 1% to 3% of total RNA is mRNA), amplify the mRNA, and label it.


These studies use two kinds of microarray platforms. In one, called cDNA or spotted microarrays, the probes for detecting genes are maintained from in vivo stock and spotted onto the array. In the other, called oligonucleotide microarrays, the probes are computer designed and synthesized using laser technology. The oligonucleotide platform has the advantage of covering the whole genome.


Interpreting expression data is complex. Statistical methods define genes that are differentially expressed, which can then be analyzed for biologic meaning (Fig. 2). Newer methods seek enrichment of biologic processes. Rather than differential levels of expression of any one gene, these methods seek differential expression of sets or networks of genes that work together (Fig. 3). One challenge to data interpretation is that gene function annotation is at a relatively early stage of development and is skewed toward the areas of greatest current research.
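One simple way to ask whether a gene set is enriched is an over-representation test based on the hypergeometric distribution: given a list of differentially expressed genes, how surprising is the number that fall in the set of interest? This is a stand-in for the family of enrichment methods mentioned above, not necessarily the method used to produce Fig. 3, and apart from the 54-gene set size all numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genome, n_in_set, n_selected, n_overlap):
    """Probability of seeing at least `n_overlap` genes from a gene set
    among `n_selected` differentially expressed genes by chance.

    n_genome:   genes measured on the array
    n_in_set:   genes belonging to the pathway or gene set
    n_selected: genes called differentially expressed
    n_overlap:  differentially expressed genes that are in the set
    """
    largest_overlap = min(n_in_set, n_selected)
    all_selections = comb(n_genome, n_selected)
    # Sum the hypergeometric probabilities for overlaps >= n_overlap.
    tail = sum(
        comb(n_in_set, k) * comb(n_genome - n_in_set, n_selected - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / all_selections


# Hypothetical example: 20,000 genes measured, a 54-gene pathway,
# 500 genes called differentially expressed, 8 of them in the pathway.
print(enrichment_p_value(20_000, 54, 500, 8))
```

Because many gene sets are usually tested, the resulting P values need the same kind of multiple-testing correction as any other genome-wide analysis.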




Fig. 2


Differential gene expression. The plots show gene expression in motor neurons and anterior horns in lumbar spinal cords in SALS. The motor neurons were isolated from the surrounding anterior horn using a technique called laser capture microdissection, which allows histologic sections to be dissected with a specialized microscope. Gene expression was measured using a technology called whole genome oligonucleotide microarray, which measures the expression of each gene. The top plots compare anterior horns to motor neurons in control (left) and SALS (right); the bottom plots compare SALS to control in the neurons (left) and anterior horn (right). (Data from John Ravits, MD, unpublished.)



Fig. 3


Differential gene set expression. The plots show the neuron migration pathway of 54 genes, which seems to be up-regulated in SALS motor neurons. The left plot shows a heat map, which is a way of visualizing patterns. In the example, the groups of interest are in columns (control on the left and SALS on the right) and the genes in the pathway of interest are in rows (the top rows are the most differentiated between the groups). The right-hand plot shows the overall up-regulation (shift to the right) in the expression levels of these genes. (Data from John Ravits, MD, unpublished.)


Cell-specific gene expression


Microarray technologies can be used with laser microdissection, a computer-assisted microscopic technique that microdissects tissue to isolate specific cells for molecular analysis. By microdissecting and pooling a single-cell population, mRNA from the relevant cells residing in tissues with complex architecture, such as the nervous system, can be isolated and pooled.


This technique is especially useful for overcoming some of the main difficulties in investigating sporadic neurodegenerative diseases, such as the selectivity of the pathologic process; the variable location of pathology along the neuraxis; the reduction in the affected cell population caused by the disease; and the low ratio of pathogenic signal to nonpathogenic noise.


ALS lends itself to study with this technique. Clinically, ALS motor neuron degeneration begins in a discrete region of the motor system, propagates outward, and summates over space and time until it appears to be diffuse. The outward propagation of the disease from a focus creates a gradient of neurodegeneration around the site of onset, which, because of the direct involvement of respiration, is usually still active and present at death. This finding can be exploited, especially when applied to regions in early stages of degeneration with relatively early molecular events.


Limitations


This technique has limitations and calls for caution. Expression profiling depends on the quality of the input, underscoring the critical importance of upstream tissue processing. Expression profiles may be incomplete and represent only 75% of a cell’s expressed genes, therefore causing false-negatives (ie, mistakenly concluding that an expressed gene is not expressed). The genes identified represent the summation of the captured neurons; even though the neuronal population is homogeneous, the cells are at various stages of development or disease and the gene expression profile is a summation of these stages. As with all research of this kind, molecular discoveries do not differentiate between primary and secondary changes. Even if primary gene expression changes can be defined, the causative factors initiating this expression remain to be defined. Furthermore, the relationship between genomics and proteomics is unknown; what occurs at the gene level may not accurately reflect what is happening at the protein level.


Expression studies in amyotrophic lateral sclerosis


Several investigations have used either tissue microdissection or microarray technologies for ALS, but only one has combined them in human SALS, and three have combined them in transgenic models (Tables 1 and 2).

