5
Multiple Sclerosis Genetics
Bruce A. C. Cree
KEY POINTS FOR CLINICIANS
• Family aggregation and twin studies indicate that heredity contributes to multiple sclerosis (MS) risk.
• The primary MS susceptibility locus is within the major histocompatibility complex and encodes a protein, HLA-DRB1*15, that has a critical function in presenting antigens to T cells.
• Over 200 other genetic loci throughout the genome contribute to MS susceptibility.
• Many of these loci are associated with other autoimmune diseases, suggesting sharing of a biological pathway, or pathways, in autoimmunity.
• The majority of identified MS susceptibility loci thus far are noncoding polymorphisms, that is, common genetic alleles also found in healthy individuals that do not directly influence protein structure. These polymorphisms could influence expression of over 500 MS-associated genes.
• Genetic variants in the vitamin D enzymatic pathway are associated with MS susceptibility and underscore the importance of vitamin D’s role in MS.
RACE AND GEOGRAPHY
Race and geography are known to influence MS prevalence. This suggests that heritable factors may contribute to MS pathogenesis (1). The risk of MS is much higher in populations of Northern European ancestry than in other ethnic groups residing at the same latitudes (2–5). It follows that this increased susceptibility might be due to genetic differences between ethnic groups. MS is approximately 50% less common in African Americans compared to whites (6,7). MS is even less common in both native Japanese and Japanese Americans (five per 100,000) compared to Northern European populations (100–150 per 100,000) (8). Similarly, MS is relatively less common among Native Americans in both the United States and Canada (9–12). These observations lead to the hypothesis that genetic traits for MS risk may be enriched in certain populations and occur less frequently in others, thereby contributing to these racial patterns. However, despite a generally shared environment, ethnic factors that can track with race, such as diet, might also account for such differences.
FAMILIAL AGGREGATION
Although first described as a sporadic disease, the familial occurrence of MS was recognized in the late 19th century (13,14). Systematic studies of familial aggregation in MS support a genetic contribution to the disease (15–21). These studies found that approximately 15% to 20% of MS patients reported a family history of MS, a proportion that is significantly higher than what would be expected based on the relatively low prevalence of MS in these populations. The ratio of the risk of disease in siblings of affected individuals to the risk of disease in the overall population is referred to as λs (22). For MS, this risk ratio is approximately 15 to 40, indicating a moderately strong familial influence on MS risk (23). To place this value in the context of other heritable complex diseases, the λs ratio for MS is higher than that for schizophrenia (λs = 9), similar to that for type 1 diabetes (λs = 15), and less than that for autism (λs = 59) (24). However, commonly shared environmental factors might also explain such familial aggregation (25).
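The definition of λs can be turned around to see what absolute risk it implies for a sibling. The following is a minimal sketch, assuming a population prevalence of 100 per 100,000 (the lower Northern European figure quoted earlier) and treating prevalence as a stand-in for population risk; the function name is illustrative and not taken from any cited study.

```python
def sibling_risk(population_risk: float, lambda_s: float) -> float:
    """Absolute sibling risk implied by lambda_s = sibling risk / population risk."""
    return population_risk * lambda_s


# Assumed input: 100 per 100,000, used here as an approximate population risk.
population_risk = 100 / 100_000

# The range of lambda_s quoted for MS in the text.
for lam in (15, 40):
    risk = sibling_risk(population_risk, lam)
    print(f"lambda_s = {lam}: implied sibling risk = {risk:.1%}")
```

Under these assumptions the implied sibling risk falls in the low single-digit percent range, which illustrates why a large relative risk can coexist with most patients having no affected relative.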
Twin Studies
Perhaps the most compelling observation indicating that MS susceptibility has a genetic component comes from twin studies that demonstrate concordance rates of approximately 30% in monozygotic twins and 3% to 5% in dizygotic twins (26–31). This rate for fraternal twins is similar to that of first-degree relatives of MS patients. Conjugal pair studies also show that the risk of MS increases substantially if both parents are affected by MS, again implying a heritable component to MS susceptibility (32–34) (Table 5.1). Taken together, familial and population-based studies indicate that some component of MS risk is heritable; however, that the majority of MS patients have no family history indicates either that environmental factors may outweigh genetic risks or that genetic risk can be attributed to the influence of multiple traits that by themselves have low disease penetrance. The concept that MS risk may be inherited as a complex trait rather than following simple Mendelian inheritance patterns, as is the case for recessive mutations such as cystic fibrosis or dominant mutations like Huntington’s disease, is essential for understanding the genetic contributions to MS risk (Figure 5.1) (35).
THE FIRST MOLECULAR MARKERS FOR MS: THE HUMAN LEUKOCYTE ANTIGENS
The first studies that identified a link between MS heredity and genetic variation compared human leukocyte antigen (HLA) protein polymorphisms between MS cases and healthy controls. These early studies found that certain cell surface antigens present on the membranes of peripheral blood mononuclear cells were more frequent in MS patients compared to unaffected controls. The first such antigens were HLA-A3 (36–38), followed by HLA-B7, and then HLA-DRw2 (39–43). These HLA associations were in fact not independent but rather reflected a common shared haplotype as a consequence of linkage disequilibrium. Linkage disequilibrium refers to the observation that alleles of certain neighboring genes tend to be inherited together as a consequence of natural selection, although the driving factors for such selection may be obscure. Thus, the molecules HLA-A3, HLA-B7, and HLA-DRw2 are closely associated, especially in European-descended populations. Further elucidation as to which of these linked genes, or possibly more than one, contributes to MS genetics required development of improved molecular techniques.
LINKAGE ANALYSIS
In the 1980s, new DNA-based technology was developed for studying Mendelian patterns of inheritance, first with restriction fragment length polymorphisms (RFLPs) followed by microsatellite repeats (44,45). Using RFLPs to dissect the molecular contributions of the HLA locus to MS susceptibility, it became clear that alleles of HLA-DR2 are the major contributors to MS risk (46). These new molecular techniques not only allowed for dissection of linked genes at HLA but also made it possible to screen for other associations with MS at many other positions across the genome. By studying the linkage between an inherited trait and DNA markers in families with some members affected by a heritable disease, it was possible to identify the chromosomal location of disease-causing genes (Figure 5.2). Markers that were physically near the disease-causing gene were likely to be inherited along with the disease trait because recombination between genetic loci occurs less frequently between neighboring genes relative to genes at greater distances. Thus, the phenomenon of linkage disequilibrium that confounded efforts to discern between alleles of genes at HLA could be exploited to identify previously unknown disease-related genes. These new techniques were first applied to Mendelian inherited diseases and culminated in identification of many single gene mutations, such as those for cystic fibrosis and Huntington’s disease.
Source: Modified from Peltonen L, McKusick VA. Dissecting human disease in the postgenomic era. Science. 2001;291:1224–1229.
MS AS A COMPLEX TRAIT
The linkage-based approach had the potential to be applied not only to single gene disorders but also to complex multigenic traits. However, the hurdles for identifying such traits would be considerably higher because the penetrance of each trait would be far less than for disease-causing mutations (47). Penetrance refers to the likelihood that a particular genotype will manifest as a phenotype. For Mendelian inherited mutations, such as those for dominant diseases like Huntington’s disease, the penetrance is very high, meaning that nearly all individuals who carry the disease-causing genotype will develop the disease. However, for complex traits, the penetrance is low and polymorphisms associated with the complex trait could be common in the overall population. This was clearly the case for the HLA locus. None of the HLA alleles associated with MS are by themselves disease-causing mutations. All these alleles were commonly found in healthy controls but were overrepresented among MS patients. The importance of this observation was perhaps not fully appreciated initially. Because heritable MS risk could be found in families that did not carry the MS-associated HLA alleles, investigators assumed that other loci, perhaps with even stronger effects than that of HLA, must be present elsewhere in the genome. If this was the case, then systematic study of the genome in families affected by MS would surely identify these other loci.
GENOMIC LINKAGE SCREENS
The first series of genome-wide screens using several hundred microsatellite DNA markers across the genome in approximately 100 affected sib pairs was undertaken in the 1990s (48–50). Assuming that other loci in the genome would have had effects on MS risk similar to, or even greater than, that of the major histocompatibility complex (MHC), these studies would have been expected to identify at least a few novel loci. However, no statistically significant additional loci were identified. Furthermore, one of these studies was unable to detect a signal from the MHC (48). Follow-up studies using multiply affected families also failed to detect any convincing new MS susceptibility loci (51–55). Adding more microsatellite markers to the initial genome screens also failed (56–58). Pooling data for meta-analysis similarly failed to identify loci other than the MHC (59,60). It became clear that identification of the effects of genetic variation on MS susceptibility would require not only better markers but also substantially increased numbers of families for statistical power. The way forward required a much larger number of affected families than any single group had access to. The International Multiple Sclerosis Genetics Consortium (IMSGC) was thus founded in 2003 and brought together previously competing investigators in a collaborative effort to decode MS heritability (61) (www.neurodiscovery.harvard.edu/research/imsgc.html).
The first large-scale linkage study with sufficient statistical power to detect loci with effects similar to that of the MHC across the genome, in populations from Australia, Scandinavia, the United Kingdom, and the United States, identified a definite association between the MHC and MS susceptibility (Figure 5.3) (62). However, other loci whose associations with MS had been proposed from smaller studies were not replicated. This study was an important milestone in the study of MS genetics because, for the first time, an adequate number of markers and MS-affected families were brought together through an international collaborative effort. Furthermore, the markers used were sufficiently numerous and evenly spaced across the genome that there was confidence that the majority of the genome was adequately represented for linkage analysis. Perhaps most importantly, 730 families were studied, thus providing adequate power to detect genetic effects that increased the odds of MS risk by more than twofold. Only the MHC was found to increase risk of MS, which indicated that other possible loci that might influence MS risk must have more modest individual effects. Identification of such loci was effectively not possible using linkage analysis unless tens of thousands of families were analyzed (63). This inherent limitation of linkage methodology potentially could be overcome by a different approach to genetic analysis: the genome-wide association screen (GWAS).
MS, multiple sclerosis.
Source: From Sawcer S, Ban M, Maranian M, et al. A high-density screen for linkage in multiple sclerosis. Am J Hum Genet. 2005;77(3):454–467.
Genome-Wide Association Screen
Further technological innovation led to identification of single nucleotide polymorphisms (SNPs). SNPs are genetic variants that occur at a single base pair position within the genome (Figure 5.4). The remarkable achievement of sequencing of the human genome in conjunction with mapping hundreds of thousands of these SNP variants led to the realization that 99.9% of the human genome is invariant (64,65). Nevertheless, there are still billions of genetic variations, many of which are exceedingly rare whereas others are more common. By focusing on the variants that are more commonly found, for example, SNP alleles that are present in at least 5% of the overall population, it would be possible to map traits that are linked to common SNP variants. By coupling such potentially informative SNP variants with microchip-based miniaturization, it became possible to map hundreds of thousands of SNP variants from thousands of individuals (Figure 5.4). If heritable traits such as MS susceptibility are linked to commonly identified variants, then genotyping these common SNP variants could map the loci of the heritable trait. This hypothesis is referred to as the common disease–common variant hypothesis.
Unlike linkage analysis, which required relatively large effect sizes for tracking heritable traits in families, the newer SNP-based technology was capable of detecting smaller individual genetic effects by increasing the numbers of affected individuals and unaffected controls. With a sufficiently large number of samples, association testing that compares the prevalence of any given SNP marker between two populations can detect modest or small genetic effects. Moreover, the samples for association screens did not necessarily require DNA from family members: although family structure could be taken into account in GWAS statistics, it did not have to be. In essence, this approach simply compares the prevalence of any given SNP marker in cases and controls using a test similar to a chi-square statistic. As long as the controls are from the same genetic background as the cases, statistically significant differences in the prevalence of a particular SNP allele would presumably be due to a disease-related trait.
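The case-control comparison described above can be sketched for a single SNP as a Pearson chi-square test on a 2x2 table. This is a minimal illustration, not the statistic used in any of the cited studies; the function name and the counts in the example are hypothetical.

```python
import math


def snp_chi_square(case_carriers, case_total, control_carriers, control_total):
    """Pearson chi-square (1 degree of freedom) for carrier status by case/control status."""
    a, b = case_carriers, case_total - case_carriers            # cases: carriers, non-carriers
    c, d = control_carriers, control_total - control_carriers   # controls: carriers, non-carriers
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 600 of 1,000 cases and 500 of 1,000 controls carry the variant.
chi2, p_value = snp_chi_square(600, 1000, 500, 1000)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

In a genome-wide screen this test is repeated for every genotyped SNP, which is why a far more stringent significance threshold than the conventional 0.05 is needed.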
SNP, single nucleotide polymorphisms.
GWAS Identifies the First Genes Outside the MHC
MS was one of the first diseases to be studied using this new GWAS technique. The IMSGC conducted the first GWAS in 2007 using 334,923 SNPs in 930 MS trio families (a trio family is an MS patient and both parents) with a replication dataset consisting of another 609 family trios and an additional 2,322 case subjects and 789 unrelated controls (66). It was hoped that this massive and costly effort would finally determine the genetic architecture of MS, especially in regard to the much sought after non-MHC contributions. As anticipated, the MHC was definitively associated with MS susceptibility; however, beyond the MHC only two other loci were identified with a statistically significant level of confidence. These loci encoded genes involved in immune regulation: the interleukin 2 receptor (IL2Rα) and the interleukin 7 receptor (IL7Rα). Associations with MS susceptibility for both loci were subsequently validated in other populations (67–72).
Alleles of the interleukin 2 and the interleukin 7 receptors were the first non-MHC loci that were definitely associated with MS risk.
The landmark achievement of identifying two non-HLA loci established that genes outside the MHC contributed to MS susceptibility. However, variations at these alleles, along with those of the MHC, could not account for all MS heritability. Moreover, the alleles identified were, by definition, common alleles (the SNPs genotyped in most GWAS have minor allele frequencies [MAFs] of at least 5%, meaning that the SNPs are present in at least 5% of the population). However, the commonality of MS susceptibility–associated SNPs in both MS cases and controls was surprising. For example, the IL2Rα variant was present in 88% of cases and 85% of controls. IL7Rα variant results were similar: the MS-associated variant was present in 78% of MS patients and 75% of controls. The presence of these variants in the majority of controls had two very important implications. First, these variants are not mutations but instead are the most common polymorphisms of each receptor. Therefore, the consequence to the protein associated with the polymorphism is normal function as opposed to loss of function or gain of an abnormal function associated with either recessive or dominant mutations, respectively. Second, because there was only a very slight overrepresentation of these polymorphisms in MS cases, the effect that these polymorphisms have on MS risk is minuscule. Indeed, the odds ratios for these alleles were less than 1.5. If other non-HLA MS risk alleles were linked to common SNP variants, then sample size calculations showed that detecting variants associated with a 1.1-fold or higher odds of MS risk would require at least 10,000 MS cases and a similar number of controls (47,73).
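How small these effects are can be seen by converting the quoted carrier frequencies into odds ratios. The sketch below is an illustrative calculation from the percentages given in the text only; the odds ratios reported by the study came from its own statistical models and need not match these figures exactly.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying a variant, from its frequency in cases and in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Carrier frequencies as quoted in the text (cases, controls).
il2ra_or = odds_ratio(0.88, 0.85)
il7ra_or = odds_ratio(0.78, 0.75)

print(f"IL2Ra variant: odds ratio of about {il2ra_or:.2f}")
print(f"IL7Ra variant: odds ratio of about {il7ra_or:.2f}")
```

Both values fall well below 1.5, consistent with the statement that each variant shifts MS risk only slightly.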
Although this GWAS identified only two non-HLA loci with a genome-wide level of statistical significance, there may be many other loci associated with MS susceptibility that missed the statistical cutoff for definite association. GWAS performed by other groups, as well as meta-analyses that combined GWAS data from different studies, identified multiple other MS susceptibility loci (74–82).
In order to expand the statistical power needed for the next round of GWAS, the IMSGC expanded its membership, ultimately involving 23 research groups from 15 countries. The IMSGC also partnered with the Wellcome Trust Case Control Consortium 2 (WTCCC2) to make use of the most up-to-date GWAS technology (83). Ultimately, 9,772 MS cases and 17,376 control DNA samples passed stringent quality control assessments, and 441,547 autosomal SNPs were genotyped in this massive dataset. During analysis, it became clear that an important problem might bias the study: population stratification. Because this GWAS did not use a family-based approach, the comparison of cases to controls was predicated on the assumption that cases and controls shared a common genomic structure except at the MS susceptibility loci. However, if cases and controls had somewhat different genomic structures due to differential sampling, then the differences identified between cases and controls could be due to either disease-causing loci or irrelevant differences in genomic structure introduced by sampling bias. When cases and controls from a single country, such as the United Kingdom, were compared, there was no evidence of population stratification. However, because cases and controls were not perfectly matched by country of origin, the entire dataset showed evidence of genomic inflation, meaning that there was a systematic difference in genomic markers between these two groups that would bias the GWAS results. Several methods to control for genomic inflation were employed, but ultimately a novel approach (variance component method) was able to effectively adjust for the genomic inflation bias.
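Genomic inflation is commonly summarized by an inflation factor: the median of the observed association statistics divided by the median expected under the null hypothesis. The sketch below shows that calculation together with simple genomic-control rescaling. It is not the variance component method used in the study, and the function names and example statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to the median expected under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Rescale statistics by the inflation factor (never inflating them if lambda < 1)."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [stat / lam for stat in chi2_stats]


# Hypothetical statistics whose median is twice the null expectation.
observed = [0.2, 0.9098728, 5.0]
print(f"inflation factor = {genomic_inflation_factor(observed):.2f}")
print("corrected statistics:", genomic_control(observed))
```

An inflation factor close to 1 suggests that cases and controls are well matched; values clearly above 1 indicate that many markers differ between the groups for reasons unrelated to disease.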
The IMSGC and WTCCC2’s MS GWAS identified 52 loci that were definitively associated with MS susceptibility. This study not only replicated the known MHC, IL2Rα, and IL7Rα associations but also found 20 loci that had been implicated in MS risk through other GWAS as well as meta-analyses. Furthermore, 29 novel loci were identified. All non-MHC loci had minor influences on MS susceptibility with odds ratios ranging from 1.07 to 1.21. Perhaps the most important observation from this study was that the majority of SNPs identified were located near genes encoding immune functions. This observation supported the hypothesis that MS is indeed an autoimmune disease. Furthermore, many of the implicated genes share common pathways involved in immune regulation, providing important clues as to how normal immune function might become dysregulated in MS. Moreover, 23 of the identified loci are known to be involved in other autoimmune diseases, indicating that common mechanisms, at least in part, underlie autoimmune diseases. However, the identification of these common loci did not lead immediately to an understanding as to why the central nervous system (CNS) is the primary target of autoimmune injury, although several possible genes are expressed in the CNS and some, such as GALC, encode proteins that were previously implicated in MS (84).
In the most recent iteration of the IMSGC’s efforts to create a complete genomic map of MS, data from 15 separate GWAS were integrated into a single discovery dataset consisting of 14,802 MS subjects and 26,703 controls (85). Data from the 1000 Genomes European panel were used to impute 8.6 million SNPs with MAF of at least 1%. An additional dataset consisting of 20,822 MS subjects and 18,956 controls was genotyped using a custom-designed MS chip of some 4,842 non-MHC SNPs identified in the discovery cohort. A second replication cohort of 12,267 MS subjects and 22,625 controls was used for validation. This effort identified 200 non-MHC loci with genome-wide statistically significant associations that contribute to MS susceptibility with odds ratios between 1.06 and 2.06. Interestingly, the risk allele frequencies ranged from 2.1% to 98.4% in the European population. MS-associated SNPs were identified on every chromosome except for the Y chromosome. This study also found the first convincing X-linked locus.
Nearly all of the genetic loci map to noncoding areas of the genome, meaning that the polymorphisms are not associated with structural changes to an expressed protein’s primary sequence. Some SNPs are intragenic and therefore could relate to splicing, for example, producing different ratios of splice protein products. However, the majority of the MS-associated SNPs are intergenic meaning that they are not immediately found near any known genes. These areas may contain transcriptional regulatory elements such as promoters or enhancers, open chromatin, histone modification sites, or even transcribed regulatory ribonucleic acids (RNAs) that are not translated into proteins. The recent ENCODE project’s remarkable discovery that 80% of the human genome contains elements linked to biological processes underscores that DNA regions without open reading frames can be biologically important and in fact do not contain what previously had been disregarded as “junk” DNA (86,87).
The observation that the majority of the MS genomic map involves noncoding areas suggests that MS susceptibility may be a disorder of networks involved in gene regulation rather than structural alteration in gene products. If this is the case, then understanding MS risk will need to be refined by revealing the different patterns of gene expression between MS patients and unaffected controls in relevant cell types. As a first effort to understand the influence of these loci on MS susceptibility, atlases of gene expression based on cell type were consulted for expression of mRNAs linked to the MS-associated loci. In this data mining study, gene expression profiles from various available cell types were analyzed for expression of mRNAs that are genetically linked to the MS-associated SNPs. Not surprisingly, significant enrichment of mRNAs linked to MS-associated loci was present in cells of the adaptive immune system (both T cells and B cells). Interestingly, cells of the innate immune system such as natural killer cells and dendritic cells also showed enrichment of mRNAs linked to the MS-associated loci, underscoring a potential role of both adaptive and innate immunity in MS susceptibility. Thymic tissue also showed upregulation of gene transcripts potentially linked to MS susceptibility loci. Data generated from examining the expression profiles of differentiated neuronal lineage cells derived from human induced pluripotent stem cells, as well as purified primary human microglia and astrocytes, found enrichment of mRNAs from MS-associated loci in microglial cells. This observation suggests that gene regulation within microglia, the CNS’s resident immune cells, might also influence MS susceptibility. Taken together, these data suggest that MS susceptibility loci possibly could alter gene expression profiles in diverse cells of the peripheral adaptive and innate immune systems as well as microglia, the resident innate immune cells of the nervous system.
With 200 loci potentially influencing the expression of an even larger number of neighboring messenger and/or regulatory RNAs that could have both cis and trans effects on expression patterns of other RNAs, understanding the genetic basis for MS susceptibility will require a completely different way of thinking about genetics than what is traditionally associated with Mendelian disorders. To begin to understand whether the MS-associated SNPs can alter gene expression, gene expression profiles from peripheral blood mononuclear cells (PBMCs) from MS patients and unaffected controls were analyzed to determine whether the MS-associated SNPs were associated with alterations in mRNA transcripts. In this preliminary experiment in MS PBMCs, 30% of the 200 non-MHC MS-associated SNPs were found to be cis-acting expression quantitative trait loci (cis-eQTLs) for 92 genes. This experiment provides primary evidence for the hypothesis that the genetic basis for MS susceptibility acts through altering gene expression in at least some cells of the peripheral immune system (in this experiment CD4 naïve T cells and monocytes).
That genetic MS-related loci are involved in gene regulation in lymphoid cells comes as little surprise given the overwhelming evidence indicating that MS is an immune-mediated disorder. However, why the brain and spinal cord are selectively targeted in MS remains unknown. One hope of genetic studies is to elucidate why the CNS is apparently targeted by the immune system. To this end, an analysis of eQTLs from brain-derived tissues was also undertaken to see if the alterations in gene expression in the target organ contribute to MS susceptibility. Here the results were less clear with some MS loci potentially influencing expression of neuron-associated transcripts but also simultaneously potentially influencing expression of B cell transcripts. This observation illustrates a challenge with this type of analysis: identification of disease-associated SNPs is not synonymous with identification of the causative problem. It is important to understand that for the majority of the identified loci, multiple neighboring genes are also linked to the MS-associated SNP. Therefore, with the current level of resolution of GWAS, the exact genetic variant involved in MS susceptibility cannot be determined. Although it is possible that the MS-associated variants are the SNPs identified by the GWAS, it is also possible that the identified SNPs are in linkage disequilibrium with the true MS-associated alleles. Additional SNPs or resequencing of regions of interest will be necessary to refine the map and identify the causal variant.
Although the expression profile analysis of brain tissue did not clearly identify a pattern of CNS-expressed genes that explains why the brain and spinal cord are targeted in MS, this analysis provided another line of evidence implicating microglia in MS susceptibility. A previously identified MS susceptibility gene, CLECL1, was expressed at low levels in cortical tissue. However, this transcript is found in microglia, which compose only a small amount of cortical tissue. In purified microglia, CLECL1 is expressed at levels 20-fold higher than in cortical slices. That this MS susceptibility gene is expressed in microglial cells provides another line of evidence indicating that these CNS cells could participate in MS susceptibility.
That there are 200 loci associated with MS risk, and that many of these loci could potentially modulate expression of a large number of genes, poses a major challenge for understanding the heritable aspects of MS. One strategy to try to integrate the function of the many genes plausibly involved in MS is to cluster the genes by established function within established cellular pathways. This approach leverages bioinformatics to help reduce a large amount of complex data into gene/protein networks that are functionally linked through canonical prior knowledge and subsequently structured in the form of pathway diagrams. A prioritized list of 551 genes was selected based on the eQTL data described earlier, along with genes that had at least one exonic variant, genes that have high regulatory potential, and genes that exhibit similar tissue-specific coexpression patterns. Additional modeling efforts exploring potential protein–protein interactions found that about one third of the 551 prioritized genes were connected and could be organized into 13 communities or subnetworks that have higher levels of connectivity. These in silico modeling efforts found that many of the potential MS-related genes map to pathways with known functions in immune cells, including processes involved in lymphoid development, maturation, and differentiation. However, neurons and astroglial cells repurpose at least some of the genes involved in immune signaling, such as tumor necrosis factor alpha, ciliary neurotrophic factor, nerve growth factor, and neuregulin, leading to an interesting and potentially important ambiguity as to the tissue or tissues in which these genes exert their effect in promoting MS susceptibility. Much work is needed to further understand the contextual basis in which MS susceptibility genes exert their influence.
The analysis of MS GWAS data has not only led to the identification of a large number of genetic loci involved in MS susceptibility but also fundamentally changed the way in which MS heritability is conceived. Simple Mendelian concepts of heritability, or even quantitative traits arising from multiple loci (polygenic), apply only in part to the complex genomics at work in MS. With 200 loci across the genome and more than 500 potential genes involved in multiple biological pathways, understanding the genetic basis of MS susceptibility requires development of new ways to understand how alterations in transcriptional levels of multiple genes influence molecular functions within cell types. Many other human complex diseases share similar genomic features. For example, Mendelian alleles that have large effect sizes do not contribute to Crohn’s disease, rheumatoid arthritis, or schizophrenia. In each of these disease states, GWAS have found multiple risk alleles that individually contribute only fractionally to genetic risk but in aggregate appear to influence the disease phenotype through effects on cellular regulatory networks. In this “omnigenic” model of heritability, alterations in core genes that have biologically interpretable roles in a disease contribute only partially to the disease trait. Most of the susceptibility arises from alterations of secondary genes that are interconnected to the core genes through networks (88). In the case of MS, the “core” genes are likely composed of the antigen-presenting genes within the MHC that were associated with MS risk nearly 50 years ago. These genes exert the strongest individual effects in terms of MS risk. Secondary loci, which by themselves contribute only minute fractional risks, exert a greater overall impact on MS susceptibility by virtue of the sheer number of genes involved in secondary pathways that indirectly connect back to the core genetic pathway.
MISSING HERITABILITY
Despite the remarkable achievement of the IMSGC, the estimated total contribution of the MHC and the 200 non-MHC loci to MS heritability is only 39% (48% if suggestive effects that did not meet genome-wide levels of statistical significance in the replication datasets are included). Given that the MHC itself accounts for 20% of MS heritability, the total contribution of the other 200 genetic loci to MS risk is only ~20%. This suggests that 50% to 60% of MS genetic risk will be accounted for by variants that cannot be identified using SNP chips designed to test the common disease–common variant hypothesis. Identification of rare disease-causative alleles that have individually weak or modest effects poses additional challenges for genetic analysis. First, the number of potential rare variants is much greater than the number of common variants. Second, the majority of rare variants have not yet been described in publicly available databases. Identifying and cataloguing these rare variants will require sequencing many more genomes. Finally, optimal methods for typing an individual’s DNA for rare variants are still being developed.
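The heritability figures above can be tallied explicitly. This is a minimal arithmetic sketch using only the percentages quoted in the paragraph; the function and key names are illustrative.

```python
def heritability_budget(explained: float, mhc_share: float) -> dict:
    """Split heritability into MHC, non-MHC, and unexplained fractions."""
    return {
        "mhc": mhc_share,
        "non_mhc": explained - mhc_share,
        "unexplained": 1.0 - explained,
    }


# Genome-wide significant loci only (39%), then with suggestive effects included (48%).
strict = heritability_budget(explained=0.39, mhc_share=0.20)
lenient = heritability_budget(explained=0.48, mhc_share=0.20)

print(f"non-MHC loci (significant only): {strict['non_mhc']:.0%}")
print(f"unexplained heritability: {lenient['unexplained']:.0%} to {strict['unexplained']:.0%}")
```

The unexplained fraction works out to roughly half to three fifths of heritability, in line with the range given in the text.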