Rheumatoid arthritis (RA) is the most common rheumatic disease. The genetic basis of RA is supported through the identification of more than 30 susceptibility genetic variants. Each of these genes individually makes only a slight contribution to the risk of disease. Moreover, there is significant disparity in the genetic variants associated with different RA subgroups and patient ethnicities, which emphasizes the intricate nature of the disease’s pathogenesis, and the complexities involved in large-scale genetic studies. This review evaluates critically the recent literature on the genetic contribution to RA and assesses the methodology used to identify these risk alleles.
- More than 30 susceptibility gene variants associated with RA have been identified, most of which individually make only a slight contribution to the risk of disease.
- Gene variants within the HLA locus account for 30% to 50% of overall genetic susceptibility to RA.
- Three main approaches have been used to identify non-HLA susceptibility loci: classical family linkage studies, case-control candidate gene studies, and genome-wide association studies.
Introduction
In 1913, at the age of 72 years, having produced numerous masterpieces, the great impressionist painter Pierre-Auguste Renoir declared, “I am just learning how to paint.” Although suffering from severe rheumatoid arthritis (RA), Renoir went on to produce additional paintings, each subtly beautiful and profound. One might imagine Renoir was hinting that given the advanced stage of his disease, with the ability to paint nimbly across the canvas now constrained by tormented joints, every stroke of his brush had to be planned, deliberately focused, and refined. And with these brush-strokes one might say that Renoir’s connection to RA goes beyond his diagnosis. As opposed to previous classical paintings that consisted of subjects illustrated with contiguous hard outlines and smooth paint surfaces, Renoir’s paintings depict vibrant figures comprising numerous small brush strokes. When the canvas and individual strokes are viewed up close and analyzed independently of the rest of the painting, the image is blurry and meaningless (Fig. 1A). When viewed from a distance, however, it becomes apparent that distinct strokes each contribute to a great masterwork depicting, for example, a hidden path emerging through the woods (see Fig. 1B). Decades of research in RA have also seen a path emerge in the disease’s complex pathogenesis and mounting evidence has been accumulated to support the genetic basis of RA through the identification of more than 30 susceptibility genetic variants. Thus far, however, each of these genes individually makes only a slight impression on the picture of disease, much like the brush-strokes on the canvas of an impressionist painting. As an artistic movement, impressionism also sought to reassess the previously accepted methods in classical painting, focusing on the evaluation of novel techniques and subjects to create vivid works of art.
From this perspective, the aims of the following article are 2-fold: (1) a critical evaluation of the recent literature on the genetic contribution to RA pathogenesis, and (2) an assessment of the methodology used in the studies from which these data are derived.
The landscape
RA is the most common rheumatic disease, with a prevalence of 0.5% to 1% in the general population worldwide. Among siblings, the prevalence increases to 2% to 4%. In monozygotic twins, the concordance rate for RA is between 12.3% and 15.4% compared with 3.5% for dizygotic twins. These sibling and twin pair studies demonstrate that genetic factors substantially affect RA susceptibility, resulting in an estimated genetic contribution to RA of approximately 50% to 60%. The relatively low concordance rates between monozygotic twins, however, emphasize the importance of environmental factors in RA susceptibility. Among them, long-term smoking remains the only validated environmental factor that contributes to an increased risk of developing seropositive RA.
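The familial clustering described above is often summarized as a sibling recurrence risk ratio, the prevalence among siblings divided by the prevalence in the general population. The sketch below computes that ratio from the prevalence ranges quoted in this section. The measure itself (commonly written λs) and the function and variable names are not taken from this article; they are assumptions introduced for illustration only.

```python
def sibling_recurrence_ratio(sibling_prevalence: float,
                             population_prevalence: float) -> float:
    """Return the risk to siblings relative to the general population."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return sibling_prevalence / population_prevalence


# Prevalence ranges quoted in the text, expressed as fractions.
POPULATION = (0.005, 0.01)  # 0.5% to 1%
SIBLINGS = (0.02, 0.04)     # 2% to 4%

# Most conservative pairing: lowest sibling risk, highest population risk.
lowest = sibling_recurrence_ratio(SIBLINGS[0], POPULATION[1])
# Most extreme pairing: highest sibling risk, lowest population risk.
highest = sibling_recurrence_ratio(SIBLINGS[1], POPULATION[0])

print(f"sibling recurrence risk ratio: about {lowest:.1f} to {highest:.1f}")
```

With the figures above, siblings carry roughly 2 to 8 times the population risk. Because siblings also share environment, this ratio alone cannot separate genetic from environmental contributions, which is why the twin comparisons are needed.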
As is the case for other autoimmune diseases, there is growing awareness that RA is not a single disease entity, but rather can be divided into distinct subphenotypes that have disparate clinical outcomes. Such a classification is based on serologic traits, such as the rheumatoid factor (RF) and anti-citrullinated protein antibodies (ACPA). RF is an autoantibody against the Fc portion of IgG, whereas ACPA are autoantibodies against citrullinated proteins that are formed by the conversion of arginine into citrulline by peptidylarginine deiminase (PADI). Citrullination is a physiologic process that can occur under different conditions including inflammation. Although RF is not unique to RA, ACPA are highly specific for the disease. Indeed, 75% of subjects who present with undifferentiated arthritis and ACPA progress to RA within 3 years of follow-up. The subdivision of RA into ACPA-positive and ACPA-negative subtypes is useful because these distinct groups of patients often differ clinically, with ACPA-positive patients with RA suffering a more aggressive clinical course, higher rates and severity of erosions, and lower rates of remission. Furthermore, the 2 subtypes also differ with regard to the genetic risk factors that contribute to their development.
Thus far, more genetic risk alleles have been described for ACPA-positive RA compared with ACPA-negative RA, but this does not imply that genetic factors contribute more to the former than the latter. The heritability of RA among twin pairs for ACPA-positive and ACPA-negative disease was found to be 68% and 66%, respectively, which suggests that the heritability of both serotypes of RA is roughly equivalent. The relative lack of identified risk alleles in ACPA-negative disease might be because most studies to date relied solely on cohorts of ACPA-positive patients with RA.
A dominant theme: the HLA risk factor
The most statistically significant genetic contributor to RA is the HLA locus, which accounts for 30% to 50% of the overall genetic susceptibility to RA. Within the HLA, a group of alleles that encode the HLA-class II DRβ chain (HLA-DRB1) seem to be associated most strongly with predisposition to RA. Using HLA-DRB1 genotyping, multiple studies have described a significant association of 8 particular DRB1 alleles with RA: DRB1*0401, *0404, *0405, *0408, *0101, *0102, *1001, and DRB1*09, with DRB1*0401 (odds ratio [OR] = 3.30) and DRB1*0405 (OR = 3.84) showing the strongest association. Positions 70 to 74 in the third hypervariable region of the DRβ1 chain of the RA-associated HLA-DRB1 alleles all contain the conserved amino acids QKRAA, QRRAA, or RRRAA. This sequence of amino acids is called the shared epitope (SE), and the risk alleles carrying this sequence are widely known as SE alleles. Carrying 2 copies of DRB1 SE alleles further increases the risk of developing RA, with an OR of 11.97. This association between SE-encoding HLA-DRB1 alleles and RA was, however, observed only for ACPA-positive disease.
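The odds ratios quoted for the DRB1 alleles compare the odds of carrying an allele among patients with the odds among controls. The following is a minimal sketch of that calculation from a 2 × 2 case-control table, with a 95% confidence interval from the standard log-scale (Woolf) approximation. The counts are hypothetical and were chosen only so that the result lands near the OR of 3.30 reported for DRB1*0401; they are not data from any study cited here, and the function name is an assumption.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Return (OR, lower, upper) for a 2 x 2 case-control table.

    The interval uses the Woolf approximation: the standard error of
    log(OR) is the square root of the summed reciprocal cell counts.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    se_log = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts (not from the cited studies): 1000 cases, 1000 controls.
estimate, lower, upper = odds_ratio(300, 700, 115, 885)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 whose interval excludes 1 indicates a risk allele; an OR below 1, as for some loci in Table 1, indicates a protective allele.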
Despite much progress in understanding the structure and function of HLA-DRB1 molecules, the underlying mechanism by which particular HLA-DRB1 alleles predispose to the development of ACPA-positive RA remains uncertain. The HLA-DR molecule is a heterodimer consisting of an α (DRA) and a β chain (DRB), both anchored in the membrane of antigen-presenting cells. The function of HLA-DR molecules is to present antigenic peptides to T lymphocytes. For efficient antigen presentation to T cells, the T-cell receptor recognizes residues from both the peptide and the HLA-DR molecule itself. The part of HLA-DR that binds to the peptide, denoted as the peptide-binding groove, comprises 2 α-helical walls and a floor of β-pleated sheets. The SE is situated in the α-helix wall of the peptide-binding groove. In this position, the SE may influence both peptide binding to the HLA molecule and T cell presentation. One hypothesis is that the SE motif itself is directly involved in the pathogenesis of RA by allowing the presentation of an arthritogenic peptide to T cells. However, to date, no specific arthritogenic peptides that bind to SE DR molecules have been identified to confirm this hypothesis. Citrullinated peptides can bind to HLA-SE molecules for presentation to T cells, which points to a direct pathogenic involvement of ACPA in RA. Alternative hypotheses to explain the contribution of HLA-SE to RA have also been proposed: the HLA-SE molecules may contribute to RA pathogenesis by shaping the T cell repertoire to permit escape from negative selection and promote survival of autoreactive clones. Furthermore, SE molecules may serve as targets for autoreactive T cells because of molecular mimicry with a pathogen. Although much indirect evidence to support molecular mimicry in RA (as in other autoimmune processes) has been offered, the hypothesis has not been confirmed directly.
Other studies have suggested additional independent associations with RA within the HLA region in addition to that at HLA-DRB1. However, pinpointing the associated loci has been challenging, in part because of the complexity of complete HLA genotyping and the broad linkage disequilibrium (LD) across the HLA locus. By using genotype imputation to generate very large data sets from previous studies, Raychaudhuri and colleagues refined the association between the HLA region and RA. The investigators used existing genome-wide association studies (GWAS) data sets from 5018 ACPA-positive individuals with RA and 14,974 controls from independent studies. They then used a large reference panel of 2767 individuals to impute classical HLA alleles, single nucleotide polymorphisms (SNPs), and amino acids across the entire HLA region. These imputed data made it possible to use a much larger sample size than would have been possible with classical typing in resolving specific HLA signals among these very highly correlated variants. The investigators subsequently used conditional analyses in an effort to pinpoint the causal variants. Although the association P values at the top signals in the HLA were so infinitesimal as to make comparisons between raw results meaningless, the value of the study became apparent when testing each signal as conditional on the others. For example, amino acid positions 11 and 13 in HLA-DRβ1 (which are encoded by a locus with very high LD) had association P values of 1 × 10^−581 and 1 × 10^−574, respectively. However, the conditional analyses for these 2 amino acids showed that, whereas conditioning on amino acid 11 explained the association at amino acid 13 (residual P = .57), the reverse was not true (residual P = 3.5 × 10^−8). Based on multiple series of conditional analyses, the entire association within the HLA could be explained by 5 independent polymorphic sites in 3 HLA molecules.
Two of these 5 residues were within the SE (amino acids 71 and 74). An important causal variant was found outside the SE at the base of the DRB1 antigen-binding groove at position 11. Amino acids at the base of the groove of the HLA-B and HLA-DPB1 molecules were also found to causally modulate RA risk. In addition to fine mapping the HLA-class II risk loci, the study resurrects the importance of HLA-class I molecules for RA risk, providing genetic evidence that implicates cytotoxic T cells in RA pathogenesis.
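The logic of conditioning one signal on another can be illustrated with a toy calculation. The sketch below does not reproduce the regression-based conditional analysis used in the study; it substitutes a simpler stratified technique, the Mantel-Haenszel pooled odds ratio, to show how a variant that looks strongly associated on its own can lose all signal once a correlated variant is held fixed. The variant labels A and B, the counts, and the function names are all hypothetical.

```python
# Each table is (case_carriers, case_noncarriers,
#                control_carriers, control_noncarriers) for variant B.
Table = tuple[int, int, int, int]


def crude_odds_ratio(table: Table) -> float:
    """Odds ratio for a single 2 x 2 table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: list[Table]) -> float:
    """Pooled odds ratio across strata (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for variant B, split by carriage of variant A.
# A and B are in strong LD: 90% of A carriers carry B, 10% of others do.
# A is enriched in cases (800 of 1000) versus controls (400 of 1000).
strata_by_a = [
    (720, 80, 360, 40),   # A carriers
    (20, 180, 60, 540),   # A non-carriers
]

# Ignoring A: collapse the strata into one marginal table for B.
marginal = tuple(sum(cells) for cells in zip(*strata_by_a))

marginal_or = crude_odds_ratio(marginal)
conditional_or = mantel_haenszel_or(strata_by_a)

print(f"B ignoring A:       OR = {marginal_or:.2f}")
print(f"B conditional on A: OR = {conditional_or:.2f}")
```

In this constructed example B appears to be a strong risk factor when analyzed alone, yet its odds ratio is 1 within each stratum of A, mirroring the way conditioning on amino acid 11 removed the signal at amino acid 13 while the reverse did not hold.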
In contrast, the HLA association with ACPA-negative RA is clearly different because HLA-DR4 alleles are not associated. ACPA-negative disease has been associated with the HLA-DRB1*0301 allele in European populations and the HLA-DRB1*0901 allele in a Japanese population. Also, the non-SE DRB1 alleles DRB1*13 and DRB1*03, in combination, have been strongly associated. DRB1*1301 actually confers protection from RA risk in ACPA-positive individuals, possibly by neutralizing the effects of the SE alleles.
Brush-strokes: non-HLA risk factors in ACPA-positive RA
Three main approaches have been used to identify non-HLA susceptibility loci: classical family linkage studies, case-control candidate gene studies, and GWAS. Associations of new individual genes were discovered by testing candidate genes within genomic regions linked to RA susceptibility in previous family linkage studies, analyses of biologic pathways involved in known RA-associated risk loci, or by testing genes known to be associated with other autoimmune diseases. Although these methods of testing candidate genes yielded important information, they have been criticized for potentially limiting their focus to previously identified regions/pathways rather than exploring new avenues in disease pathogenesis. In contrast, because newer methods involving GWAS do not rely on previous data as a starting point, they potentially overcome these limitations. GWAS are fraught with their own problems, however, namely, reductions in statistical power due to multiple testing, which are discussed subsequently. To address these problems, the latest trend is the use of meta-analyses of multiple GWAS, resulting in a very large (>10,000) number of subjects and thus increased statistical power.
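Both points in the preceding paragraph, the penalty for multiple testing and the gain from pooling studies, can be sketched numerically. The example below assumes a Bonferroni correction for about 1 million independent tests, which is the usual rationale for the conventional genome-wide threshold of 5 × 10^−8, and a fixed-effect inverse-variance meta-analysis of log odds ratios. Neither method is stated in this article as the one used by the cited studies, and the per-study effect sizes and standard errors are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps family-wise error at alpha."""
    return alpha / n_tests


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effect_meta(log_ors: list[float],
                      standard_errors: list[float]) -> tuple[float, float, float]:
    """Inverse-variance pooled log OR, its standard error, and P value."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, two_sided_p(pooled / pooled_se)


threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical per-study estimates for one modest-effect locus (OR near 1.12).
study_log_ors = [math.log(1.12), math.log(1.10), math.log(1.14)]
study_ses = [0.030, 0.035, 0.040]

for log_or, se in zip(study_log_ors, study_ses):
    p_value = two_sided_p(log_or / se)
    print(f"single study: OR = {math.exp(log_or):.2f}, "
          f"genome-wide significant: {p_value < threshold}")

pooled, pooled_se, pooled_p = fixed_effect_meta(study_log_ors, study_ses)
print(f"pooled: OR = {math.exp(pooled):.2f}, "
      f"genome-wide significant: {pooled_p < threshold}")
```

In this constructed example no single study clears the genome-wide threshold, but the pooled estimate does, because the pooled standard error shrinks as studies are combined.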
Regarding associations discovered via candidate gene studies, the PTPN22 gene is recognized as the second most important risk locus (after the HLA) in populations of European descent with ACPA-positive RA. PTPN22 encodes the lymphoid-specific tyrosine phosphatase, Lyp, which is a negative regulator of T-cell antigen receptor signal transduction during T-cell activation. The associated risk allele is a nonsynonymous SNP (rs2476601, 1858 C > T) that encodes an arginine to tryptophan substitution at residue 620 (R620W) in the polypeptide chain, thereby disrupting the binding to C-src tyrosine kinase. There is evidence that this change is a gain-of-function mutation, with the 620W variant enhancing the inhibitory effect on T-cell receptor signaling during thymic development, resulting in the survival of potentially autoreactive T cells. No RA association with PTPN22 could be demonstrated in Asian RA populations.
Among Asian populations with ACPA-positive RA, the second largest genetic risk factor is PADI4. The PADI4 gene encodes a peptidylarginine deiminase enzyme that converts arginine residues to citrulline posttranslationally. Therefore, PADI4 may play a significant role in the development of ACPA by influencing protein citrullination. Together with the finding that ACPA can induce and aggravate arthritis in mouse models, these results suggest that ACPA are actually involved in human disease pathogenesis, as opposed to just serving as a serologic marker for RA subtype classification. With the exception of the PTPN22 gene, other non-HLA RA risk factors identified (mainly by GWAS) have a very modest effect size. Table 1 lists all identified, and subsequently validated, risk factors for RA with their odds ratios.
Genetic Risk Factor | Chromosomal Location | Odds Ratio (OR) | Association with Other Autoimmune Diseases | References |
---|---|---|---|---|
PTPN22 | 1p13 | 1.94 | GD, HT, MG, SLE, DM1, SSc, UC | |
TNFRSF14 | 1p36 | 1.12 | ||
CD2, CD58 | 1p13 | 1.13 | MS | |
FCGR2A | 1q23 | 1.13 | SLE | |
PTPRC | 1q31 | 1.14 | ||
REL | 2p16 | 1.13 | ||
AFF3 | 2q11 | 1.12 | ||
STAT4 | 2q32 | 1.16 | SLE, GD, DM1, UC, CD, SS | |
CD28 | 2q33 | 1.12 | DM1 | |
CTLA4 | 2q33 | 1.11 | HT, SS, DM1 | |
IL-2, IL-21 | 4q27 | 1.09 | DM1 | |
PRDM1 | 6q21 | 1.1 | CD | |
TNFAIP3 | 6q23 | 1.4 | JIA, PsA, SLE, SSc, DM1 | |
TAGAP | 6q25 | 1.1 | DM1, CD | |
BLK | 8p23 | 1.12 | ||
CCL21 | 9p13 | 1.13 | ||
TRAF-1, C5 | 9q33 | 1.13 | JIA, SLE | |
IL-2RA | 10p15 | 0.92 | JIA, MS, DM1 | |
PRKCQ | 10p15 | 1.14 | ||
TRAF6 | 11p12 | 0.88 | ||
KIF5A, PIP4K2C | 12q13 | 1.12 | ||
CD40 | 20q13 | 1.11 | GD, MS | |
IL-2RB | 22q12 | 1.09 | ||
SPRED2 | 2p14 | 1.13 | ||
ANKRD55, IL-6ST | 5q11 | 1.23 | ||
C5orf30 | 5q21 | 1.11 | ||
PXK | 3p14 | 1.13 | ||
RBPJ | 4p15 | 1.18 | ||
CCR6 | 6q27 | 1.11 | ||
IRF5 | 7q32 | 1.21 | ||
PADI4 | 1p36 | 1.12 | ||
CDK6 | 7q21 | 1.11 | ||
FCRL3 | 1q22 | 2.15 | SLE, HT, GD | |
CD244 | 1q22 | 1.09 | ||
KLF12 | 13q22 | N/A | ||