Abstract
The association of rheumatoid arthritis (RA) with a number of genetic risk loci is well established; however, only part of the risk to develop the disease is based on genetics. Environmental factors significantly contribute to the pathogenesis. A gene–environment interaction for smoking and certain major histocompatibility complex (MHC) class II alleles has been shown to promote anti-citrullinated protein antibody (ACPA)-positive RA; however, the molecular mechanisms of interaction remain unclear. In contrast to the genetic background, epigenetic factors are responsive to external stimuli and can modulate gene expression. Therefore, epigenetic mechanisms may function as intermediaries between genetic risk alleles and environmental factors.
In this review, epigenetic mechanisms are explained and the evidence for epigenetic changes relevant for the pathogenesis of RA and potential therapeutic applications are discussed.
Introduction
Advances in sequencing technology have made it possible to apply whole genome sequencing approaches to determine the genetic basis of rheumatoid arthritis (RA). A number of new genes were found to be associated with the disease, in addition to the long-known association with HLA-DR genes. However, given a concordance rate in monozygotic twins of only 15%, only part of the disease risk can be attributed to genetic factors. On the other hand, several environmental factors have been identified that affect the disease risk. Among those, smoking is of particular interest, as a gene–environment interaction was demonstrated in anti-citrullinated protein antibody (ACPA)-positive patients with RA. Smoking greatly increased the disease risk when the shared epitope of HLA-DRB1 was present. How smoking and the shared epitope are molecularly linked is unknown. In contrast to genetic risk alleles, epigenetic marks are sensitive to external factors. Therefore, environmental exposures may affect the function of the immune system by modulating epigenetic modifications. The term “epigenetics” denotes heritable changes in gene function without alterations of the DNA sequence. These include histone modifications such as methylation and acetylation, as well as DNA methylation. The gene expression profile of a cell is critically dependent on DNA and histone modifications. In a broader sense, noncoding RNAs are also included in the definition of epigenetics. In this review, epigenetic changes of cells are described and their role in the pathogenesis of RA is discussed, with a focus on the implications for gene–environment interactions.
Epigenetic control of gene expression
In the past 10 years, epigenetics research has revealed exciting new pathways in the regulation of gene expression. The biochemical modification of DNA and histones was found to have an important role in gene expression and human disease.
DNA methylation is often associated with transcriptional repression in CpG-rich genomic regions. Two major mechanisms have been proposed: first, the direct inhibition of transcription factor binding and, second, the recognition of methylated regions by specific methyl-DNA-binding proteins (MeCP2, MBD1, MBD2, MBD3, MBD4) that inhibit the access of RNA polymerases and transcription factors.
Another field of epigenetics is the study of chromatin biology. Chromatin consists of DNA wrapped around a complex of histone proteins that form the nucleosome. The nucleosome core is an octamer consisting of two copies each of the histones H2A, H2B, H3 and H4, around which DNA is packed in a tight conformation. Posttranslational modification of histones often occurs at specific transcriptionally active or repressed sites in the genome. Histone modifications can alter the structural conformation of nucleosomes and thereby allow access of transcriptional activators.
DNA methylation and histone modifications
The addition of a methyl group to the cytosine base is known as DNA methylation and is catalysed by DNA methyltransferases. Three DNA methyltransferases, DNMT1, DNMT3A and DNMT3B, have been associated with DNA methylation in humans. DNMT1 is involved in the maintenance of DNA methylation during somatic cell DNA replication. It interacts with the ubiquitin-like, containing PHD and RING finger domains 1 (UHRF1) protein and targets hemimethylated 5′-CpG-3′ DNA sequences. The DNMT1/UHRF1 complex transfers a methyl group from S-adenosylmethionine to the unmodified cytosine in the opposite DNA strand.
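The maintenance step described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: each CpG site is reduced to a pair of booleans (parental strand, newly synthesized strand), and the function names `replicate` and `dnmt1_maintain` are hypothetical labels chosen for illustration, not established software.

```python
def replicate(parent_marks):
    """Semi-conservative replication: the new strand starts unmethylated.

    parent_marks: list of booleans, one per CpG site on the parental strand.
    Returns a list of (parental, new) pairs, one per CpG site.
    """
    return [(mark, False) for mark in parent_marks]


def dnmt1_maintain(sites):
    """Toy DNMT1/UHRF1 step: methylate the unmodified cytosine on the
    opposite strand wherever a site is hemimethylated (exactly one strand
    methylated). Fully unmethylated sites are left untouched."""
    restored = []
    for parental, new in sites:
        if parental != new:          # hemimethylated site
            restored.append((True, True))
        else:                        # already symmetric
            restored.append((parental, new))
    return restored


# Hypothetical methylation pattern at three CpG sites.
pattern = [True, False, True]
daughter = dnmt1_maintain(replicate(pattern))
```

In this simplified picture the parental pattern is copied faithfully to the daughter duplex, while sites that were unmethylated before replication stay unmethylated.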
DNMT3A and DNMT3B are called the de novo methyltransferases. During embryonic development, the genome undergoes extensive demethylation. De novo methyltransferases are mainly involved in the restoration of DNA methylation during cell development.
A characteristic of the human genome is the presence of CpG-rich DNA sequences that remain unmethylated. These are called CpG islands and are often associated with promoters and transcription start sites of genes. However, a known feature of DNA methylation is that CpG islands remain methylated in the inactive X chromosome, imprinted genes and tissue-specific genes. Transposable elements such as LINE-1 can integrate randomly into the genome and cause genomic instability. DNA methylation is known to silence these elements.
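What "CpG rich" means in practice can be made concrete with a short calculation. The sketch below computes GC content and the observed/expected CpG ratio of a sequence. The thresholds used (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are an assumption taken from commonly used CpG island criteria and are not stated in this review; the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on this strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC content and obs/exp thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp
```

Real annotation tools scan the genome with sliding windows and merge neighbouring hits; this sketch only scores a single, pre-selected sequence.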
In addition, DNA methylation can be actively reversed by the action of the TET family of DNA dioxygenases, which convert 5-methylcytosine (5-mC) to 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC) and 5-carboxylcytosine (5-caC). Three TET genes are currently known: TET-1, TET-2 and TET-3.
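The cytosine derivatives listed above can be viewed as successive states of one oxidation pathway. The following minimal sketch encodes that view; the strictly stepwise order (5-mC, then 5-hmC, then 5-fC, then 5-caC) is an assumption of the sketch, and `tet_oxidize` is a hypothetical helper name.

```python
# Assumed stepwise order of TET-mediated oxidation products.
OXIDATION_ORDER = ["5-mC", "5-hmC", "5-fC", "5-caC"]


def tet_oxidize(state):
    """Return the next oxidation state; 5-caC is treated as the end point.

    Raises ValueError for states outside the pathway (e.g. unmodified "C").
    """
    index = OXIDATION_ORDER.index(state)
    return OXIDATION_ORDER[min(index + 1, len(OXIDATION_ORDER) - 1)]


def oxidation_path(state):
    """List every state visited from `state` up to 5-caC."""
    path = [state]
    while path[-1] != OXIDATION_ORDER[-1]:
        path.append(tet_oxidize(path[-1]))
    return path
```

The sketch deliberately stops at 5-caC and does not model how the oxidized base is subsequently replaced by unmodified cytosine.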
The posttranslational modification of histone tails is the other epigenetic mark that can regulate gene expression. There are 16 posttranslational modification marks described so far, including methylation, acetylation, phosphorylation and ubiquitylation. H4 acetylation and H3K4me3 are found in active genes. By contrast, H3K9 methylation and H3K27me3 are associated with gene silencing. A variety of specialized proteins are involved in the deposition of chromatin marks, such as the histone acetyltransferases (HATs) and the histone methyltransferases, which add acetyl and methyl groups to the histone tails.
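For readers unfamiliar with the shorthand, a mark such as H3K4me3 encodes the histone (H3), the modified residue (lysine, K, at position 4) and the modification (trimethylation). The sketch below parses this notation and attaches the activity status for the two fully specified marks named in the text. The regular expression covers only a small, assumed subset of the nomenclature, and the names `parse_mark` and `MARK_STATUS` are hypothetical.

```python
import re

# Histone, one-letter residue, position, modification (assumed subset).
MARK_PATTERN = re.compile(r"(H2A|H2B|H3|H4)([A-Z])(\d+)(me[1-3]|ac|ph|ub)")

# Status of the fully specified marks mentioned in the text.
MARK_STATUS = {
    "H3K4me3": "active",
    "H3K27me3": "repressive",
}


def parse_mark(mark):
    """Split a mark such as 'H3K27me3' into (histone, residue, position, modification)."""
    match = MARK_PATTERN.fullmatch(mark)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {mark!r}")
    histone, residue, position, modification = match.groups()
    return histone, residue, int(position), modification


def describe(mark):
    """Return the parsed mark together with its status, if listed above."""
    return parse_mark(mark) + (MARK_STATUS.get(mark, "not listed"),)
```

Marks outside the small lookup table are parsed but reported as "not listed" rather than guessed.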
Histone deacetylases (HDACs) and demethylases constitute another group of chromatin regulator proteins that remove the acetylation and methylation marks. The classical HDAC family consists of HDAC classes I and II, while the SIR2 family of NAD-dependent HDACs forms HDAC class III. HDAC class I comprises HDAC 1, 2 and 3, which mainly target histone proteins, and HDAC 8, which has structural maintenance of chromosomes 3 (SMC3) as a specific protein substrate. HDAC class II includes HDACs 4, 5, 6, 7, 9, 10 and 11, which mainly target nonhistone substrates such as transcription factors. Lastly, the HDAC class III family consists of different sirtuin members (SIRT1, 2, 3, 4, 5, 6 and 7) with distinct cell functions. Inhibition of HDACs can cause alterations in gene expression, affect transcription factor activity, dysregulate signalling pathways and inhibit protein degradation. The molecular targets of HDACs play important roles in apoptosis, cell differentiation, cell cycle, inflammation and angiogenesis. Dysregulation of HDACs is known in different cancer types, and a variety of small-molecule HDAC inhibitors are in phase II and III clinical trials.
An additional group of proteins reads the histone modifications. Members of this group are the bromodomain proteins BRD2, BRD3 and BRD4, as well as BRDT in germ cells. These proteins have bromodomains, which interact with acetylated histones and influence gene expression, cell cycle regulation and development.
Currently, next-generation sequencing technologies coupled with chromatin and methylation immunoprecipitation assays can determine which histone modifications or chromatin modifiers are found in a specific genomic region. These methods have also been used to identify alterations in different diseases. Tumour suppressor genes have been shown to be silenced by promoter methylation or to have hyperacetylated promoters.
Noncoding RNA
Recent years have brought major progress in understanding the expression and function of noncoding RNA. Since the early 1990s, when the H19 and Xist long noncoding RNAs were discovered, and even more so since the year 2000, when let-7 was discovered as one of the first microRNAs (miRNAs), more and more types of regulatory noncoding RNAs have been described and their substantial role in gene regulation elucidated.
The variety and complexity of RNA molecules and functions suggest that we are still at the very beginning of understanding the interconnected network by which noncoding RNA regulates gene expression.
Long noncoding RNA
By definition, every noncoding RNA that is longer than 200 nucleotides is a long noncoding (lnc) RNA. Naturally, this rather crude definition includes noncoding RNAs with totally different structures and functions. The first lncRNA to be discovered was H19, which is interesting in many respects. The H19 lncRNA, which was described to be overexpressed in synovial tissues of patients with RA compared to patients with osteoarthritis (OA) or healthy individuals, belongs to the group of imprinted genes. These genes are expressed from one allele only, in the case of H19 from the maternal allele, while the other allele is silenced by DNA methylation. As the monoallelic expression of imprinted genes is strongly dependent on regular DNA methylation, it is easily disturbed by disease-related hyper- or hypomethylation. The function of H19 is not yet fully clarified. It is important in embryonic growth and development and has been described to act as a tumour suppressor as well as to have oncogenic properties. On the molecular level, H19 was found to associate with the histone methyltransferase EZH2, similar to many of the currently described lncRNAs, and therefore might have a role in changing histone methylation and subsequently gene transcription. In addition, the first exon of H19 harbours the sequence of miR-675. To what extent the different functions of H19 are conferred by miR-675 still has to be analysed.
microRNA
Among the small noncoding RNAs, miRNAs are by far the most studied. Mature miRNAs are ∼22 nucleotides (nt) long and are processed from longer precursor transcripts. Most miRNAs are encoded in intergenic regions of the genome. However, they can also lie within introns or even exons of protein coding genes. Quite often, host genes of miRNAs are lncRNAs, as already mentioned for miR-675, which is encoded in H19. The primary microRNA transcript (pri-miR) is around 70 nt long. miRNA genes are often clustered and transcribed as polycistrons that can reach a length of several hundred nucleotides. The best characterized of these clusters is the miR-17/92 cluster, which is around 800 nt long and comprises six miRNAs (miR-17, miR-18a, miR-19a, miR-20a, miR-19b-1 and miR-92a-1) that are transcribed together.
The primary miRNA transcript is further processed by the endonuclease Drosha to form a hairpin-structured miRNA precursor, called pre-miR. Pre-miRs are exported to the cytoplasm, where they are rapidly cleaved by the endonuclease Dicer. By cleaving the loop of the hairpin, Dicer creates a short, intermediate double-stranded miRNA product consisting of a 5′ (5p) strand and a 3′ (3p) strand. Both miR strands can be functional, even though in most cases one of the strands is more stable and is the only one incorporated into the RNA-induced silencing complex (RISC).
The miRNA guides RISC to its target mRNA, which generally results in reduced protein translation via various mechanisms. The short length of miRNAs, in combination with the fact that only partial complementarity between miRNA and target mRNA is necessary for regulation, results in hundreds of potential target mRNAs for every single miRNA. In the rare case that mRNA and miRNA sequences match perfectly, cleavage of the mRNA is induced. In the more common case of imperfect sequence matching, destabilization of the target mRNA or blockade of translation reduces the protein levels of the target mRNA. Interestingly, lncRNAs, in particular pseudogenes, with sequences that match miRNAs have been found to work as decoys for miRNAs. These competing endogenous RNAs (ceRNAs) draw miRNAs away from their mRNA targets and thereby regulate miRNA function.
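Why partial complementarity yields so many candidate targets can be illustrated with a simple sequence search. The sketch below assumes the widely used "seed" model, in which pairing of miRNA nucleotides 2 to 8 with the mRNA is taken as the minimal requirement; this model is an assumption of the example and is not described in this review. The sequences are toy inputs and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Return 0-based mRNA positions that pair with miRNA nucleotides 2-8.

    seed_start and seed_end are 0-based slice bounds on the miRNA.
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    mrna = mrna.upper()
    return [
        i
        for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


def is_perfect_match(mirna, mrna):
    """True if the mRNA contains a site complementary to the whole miRNA."""
    return reverse_complement(mirna) in mrna.upper()


# Toy sequences for illustration only.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_mrna = "AAAACUACCUCAAAA"
positions = seed_sites(toy_mirna, toy_mrna)
```

Because a seven-nucleotide motif occurs by chance in many transcripts, a search of this kind returns far more candidate sites than a full-length match would, which is consistent with the large number of potential targets per miRNA noted above. Practical target prediction additionally weighs site context and conservation, which this sketch ignores.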
More than two-thirds of protein coding genes are believed to be regulated by miRNAs, and miRNAs are involved in the regulation of practically all cellular pathways, ranging from proliferation to apoptosis and from cell differentiation to the inflammatory response. It is not surprising that altered expression of miRNAs has been detected in various diseases and that miRNAs have been suggested as therapeutic targets. Targeting one miRNA could regulate a whole network of proteins that are dysregulated in disease. The first miRNA to be targeted in disease is miR-122, which was shown to be critical for the replication of hepatitis C virus (HCV) in the liver. Miravirsen is an intravenously applied miR-122 inhibitor that is currently in phase II for treatment of patients with HCV infections.
Remarkably, miRNAs are also found circulating in the bloodstream, where they are protected from degradation by microparticles or RNA-binding proteins. Several studies have shown strong correlations between miRNA serum levels and disease severity, outcome or subtypes, in particular in cancer and inflammatory diseases. However, before a miRNA biomarker can be used in routine tests, problems regarding the reproducibility, accuracy and simplicity of measuring miRNAs in serum need to be addressed.
An overview of miRNA biogenesis and epigenetic modifications of DNA and histones is given in Fig. 1.