Genetics, Gene Expression, and Epigenetics





Muhammad Farooq Rai, PhD

Regis J. O’Keefe, MD, PhD

Robert H. Brophy, MD


Dr. Rai or an immediate family member serves as a board member, owner, officer, or committee member of the Orthopaedic Research Society; Dr. O’Keefe or an immediate family member has received royalties from Fate Therapeutics and serves as a board member, owner, officer, or committee member of the American Orthopaedic Association; and Dr. Brophy or an immediate family member serves as a board member, owner, officer, or committee member of the American Academy of Orthopaedic Surgeons, the American Orthopaedic Association, and the American Orthopaedic Society for Sports Medicine.





INTRODUCTION

Molecular genetics has improved the early detection of diseases through the identification of genetic markers. Epigenetics is a subdiscipline of genetics. Epigenetics refers to modifications independent of the nucleotide sequence that influence the folding of double helix DNA and regulate gene expression.

Genetics is now considered not merely a fundamental science but also an applied science whose applications extend beyond gene mutations. Information on the coding and regulatory sequences facilitates the identification of disease-causing genes and gene variants. Some approaches employ gene expression strategies to measure the expression level as well as the expression pattern of a transcript, a set of transcripts, or genome-wide transcripts in a disease. Both the magnitude and pattern of transcript expression provide further insight into dynamic differences in the genetic code and help measure even the smallest changes in the transcripts, thus facilitating the detection of transcript alterations associated with complex diseases such as osteoarthritis (OA). However, gene (transcript) expression changes are driven not only by the coding sequence of the genome (which makes up 2% to 3% of the human genome) but also by the large noncoding sequence (previously thought to be junk DNA) as well as epigenetic changes beyond the DNA sequence itself. For example, all cells in the human body contain the same DNA sequence, yet the function of each cell is different because changes in gene expression are mediated by other transcription-level changes commonly known as epigenetic modifications. Alterations in genes, gene expression, and epigenetic modifications may be a window into the early diagnosis of diseases and the identification of patients at risk for developing a disease such as knee OA after meniscus or ligament injuries.

Genetic, transcriptomic, and epigenetic analyses generate large datasets. These datasets require sophisticated computational software and informatics pipelines to maximize the extraction of useful information. This need has revolutionized the field of computational biology, reshaping molecular genetics as a quantitative and computationally intensive science.


GENETICS

Every nucleated cell in the human body holds about 2 m of DNA, which carries the genetic information needed to sustain life. The entire genetic material of an organism is termed the genome. All nucleated cells contain chromosomes. Humans have 46 chromosomes (23 pairs) stowed in the nucleus; these contain DNA, which is composed of nucleotide sequences that form genes. DNA is made up of a combination of four nucleotides: guanine (G), cytosine (C), adenine (A), and thymine (T) (Figure 1). A gene is a sequence or region of DNA that contains instructions for making functional molecules called proteins. The study of genes (including other components such as enhancers and small ribonucleic acids or RNAs) is called genetics, which also encompasses heredity and genetic variation. Proteins alone or in complexes perform a variety of cellular functions. With the advancement of technology and discovery of new phenomena, the concept of a gene continues to be refined. For instance, regulatory regions of a gene may be far away from the coding regions, and coding regions may consist of several split exons (segments of a DNA molecule containing information coding for peptides). The domain of genetics is broad and encompasses several conceptual frameworks that include but are not limited to genomics, transcriptomics, proteomics, heredity, evolutionary genetics, and genetic diseases (Figure 2).






FIGURE 1 Illustration showing nucleotide making up DNA, which is stowed in chromosomes within the nucleus of cells that reside in every tissue of the body.


GENOMICS AND DNA SEQUENCING

The genome is the complete set of DNA within a cell, and the field of genomics deals with the genes and genetic information encoded in DNA. It also includes recombinant DNA technology, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of the genome. First-generation DNA sequencing, developed in the 1970s, is one of the most fundamental technologies for studying genetics.1 It uncovered the sequence of nucleotides in DNA fragments and has facilitated the study of the molecular sequences associated with multiple human diseases.2 Despite being a groundbreaking technology, it had many limitations, and efforts were therefore directed toward developing shotgun sequencing,3,4 followed by second-, third-, and next-generation sequencing technologies. These technologies enabled the sequencing of the genomes of many organisms through genome assembly and were used to sequence the complete human genome in 2003. The human genome comprises about 3 billion base pairs of DNA, organized into macromolecules of 33 to 250 million base pairs each and distributed among the 23 paired chromosomes located in each human cell nucleus. Since its inception, the Human Genome Project has revolutionized the field of molecular medicine.






FIGURE 2 Illustration showing various domains of genetics.


TRANSCRIPTOMICS

Following the sequencing of the human genome, efforts have focused on measuring the expression of RNA transcripts. By analogy with the genome, the term “transcriptome” was coined. The transcriptome is the sum of all messenger RNA molecules in a cell or population of cells and contains the full information about all RNA transcribed by the genome in a specific tissue or cell type, at a certain developmental stage, and under explicit pathophysiological circumstances. For instance, osteocytes are derived from osteoblast lineage cells that become progressively embedded in mineralized bone matrix in the adult skeleton.5 These cells are morphologically, functionally, and genetically unique, due in part to the underlying expression of selective gene subsets that characterize the osteocyte phenotype. Although substantial changes in gene expression take place during the osteoblast-to-osteocyte transition, the majority of the transcriptome remains qualitatively osteoblast-like.

Transcriptomics is the science that studies an organism’s transcriptome and mechanism of expression. Overall, transcriptome analysis not only allows for the comprehension of the human genome at the transcript level but also provides a conceptual framework of gene structure and function, and gene expression regulation. Moreover, it may disclose the key alterations of biological processes triggering human diseases, thus offering novel tools useful not only for the comprehension of their underlying mechanisms but also for their molecular diagnosis and clinical therapy (see Gene Expression).


HEREDITY AND HERITABILITY

Heredity is the passing on of traits from parents to progeny, or the process by which cells acquire the genetic information of parent cells. It is a complex process that involves several principles such as segregation, assortment, and dominance. Through heredity, variations between individuals can accumulate and cause species to evolve by natural selection. Inherited traits are controlled by genes, and the complete set of genes within an organism’s genome is called its genotype. Genotype is the part of the genetic makeup of the cells that determines, along with epigenetics and environmental factors (discussed later in this chapter), the phenotype of an individual. Phenotype refers to observable characteristics or traits, such as form and structure, developmental processes, and numerous biochemical and physiological properties. The distinction between genotype and phenotype is commonly encountered when studying family patterns for different hereditary diseases. Humans and most mammals are diploid (containing two complete sets of chromosomes); thus, there can be more than one allele (one of a number of alternative forms of the same gene or same genetic locus) for any given gene. These alleles can be the same (homozygous) or different (heterozygous), depending on the individual. An offspring that inherits a dominant allele expresses the trait in question irrespective of the second allele.

An organism’s genotype can be uncovered by a process called genotyping, which uses a variety of biological methods such as polymerase chain reaction (PCR), DNA fragment analysis, nucleic acid hybridization, restriction fragment length polymorphism, microarrays, and other high-throughput techniques such as RNA sequencing. Phenotypic traits can be monogenic (due to one gene) or polygenic (attributable to two or more genes). The genetic contribution to polygenic traits can be quantified by heritability estimates.

Genetic heritability is an estimate of the degree of variation in a phenotypic trait in a population that results from genetic variation between individuals in that population. This type of heritability is termed broad-sense heritability and is denoted by H². In a medical context, H² can be construed as a measure of the proportion of the variance in liability to a disease or trait caused by additive and nonadditive genetic effects.6 H² ranges from 0 to 1.0, with higher values implying that the genetic component of a trait or disease outweighs environmental factors.
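As a minimal sketch of this definition, broad-sense heritability can be computed as the ratio of genetic variance to total phenotypic variance. The function name, the example variance values, and the simplifying assumption that phenotypic variance is just the sum of a genetic and an environmental component (no covariance or interaction terms) are illustrative and are not taken from the chapter.

```python
def broad_sense_heritability(var_genetic: float, var_environment: float) -> float:
    """Return H^2 = V_G / V_P, assuming V_P = V_G + V_E.

    var_genetic is the total genetic variance (additive plus nonadditive);
    var_environment is the environmental variance. Both are hypothetical inputs.
    """
    if var_genetic < 0 or var_environment < 0:
        raise ValueError("variance components must be non-negative")
    var_phenotype = var_genetic + var_environment
    if var_phenotype == 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_genetic / var_phenotype


# Hypothetical variance components for a quantitative trait.
h2 = broad_sense_heritability(var_genetic=6.0, var_environment=4.0)
print(f"H^2 = {h2:.2f}")
```

With these made-up inputs, 60% of the phenotypic variance is attributed to genetic variation; a value near 0 would instead point to a mostly environmental trait.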

Recently, a substantial genetic component has been recognized for many common musculoskeletal diseases such as OA of the hip, knee, and hand; rheumatoid arthritis (RA); ankylosing spondylitis; and others. Heritability estimates, which quantify the genetic contribution, have been reported for a number of musculoskeletal diseases, for example, tendinopathy (0.40),7 rotator cuff tear (0.82),8 primary hip OA (0.60),9 and knee OA (0.42).10


GENE MUTATIONS

A quantitative trait locus (QTL) is a region of DNA associated with a particular phenotypic trait. Such a trait varies in degree and can be attributed to polygenic effects, that is, the product of two or more genes and their environment. QTLs are often found on different chromosomes. The number of QTLs explaining variation in a phenotypic trait indicates the genetic architecture of that trait; for example, it may indicate that a disease is controlled by many genes of small effect or by a few genes of large effect. QTLs are mapped by identifying which molecular markers correlate with an observed trait. The molecular marker is usually a single nucleotide polymorphism (SNP), a variation in a single nucleotide that occurs at a specific position in the genome. For instance, the C nucleotide may appear in a majority of individuals, but in some individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations (C or A) are called alleles for this position. Such SNP variation is present to a certain degree within a population and results from gene mutations that potentially impact gene function and the expression of disease phenotypes.
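The C/A example can be made concrete by counting alleles across diploid genotypes. The following is a minimal sketch; the function name and the sample of ten genotypes are hypothetical and serve only to illustrate how allele frequencies at a single SNP position are tallied.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele across a list of diploid genotypes.

    Each genotype is a two-character string such as "CC", "CA", or "AA",
    one character per chromosome copy.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError(f"expected a diploid genotype, got {genotype!r}")
        counts.update(genotype)  # adds one count per allele character
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical sample of 10 individuals genotyped at one SNP position.
sample = ["CC"] * 6 + ["CA"] * 3 + ["AA"] * 1
freqs = allele_frequencies(sample)
print(freqs)
```

In this made-up sample, C is the major allele (15 of 20 chromosome copies) and A is the minor allele (5 of 20).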

SNPs underlie differences in our susceptibility to a wide range of diseases. For instance, a single-base mutation in the type II procollagen gene (COL2A1) is associated with susceptibility to OA. Not all SNPs are the same. Some SNPs fall within the coding sequence (a sequence that encodes a protein) of a gene, some in the noncoding region (a sequence that does not encode a protein) of a gene, and yet others in the intergenic region (a sequence located between genes). SNPs within a coding sequence do not always result in a change in the amino acid sequence of the protein and can be divided into two types: synonymous and nonsynonymous. Synonymous SNPs do not affect the protein sequence, whereas nonsynonymous SNPs change the amino acid sequence of a protein. SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. A SNP of this type that affects gene expression is referred to as an eSNP (expression SNP). Besides SNPs, there are several other forms of genetic variation, such as copy number variations (CNVs). A CNV is a type of structural change (deletions, insertions, duplications, and complex multisite variants) in which sections of the genome (DNA segments of 1 kb or larger) are gained or lost relative to a reference sequence.11 CNVs are as important a component of genomic diversity as SNPs. Whereas SNPs are variations at the level of a single nucleotide and account for less than 1% of the human genome, CNVs affect regions >1 kb and cover roughly 35% of the genome.12
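The distinction between synonymous and nonsynonymous coding SNPs can be sketched by translating a codon before and after a single-base change. This is an illustrative example only: the function name and the example codons are hypothetical, and the lookup uses the standard genetic code written on the DNA coding strand. A change to a stop codon is simply reported here as nonsynonymous.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change within one codon.

    codon is a three-letter DNA codon, position is 0, 1, or 2, and alt_base
    is the variant nucleotide. Returns (kind, reference_aa, variant_aa).
    """
    if len(codon) != 3 or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("invalid codon, position, or base")
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# GGC -> GGT: both codons encode glycine (G), so the protein is unchanged.
print(classify_coding_snp("GGC", 2, "T"))
# GGC -> AGC: glycine (G) is replaced by serine (S).
print(classify_coding_snp("GGC", 0, "A"))
```

Changes at the third codon position are often synonymous because the genetic code is redundant, which is why a coding SNP does not necessarily alter the protein.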


GENETIC LINKAGE ANALYSIS

Genetic linkage is a powerful tool for successfully detecting chromosomal locations for causative genes in monogenic diseases and can be used to identify patients with familial risk factors. As genetic mutations causing Mendelian disease are rare in the general population, linkage of a chromosomal locus with disease is evident when genetic markers at or near that locus are co-segregated with disease phenotype within families. Although genetic linkage could map monogenic diseases and localize areas of disease risk, its application in polygenic diseases such as OA and osteoporosis has not been as successful,13 partly due to the requirement of a large number of families with many affected generations. With the advent of whole-genome sequencing, linkage analysis is again emerging as an important and powerful analytical tool for the identification of disease-causing genes through SNPs generated by the whole-genome sequencing.14


GENOME-WIDE ASSOCIATION STUDIES

The limitations associated with genetic linkage analysis for complex (polygenic) diseases can be overcome by using linkage disequilibrium mapping to perform large-scale genome-wide association studies (GWAS). A major advantage of this approach is that it does not require families but can instead use unrelated cases and controls.

GWAS screen for plausible associations between common genetic variants (SNPs) and traits (ie, specific disease phenotypes). They offer an unbiased approach to identifying new candidate genes for human diseases, which can then be confirmed by candidate gene expression and functional studies. Several large-scale GWAS have identified numerous OA-associated gene variants with generally small effect sizes (Table 1).
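The odds ratio (OR) and P value reported for each SNP in a case-control study can be illustrated with a simple allelic test on a 2×2 table of allele counts. The sketch below is an assumption-laden simplification: the function name and the counts are hypothetical, and it uses an unadjusted Pearson chi-square test with 1 degree of freedom, whereas published GWAS typically rely on regression models with covariate adjustment.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 allele-count table.

    Arguments are counts of the risk allele and the other allele among the
    chromosomes of cases and controls. All four counts must be positive.
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    if min(a, b, c, d) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi2:.2f}, P = {p_value:.2e}")
```

An OR above 1 means the allele is more common among cases than controls; the small ORs seen for most OA variants are why very large sample sizes are needed to reach convincing P values.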


UNDERSTANDING THE FUNCTION OF A GENE

Gene targeting provides a valuable means of altering a specified gene in order to discern its biological function. Remarkable genetic similarity between mice and humans allows for the study of the function of single genes in mice. The mouse is a powerful experimental model, not only because many disease phenotypes can be induced, but also because DNA sequences of genes of interest can be introduced into the germline genome randomly or into a specific locus by recombination.15 A common application of gene targeting is to generate gene knockout mice. Gene knockout mice can be produced as animal models for a number of human diseases, offering a biological context in which pharmacological and gene therapies can be developed and tested.16 The number of genes that have been knocked out in mice is expanding. In 2007, it was estimated that at least 11,000 gene knockouts had been generated.17 The Knockout Mouse Phenotyping Program, in partnership with the International Mouse Phenotyping Consortium, now aims to generate knockout mice for 20,000 genes, disrupting genes throughout 90% of the mouse genome. Some relevant terms are defined in Table 2.


TOOLS FOR GENE KNOCKOUT AND GENOME EDITING

A variety of approaches have been used to inhibit specific gene expression in mammalian systems. Knocking out a gene or inserting a gene into the mouse genome may involve many steps, such as specific gene ablation by homologous recombination in embryonic stem cells, introduction of mutation(s) into a targeted gene, suppression of gene function (using, eg, antisense DNA or RNA that hybridizes to a protein-coding single-stranded mRNA and blocks its translation into protein), and gene editing. A number of techniques can be used, many of which are becoming obsolete with the development of novel classes of nucleases for achieving targeted genome modifications, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system (Figure 3).18 An overview of these systems is provided in Table 3.


CLINICAL SIGNIFICANCE OF GENETIC STUDIES IN THE MUSCULOSKELETAL SYSTEM

Genetics and genomics not only provide useful tools for better understanding diseases, through the identification of measurable gene variants that flag individuals at (higher) risk of developing a disease, but also help screen clinically relevant treatment options to prevent, halt, or reverse disease progression. Genomic studies have provided important insights into the biologic mechanisms underlying differences in susceptibility or resistance of an individual to musculoskeletal diseases such as RA, OA, and systemic lupus erythematosus. It has been estimated that 39% to 78% of OA cases can be attributed to genetics.19 Genetics also plays a crucial role in drug development. For example, pharmacogenomics involves genetic variants that impact the expression of genes affecting drug pharmacokinetics and pharmacodynamics, such as the absorption, distribution, action, metabolism, and toxicity of a drug. The term pharmacogenomics is often used interchangeably with the term pharmacogenetics. Pharmacogenetics refers to inherited genetic differences in drug metabolic pathways. It also pertains to other pharmacological components such as enzymes, messengers, and receptors, which can affect individual responses to drugs, in terms of both therapeutic benefits and adverse effects.








TABLE 1 Significant SNPs Associated With Hip, Knee, and Hand OA

| Gene | SNP | Locus | Joint | OR | P value | Cases | Controls | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CHADL | rs532464664 | 22q13.2 | Hip | 7.71 | 5×10^-18 | 4,657 | 207,514 | 62 |
| CAMK2B | rs3757837 | 7p13 | Hip | 1.27 | 8×10^-10 | 11,277 | 67,473 | 63 |
| NACA2 | rs17610181 | 17q23 | Hip | 1.12 | 8×10^-6 | 11,277 | 67,473 | 63 |
| RPL35AP, LOC102723442 | rs6094710 | 20q13 | Hip | 1.28 | 2×10^-10 | 11,277 | 67,473 | 63 |
| NPM1P14, LSMEM1 | rs5009270 | 7q31 | Hip | 1.1 | 3×10^-6 | 11,277 | 67,473 | 63 |
| A2BP1 | rs716508 | 16p13.3 | Hand | NR | 1.81×10^-5 | 2,277 | NR | 64 |
| BTNL2, LOC101929163 | rs10947262 | Chr. 6 | Knee | 1.32 | 6.73×10^-8 | 960 | 3,396 | 65 |
| HLA-DQB1 | rs7775228 | Chr. 6 | Knee | 1.34 | 2.43×10^-8 | 960 | 3,396 | 65 |
| ALDH1A2 | rs3204689 | 15q22 | Hand | 1.46 | 1.1×10^-11 | 623 | 69,153 | 66 |
| DUS4L | rs4730250 | 7q22 | Knee | 1.15 | 9.2×10^-9 | 2,371 | 35,909 | 67 |
| DOT1L | rs12982744 | 19p13.3 | Hip | NR | 1×10^-11 | 3,717 | 10,013 | 68 |
| GNL3 | rs11177 | Chr. 3 | Hip | 1.09 | 5.13×10^-9 | 7,473 | 42,938 | 69 |
| FTO | rs8044769 | Chr. 16 | Hip | 1.07 | 3.56×10^-6 | 7,473 | 42,938 | 69 |
| PACERR, PLA2G4A | rs4140564 | 1q25 | Knee | 1.59 | 3×10^-6 | 387 | 255 | 69 |
| PARD3B | rs1207421 | 2q33 | Knee | 1.46 | 6×10^-6 | 387 | 255 | 70 |
| COG5 | rs3815148 | 7q22 | Knee/hand | 1.14 | 8×10^-8 | 14,938 | 39,000 | 71 |
| TGFA | rs3771501 | Chr. 2 | Hip/knee/hand | 0.94 | 1.66×10^-8 | 10,083 | 12,658 | 72 |
| DYNC1I1 | rs1352413 | Chr. 7 | Knee | 1.39 | 6×10^-6 | 3,898 | 3,168 | 73 |
| FRMD4A | rs7079380 | Chr. 10 | Knee | 1.25 | 6×10^-6 | 3,898 | 3,168 | 73 |
| NOD1 | rs6963954 | Chr. 7 | Knee | 1.41 | 5×10^-6 | 3,898 | 3,168 | 73 |
| MCF2L | rs11842874 | 13q34 | Knee | 1.17 | 2×10^-8 | 3,177 | 4,894 | 74 |
| DOT1L | rs11880992 | Chr. 19 | Hip | NR | 3×10^-16 | 8,649 | >57,000 | 75 |
| TGFA | rs2862851 | Chr. 2 | Hip | NR | 5×10^-11 | 8,649 | >57,000 | 75 |
| RUNX2 | rs12206662 | Chr. 6 | Hip | NR | 1×10^-9 | 8,649 | >57,000 | 75 |
| SMAD3 | rs12901071 | NR | Knee/hip | 1.08 | 3.12×10^-10 | 23,425 | 236,814 | 76 |
| GLIS3 | rs10116772 | Chr. 9 | Knee/hip | 1.03 | 4×10^-8 | 5,414 | 9,939 | 77 |

Chr. = chromosome, NR = not reported, OR = odds ratio


Apr 14, 2020 | Posted by in ORTHOPEDIC | Comments Off on Genetics, Gene Expression, and Epigenetics
