Recent trends in clinical research have moved attention toward reporting clinical outcomes and resource consumption associated with various care processes. This change is the result of technological advancement and a national effort to critically assess health care delivery. As orthopedic surgeons traverse an uncharted health care environment, a more complete understanding of how clinical research is conducted using large data sets is necessary. The purpose of this article is to review various advantages and disadvantages of large data sets available for orthopedic use, examine their ideal use, and report how they are being implemented nationwide.
Key points
- Big data, in the health care setting, may be defined as a collection of information extracted from traditional and digital sources used to drive future discoveries and analyses.
- Awareness of the advantages and limitations of large data registries, such as Medicare claims data, the National Surgical Quality Improvement Program, the National Inpatient Sample, the Kids' Inpatient Database, and private alternatives, is necessary for orthopedic surgeons to conduct meaningful outcomes research.
- Use of the International Society of Arthroplasty Registries' recommendations on the creation of registries will enable orthopedic surgeons to target quality improvement initiatives and better track patient outcomes.
Introduction
In the early twentieth century, Dr Codman became the first advocate for the collection and analysis of patient outcomes. With the aid of medical and technological advancements throughout the century, the collection, analysis, and interpretation of data gave rise to an era of big data. Simply put, big data is a collection of information extracted from traditional and digital sources used to drive medical advancements. Over the past decade, several factors have converged, allowing for the rise of big data and its implementation in clinical research. Advanced electronic devices with data mining capabilities, as well as the reduced cost of data storage and analysis, have given clinicians the ability to answer questions beyond the scope of randomized controlled trials and meta-analyses.
Big Data: Big Business
Over the last 5 years, big data research applications have quickly turned into big business. International Business Machines (IBM), a prominent leader in information technology (IT), recently reported that 90% of the world's data was collected within the last 2 years, demonstrating the rapid rate at which information is being collected. Within the next 5 years, this IT sector is projected to grow by 400% to $50 billion annually. As a result, many corporations and entrepreneurs are focusing on IT applications in health care. Early leaders in this emerging sector have been large hospital organizations that have used patient data to implement population-based health care initiatives. Through this approach, clinical and economic advantages have been realized in the form of improved quality of care, increased efficiency, and reduced resource consumption. Although big data can be beneficial to patients, health care providers must use this information appropriately, as improper use may lead to harmful outcomes and clinical practices.
Large national databases have emerged as a viable means of reporting surgical outcomes and resource consumption patterns within orthopedic surgery. In the United States, several large publicly governed patient registries exist. The strengths and limitations of each database depend on its purpose and design. Before incorporating data from a specific database, an investigator must have a good understanding of the research question being asked as well as the strengths and limitations of the available databases. The purpose of this article is to review various advantages and disadvantages of large data sets available for orthopedic use, examine their ideal use, and report how they are being implemented nationwide. Improvements that could make the collection of relevant data more efficient, and model orthopedic practices that have embraced the big data, big research model, are also presented. Throughout this article, the reader should be mindful that clinicians cannot ignore big data and must ensure that it is used in an ethical manner to improve patient outcomes.
Current big data registries
International Databases
The first national orthopedic registry was created in Sweden in 1975 to collect information on total knee arthroplasty (TKA). Since then, all Scandinavian and several English-speaking countries have developed independent total joint registries. The Swedish joint registry assigns each patient a single national health identifier, ensuring that a given primary prosthesis implanted at one institution can be connected to subsequent revisions at a different institution. In 2007, the Nordic Arthroplasty Register Association (NARA) was created to enable collaboration among the TKA and total hip arthroplasty (THA) registries of Sweden, Denmark, and Norway. Although the collaboration has allowed for a robust analysis of total joint arthroplasty (TJA) outcomes, NARA also demonstrates how intraoperative techniques, such as cement fixation during THA, depend on regional norms rather than evidence-based practices. Additionally, infrequently used implants have been removed from the Swedish market because of insufficient outcomes data. Ultimately, the relatively small sample size and national tendencies in surgical technique and prosthesis selection may limit the comparative capabilities of Scandinavian TJA registries. Although not without limitation, these evidence- and population-based international registries have served as prototypes for several American data sets.
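The linkage that a single national identifier makes possible can be illustrated with a minimal Python sketch. It is not the Swedish registry's actual implementation; the record layout, identifiers, and hospital names are hypothetical and chosen only to show how a revision at one institution can be tied back to a primary procedure at another.

```python
from collections import defaultdict

# Hypothetical registry records: (national_id, joint, event, hospital).
# Field names and values are illustrative assumptions, not real registry data.
records = [
    ("P001", "left knee", "primary", "Hospital A"),
    ("P002", "right hip", "primary", "Hospital B"),
    ("P001", "left knee", "revision", "Hospital C"),
]

def link_revisions(records):
    """Group events by (national_id, joint) so each primary carries its revisions."""
    events = defaultdict(lambda: {"primary": None, "revisions": []})
    for national_id, joint, event, hospital in records:
        key = (national_id, joint)
        if event == "primary":
            events[key]["primary"] = hospital
        else:
            events[key]["revisions"].append(hospital)
    return dict(events)

linked = link_revisions(records)
```

Here the revision performed at Hospital C is attached to the primary knee implanted at Hospital A because both records share the same patient identifier and joint. Without a common identifier, the two events would appear to be unrelated.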
Medicare Claims Data
Whereas European registries were developed under the impetus of professional societies, the United States has tasked specific governmental agencies with collecting administrative data on orthopedic care. In 1965, the US Congress established Medicare as Title XVIII of the Social Security Act, and the program began on July 1, 1966. Medicare, administered by the Centers for Medicare and Medicaid Services (CMS), is the national insurance program offering health care to 4 groups of US citizens: those 65 years or older, the disabled, those with end-stage renal disease, and those with amyotrophic lateral sclerosis. The Medicare claims data set has recorded administrative claims data, reimbursement, and payment information on more than 45 million beneficiaries, making it the most robust nationwide database available to clinicians. However, it is primarily limited to elderly and disabled patients and is not representative of the US population.
Additionally, the database can track beneficiaries through both ambulatory and inpatient settings. This feature enables researchers to assess short- and long-term outcomes and trends in resource utilization. Although the Medicare data set is considered the most comprehensive and robust database available in the United States, it is heavily regulated and is associated with significant up-front costs ranging from $3000 to $20,000 per data file. Furthermore, the data set relies heavily on International Classification of Diseases (ICD) codes and is ineffective at assessing nonbillable events. It should also be noted that the Medicare database does not include information on implant type, surgical approach, or laterality. This lack of data is particularly relevant as CMS shifts away from a fee-for-service model to value-based reimbursement models.
National Surgical Quality Improvement Program
The National Veterans Affairs Surgical Risk Study (NVASRS) was developed as an outcomes-based database in response to a congressional mandate addressing the high surgical mortality rate within the Veterans Health Administration. Following the success of the NVASRS, the program expanded to all 132 Veterans Affairs hospitals and was renamed the National Surgical Quality Improvement Program (NSQIP). Since its inception, NSQIP has been credited with reducing 30-day postoperative mortality and morbidity by 47% and 43%, respectively. In 1999, NSQIP was piloted at several academic medical centers and has since been implemented at more than 650 medical institutions worldwide. NSQIP is unique in that trained Surgical Clinical Reviewers collect a continuum of data from admission up to 30 days postoperatively. The collected data are based on the review of patient charts, not insurance or administrative data. Additionally, all surgical patients, including those undergoing elective orthopedic procedures, are included in NSQIP. Thus, NSQIP is particularly well suited to assessing postoperative outcomes and resource utilization among various medical centers while having the added capability of evaluating patients' postoperative course and complications. Notably, large national and statewide databases are an accessible low-cost alternative to the Medicare database.
Discharge Databases
In 1988, the federal government charged the Agency for Healthcare Research and Quality (AHRQ) with collecting in-hospital outcomes and resource utilization through the Healthcare Cost and Utilization Project (HCUP). Deidentified patient information was gathered from participating medical centers and organized into state and national databases. These data sets were further separated by setting: emergency department (Nationwide Emergency Department Sample), ambulatory care center (State Ambulatory Surgery and Services Database), or inpatient hospital stays. The most commonly used HCUP databases include the Kids' Inpatient Database (KID), National Inpatient Sample (NIS), and State Inpatient Database (SID). Although these discharge databases are relatively low cost and accessible, they rely heavily on ICD coding practices to identify patients of interest.
The NIS is the largest all-payer inpatient database and is updated annually. The entries in the NIS include up to 15 ICD diagnostic and procedural codes. The patient samples are reported in a uniform and deidentified manner, making them ideal for retrospective observational cohort studies. Before 2012, the NIS was referred to as the Nationwide Inpatient Sample. The sampling methodology was modified in 2012 to reflect a more nationally representative model, and the name was changed to highlight this modification. Historically, the NIS has incorporated up to 8 million discharges from approximately 4000 hospitals across the United States. Hospitals are stratified by size, geographic location, and academic affiliation. The NIS incorporates 20% of all US discharges, and a multiplier is used to provide national estimates. Major limitations of the NIS include its inability to report on short- and long-term outpatient outcomes as well as its heavy reliance on proper coding practices. Furthermore, discrepancies between the NIS and the NSQIP revealed that the NIS is not well suited to reporting short-term outcomes related to infection (sepsis, pneumonia, urinary tract infection, surgical site infections) and 30-day patient mortality. However, Bozic and colleagues noted that patients' postoperative complications recorded in administrative databases were comparable to, and in high concordance with, patient charts. Although the NSQIP is similar in cost and usability to the NIS, NSQIP incorporates fewer patient samples and is by no means a nationally representative data set.
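The role of the multiplier can be shown with a minimal Python sketch, offered as an illustration rather than the NIS methodology itself. It assumes each sampled discharge carries a weight equal to the number of national discharges it represents, which is roughly 5 for a 20% sample. The `Discharge` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Discharge:
    procedure: str   # hypothetical procedure label
    weight: float    # hypothetical discharge weight (about 5 for a 20% sample)

def national_estimate(sample, procedure):
    """Sum the weights of sampled discharges that match the procedure."""
    return sum(d.weight for d in sample if d.procedure == procedure)

# Four sampled discharges, each standing in for five national discharges.
sample = [
    Discharge("TKA", 5.0),
    Discharge("THA", 5.0),
    Discharge("TKA", 5.0),
    Discharge("TKA", 5.0),
]

tka_estimate = national_estimate(sample, "TKA")
```

In this toy sample, three TKA discharges yield a national estimate of 15. Summing per-record weights rather than multiplying a raw count by a constant allows the weights to differ across sampling strata.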
The KID is very similar to the NIS except that it focuses on the pediatric population. It was developed in 1997 to provide a national pool that can be used to assess and analyze rare pediatric disorders. The database is updated every 3 years and has not been well adopted within the orthopedic community. Another HCUP database, the SID, represents statewide hospital discharges that have been consolidated in a uniform manner, facilitating comparison with and incorporation into the NIS. Although the SID is overseen by the HCUP, data set variables, length of follow-up, and complication data vary by state. California and New York are considered to have larger and more comprehensive databases.
Although well powered, these large national and statewide databases have historically been used to evaluate epidemiologic patterns. More recently, a trend toward assessing postoperative in-hospital outcomes and resource consumption among patient cohorts has emerged. Investigators have used the Charlson Score and Elixhauser Comorbidity Index as a means of reporting cohort comorbidity profiles. Some studies have matched comorbidity profiles among cohorts to reduce confounding bias, allowing for a more accurate comparison of postoperative outcomes and resource utilization patterns.
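A minimal Python sketch of one simple matching approach follows. It assumes a comorbidity score has already been computed for every patient and pairs each case with one unused control that has exactly the same score. Published studies often use more elaborate methods, such as propensity score matching, so this is only an illustration. The function name, patient identifiers, and scores are hypothetical.

```python
from collections import defaultdict

def match_on_score(cases, controls):
    """Pair each case with one unused control having the same comorbidity score."""
    pool = defaultdict(list)
    for patient_id, score in controls:
        pool[score].append(patient_id)
    pairs = []
    for patient_id, score in cases:
        if pool[score]:
            # Take the first available control so that no control is reused.
            pairs.append((patient_id, pool[score].pop(0)))
    return pairs

# Hypothetical (patient_id, comorbidity_score) tuples.
cases = [("c1", 0), ("c2", 2), ("c3", 5)]
controls = [("k1", 2), ("k2", 0), ("k3", 0), ("k4", 3)]

pairs = match_on_score(cases, controls)
```

In this example, case c3 has no control with a matching score and is left unmatched. Matching therefore trades sample size for comparability between cohorts.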
Other Databases
Alternatives to large government-funded and administered databases are those curated by private insurance providers and large medical entities. Within the last few years, insurance claims data from major insurers have become available for purchase. As medical providers and insurers continue to merge, there is considerable potential for medical research. However, these administrative and insurance claims databases gather patient information from many sources, increasing the data set's heterogeneity and ultimately making generalizations difficult to substantiate.
Currently, many private databases, including PearlDiver, MarketScan, and Premier, are available for purchase. Although these databases provide access to massive amounts of information through a simplified platform, their higher costs ($5000 to $50,000) serve as a significant barrier. Additionally, the data incorporated into these databases are extracted and combined from numerous sources (ie, NIS, CMS, insurance providers), reducing the uniformity of the variables that make up the database.