There are few studies reporting Minimal Clinically Important Difference (MCID) values for outcomes related to temporomandibular disorders (TMD).
(1) To provide the MCID of outcomes related to TMD using the Global Rating of Change Scale (GRCS) as an anchor. (2) To verify which outcomes can predict a moderate or large response to the treatment.
Secondary analysis of a randomized controlled trial in subjects with TMD.
Sixty-one women with TMD were divided into intervention and control groups. Visual Analogue Scale (VAS), Headache Impact Test (HIT-6), pressure pain thresholds (PPTs) of the masticatory muscles, Mandibular Function Impairment Questionnaire (MFIQ), maximum mouth opening (MMO), and Craniocervical Flexion Test (CCFT) were collected at baseline and at the 5-week follow-up.
Participants were classified by their response to treatment according to the GRCS. MCID values were provided for subjects who moderately or largely improved after treatment. The MCID was between 0 and 1.90 for orofacial pain, around 2 points for the MFIQ, between 3 and 6.26 points for the HIT-6, around 0.2 kg/cm² for the PPTs of the masticatory muscles, around 2.5 mm for MMO, and between 60 and 68 points for the CCFT. Orofacial pain and HIT-6 were the most discriminative variables for determining whether patients would largely or moderately improve, or would not improve, after treatment.
The values of MCID could be used as guidance for both clinical practice and research. Pain intensity and headache impact were the most predictive outcomes for improvement of the general health status of women with TMD.
MCID values for important TMD-related outcomes were provided.
The MCID values can be used in both clinical practice and future research.
Pain and headache impact are the most predictive outcomes for females with TMD.
Future studies should continue to apply the VAS and HIT-6.
Clinicians should apply the VAS and HIT-6 to verify the effects of their interventions.
The minimal clinically important difference (MCID) has been defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s health care management” by ( ) (page 408, third paragraph). It is useful for guiding treatment decisions according to the magnitude of change found in clinical research and for determining sample sizes in further studies; it also emphasizes the primacy of the patient’s perspective and connects that perspective to the clinician’s ( ).
Different methods to determine the MCID have been described ( ; ). Distribution-based approaches rely on the statistical characteristics of the sample. They express the observed change in a standardized metric, for example the effect size, the standardized response mean, or the standard error of measurement, which links the reliability of the measurement instrument to the standard deviation of the population ( ). Their disadvantage is that they do not indicate whether the observed change is important, because they do not incorporate the patient’s perspective.
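To make the distribution-based indices concrete, the following Python sketch computes them from paired baseline and follow-up scores. It assumes one common set of formulations that the text does not spell out: effect size = mean change / SD of baseline scores, standardized response mean = mean change / SD of change scores, and SEM = SD of baseline scores × √(1 − ICC), where the ICC is the instrument's test-retest reliability. The function name and the example scores are illustrative, not the authors' code or data.

```python
import math
import statistics


def distribution_based_indices(baseline, follow_up, icc):
    """Distribution-based change indices for paired scores.

    baseline, follow_up: scores for the same subjects, in the same order.
    icc: test-retest reliability (ICC) of the instrument, between 0 and 1.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = statistics.mean(changes)
    sd_baseline = statistics.stdev(baseline)  # sample SD at baseline
    sd_change = statistics.stdev(changes)     # sample SD of change scores
    return {
        # Effect size: mean change relative to baseline variability
        "effect_size": mean_change / sd_baseline,
        # Standardized response mean: mean change relative to change variability
        "srm": mean_change / sd_change,
        # Standard error of measurement: combines baseline SD and reliability
        "sem": sd_baseline * math.sqrt(1.0 - icc),
    }


# Illustrative (made-up) VAS scores for five subjects
indices = distribution_based_indices([6, 7, 5, 8, 4], [4, 6, 3, 7, 2], icc=0.75)
print(indices)
```

None of these indices says whether the change matters to patients, which is the limitation the paragraph above describes.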
Anchor-based methods incorporate the patients’ input into the MCID by using an external criterion, for example the Global Rating of Change Scale (GRCS), as an anchor. In this way, the method determines how much change on the measurement instrument corresponds to an MCID defined on the anchor ( ). The advantage is that the concept of ‘minimal importance’ as perceived by patients is explicitly defined and incorporated in these methods. One limitation of anchor-based approaches is that they are less precise, since they provide no information on whether an important change lies within the measurement error of the instrument. They are also susceptible to recall bias ( ; ; ).
Temporomandibular disorder (TMD) is a collective term for a number of clinical problems involving the masticatory musculature, the temporomandibular joints, and associated structures ( ). At least one sign or symptom of TMD is present in 39% of the general population ( ). Furthermore, females have a higher risk of developing TMD than males, with a ratio of 2.3:1 ( ). Temporomandibular disorders are considered a major public health problem, as they are the main source of chronic orofacial pain and the most prevalent category of non-dental chronic pain conditions in the orofacial region ( ). Abundant evidence has shown the great impact that orofacial pain, and TMD pain specifically, has on women’s quality of life ( ; ). These problems interfere with daily activities, diminishing patients’ capacity for work and/or ability to interact with their social environment ( ). In addition, TMD is considered to have a great economic impact due to direct care and has been shown to have an individual impact and burden similar to those of back pain and severe headache ( ).
Researchers are increasingly interested in demonstrating that trial results are not only statistically significant but also clinically meaningful ( ; ; ; ). Some studies have provided MCID values for outcomes commonly used in clinical trials involving patients with TMD ( ; ; ; ; ; ). However, these values were derived from patients with general chronic pain, low back pain, or neck pain, not specifically from patients with TMD. Thus, the primary objective of this study was to determine the MCIDs of clinical outcomes in patients with TMD using the GRCS as an anchor. The secondary objective was to verify whether one outcome (or a combination of outcomes) could predict improvement after treatment.
This study provides validity evidence and determines the MCID for outcomes in patients with TMD. Data from a previously published randomized controlled trial (registry RBR-6c7rq4) were further analyzed. Details on the methods can be found elsewhere ( ).
Sixty-one women with TMD were recruited through announcements in local and social media. Participants were included if they were female, were aged between 18 and 40 years, and had an orofacial pain score ≥3 on a ten-point numerical pain rating scale for at least three months. The diagnosis of orofacial myalgia or mixed TMD was given according to the Research Diagnostic Criteria for TMD ( ). Participants self-reported the presence and intensity of neck pain and headache, not necessarily related to TMD. The exclusion criteria have been described previously ( ).
This trial was approved by the ethics committee (CAE: 41837015.4.0000.5504), and all subjects gave their written consent.
The outcomes (except the GRCS) were collected at baseline and at five weeks. Subjects were assigned to either the manual therapy plus exercise group or the control group. The examiner was blinded to group allocation ( ).
Visual Analogue Scale (VAS)
Current pain, maximum orofacial pain in the last week, and minimum orofacial pain in the last week were measured using the VAS (0–10 cm). The reliability of the VAS has been considered fair to good (intraclass correlation coefficient, ICC, of 0.55–0.83) ( ). The MCID for pain has been reported to range from 1.5 to 3.2 points ( ; ; ; ; ). Furthermore, a pain reduction of 30% has been considered clinically meaningful for individuals with chronic pain based on a distribution-based method of analysis ( ).
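As a sketch of how the 30% criterion could be applied to a single participant, the code below computes the percentage reduction in VAS pain between two time points. The function names are hypothetical and the threshold is a parameter; the result is rounded before the comparison so that a change of exactly 30% is not pushed just below the cut-off by floating-point error.

```python
def percent_pain_reduction(baseline_vas, follow_up_vas):
    """Percentage reduction in VAS pain (0-10 cm); positive means less pain."""
    if baseline_vas <= 0:
        raise ValueError("baseline pain must be above 0 to express a percentage")
    return 100.0 * (baseline_vas - follow_up_vas) / baseline_vas


def meets_percent_threshold(baseline_vas, follow_up_vas, threshold=30.0):
    """True when the reduction reaches the threshold (30% by default)."""
    # Round to absorb floating-point error, e.g. 6.0 -> 4.2 is exactly 30%
    return round(percent_pain_reduction(baseline_vas, follow_up_vas), 6) >= threshold


print(meets_percent_threshold(6.0, 4.2))  # a 30% reduction
print(meets_percent_threshold(6.0, 5.0))  # about a 17% reduction
```

A relative threshold depends on the baseline level: the same 1.8 cm drop is 30% from 6.0 cm but only 20% from 9.0 cm.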
Pressure pain threshold (PPT) of masticatory muscles
Using a digital algometer, PPTs were measured bilaterally on the masseter and temporalis muscles ( ; ). This procedure has shown fair to good reliability in previous studies (ICC between 0.64 and 0.78) ( , ). The minimum detectable change for PPTs has been reported to range from 0.45 to 1.13 kg/cm² at general sites on the body, based on a distribution-based method of analysis ( ).
Headache impact test (HIT-6)
The HIT-6 is scored from 36 to 78 points ( ), a total range of 42 points. Its test-retest reliability is considered good (ICC from 0.76 to 0.80) ( ). In an anchor-based analysis of patients with tension-type headache that used a combination of a general measure of patient-perceived improvement and a headache-specific measure as the anchor, the optimal cut-off point for the MCID was set at 8 points ( ).
Mandibular Function Impairment Questionnaire (MFIQ)
The MFIQ has been used to assess limitations of mandibular function in patients with TMD ( ; ). The Brazilian version of the MFIQ was used in this study, with a maximum total score of 52 instead of 68 because items 1, 2, 6 and 7 are excluded from the total ( ). For the original questionnaire (total score of 68), the smallest detectable difference has been established as 8 points by a distribution-based method ( ). The MCID for the Brazilian version has not yet been reported.
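The reduced scoring can be sketched as below. It assumes, as the 68-point original total implies for a 17-item questionnaire, that each item is scored from 0 to 4, so the 13 retained items give a maximum of 52. The dictionary-based input format and the function name are illustrative assumptions.

```python
EXCLUDED_ITEMS = {1, 2, 6, 7}  # items not counted in the Brazilian-version total
N_ITEMS = 17
MAX_ITEM_SCORE = 4  # assumption: each item scored 0-4 (17 x 4 = 68)


def mfiq_brazilian_total(responses):
    """Total MFIQ score over the 13 counted items (range 0-52).

    responses: dict mapping item number (1-17) to that item's score.
    """
    total = 0
    for item in range(1, N_ITEMS + 1):
        if item in EXCLUDED_ITEMS:
            continue  # excluded from the total in this version
        score = responses[item]
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item {item} has out-of-range score {score}")
        total += score
    return total


worst = {item: MAX_ITEM_SCORE for item in range(1, N_ITEMS + 1)}
print(mfiq_brazilian_total(worst))  # maximum possible total
```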
Maximum mouth opening (MMO)
Three active maximum mouth openings without pain were measured with a digital caliper ( ). The smallest detectable difference for this variable has been established as 6 mm by a distribution-based approach ( ).
Craniocervical Flexion Test (CCFT)
The clinical CCFT ( ; ) was applied using a visual feedback device (Stabilizer; Chattanooga Group Inc., Chattanooga, TN, USA) to evaluate the performance of the deep neck flexor muscles. Participants performed the nodding movement 10 times, holding each contraction for 10 s, at each of 5 progressive stages of increasing pressure (from a baseline of 20 mmHg to 22, 24, 26, 28, and 30 mmHg). They were instructed to maintain the contraction without using the superficial muscles ( ).
The performance index ( , ) was calculated from the number of contractions the patient completed at each level: the repetitions at the five levels of the test were multiplied by 2, 4, 6, 8, and 10, respectively, and summed. The maximum CCFT score was therefore 300 (10 repetitions × (2 + 4 + 6 + 8 + 10)).
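The performance index arithmetic can be sketched as follows. The pairing of multipliers with pressure stages (2 at 22 mmHg up to 10 at 30 mmHg) is assumed from the order in which the text lists them; the function name and input format are hypothetical.

```python
# Multiplier for each pressure stage (mmHg), in the order listed in the text
STAGE_MULTIPLIERS = {22: 2, 24: 4, 26: 6, 28: 8, 30: 10}
MAX_REPS_PER_STAGE = 10  # ten 10-s holds per stage


def ccft_performance_index(reps_by_stage):
    """Weighted sum of successful holds across the five CCFT stages.

    reps_by_stage: dict mapping stage pressure (mmHg) to completed holds (0-10).
    Stages missing from the dict count as zero holds.
    """
    index = 0
    for pressure, multiplier in STAGE_MULTIPLIERS.items():
        reps = reps_by_stage.get(pressure, 0)
        if not 0 <= reps <= MAX_REPS_PER_STAGE:
            raise ValueError(f"{reps} holds at {pressure} mmHg is out of range")
        index += reps * multiplier
    return index


print(ccft_performance_index({22: 10, 24: 10, 26: 3}))  # 20 + 40 + 18
```

Completing all ten holds at every stage gives 10 × (2 + 4 + 6 + 8 + 10) = 300, the maximum stated above.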
The reliability of the procedure has been reported to be good to excellent (intra-rater ICC of 0.78 or 0.98, depending on the study) ( ; ), but no previous study has reported reference or MCID values for this test using the performance index.
The Global Rating of Change Scale was designed to quantify a patient’s improvement or deterioration over time, either to determine the effect of an intervention or to chart the clinical course of a condition. This scale provides information about a person’s current health status ( ). It has previously been used in anchor-based approaches to establish clinically meaningful change in longitudinal studies ( ; ), providing the single best measure of the significance of the change from the individual’s perspective ( ). Subjects were asked to respond to the GRCS at the 5-week follow-up evaluation ( ; ). The blinded rater presented a scale ranging from −7 to +7 and asked the following: “considering that on the first evaluation you were at ‘zero’, use the scale to indicate if you are at the same point, better or worse than on the first evaluation, considering all the symptoms that you have in the orofacial region”.
The intervention group received 10 sessions of physical therapy over 5 weeks, delivered by a physiotherapist, which included non-manipulative techniques ( ; ; ; ; ; ) for the upper cervical spine and neck stabilization exercises with biofeedback ( , ; ; ). The control group did not receive any study-specific intervention or pain education advice for 5 weeks ( ).
Data processing and statistical analysis
Subjects were classified as having no improvement when their GRCS responses after the treatment were between −7 and 0, moderate improvement when they were between 1 and 3, and large improvement when they were between 4 and 7. The anchor-based approach was then used to estimate the MCID of each outcome variable of interest (e.g. VAS, PPTs, HIT-6, CCFT), with the GRCS as the anchor. Changes from baseline to follow-up were calculated for each outcome of interest and used in the analysis. Correlations between the GRCS and each variable of interest were tested in an exploratory fashion; most were higher than 0.3, as recommended in the literature ( ).
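A minimal Python sketch of the classification step follows: it maps a GRCS response to the three categories defined above and groups each subject's change score (follow-up minus baseline) by category. The tuple input format, the function names, and the example values are illustrative assumptions.

```python
from collections import defaultdict


def grcs_category(score):
    """Map a GRCS response (-7 to +7) to the study's response categories."""
    if not -7 <= score <= 7:
        raise ValueError(f"GRCS score {score} is outside -7..+7")
    if score <= 0:
        return "no improvement"
    if score <= 3:
        return "moderate improvement"
    return "large improvement"


def changes_by_category(subjects):
    """Group change scores (follow-up minus baseline) by GRCS category.

    subjects: iterable of (grcs_score, baseline, follow_up) tuples.
    """
    groups = defaultdict(list)
    for grcs_score, baseline, follow_up in subjects:
        groups[grcs_category(grcs_score)].append(follow_up - baseline)
    return dict(groups)


# Illustrative (made-up) subjects: (GRCS, baseline VAS, follow-up VAS)
print(changes_by_category([(0, 5.0, 5.5), (2, 6.0, 5.0), (5, 7.0, 3.0)]))
```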
All outcomes were analyzed using Receiver Operating Characteristic (ROC) analysis ( ). This analysis compares the values of each measure between two groups defined by the GRCS: patients who improved after treatment and patients who did not. It plots the measure’s performance in classifying the groups at every possible cut-off point and calculates the sensitivity and specificity for each cut-off.
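The cut-off sweep behind an ROC analysis can be sketched as follows. The sketch assumes that scores are oriented so that larger values mean more improvement (for pain, baseline minus follow-up), that a subject is classified as improved when the score is at or above the cut-off, and that every observed value is tried as a candidate cut-off. It illustrates the technique; it is not the STATA routine used in the study, and the example scores are made up.

```python
def roc_points(improved_scores, not_improved_scores):
    """Return (cut_off, sensitivity, specificity) for each candidate cut-off.

    Scores must be oriented so that larger values mean more improvement.
    A subject is classified as 'improved' when score >= cut_off.
    """
    cut_offs = sorted(set(improved_scores) | set(not_improved_scores))
    points = []
    for c in cut_offs:
        # True positives: improved subjects at or above the cut-off
        tp = sum(s >= c for s in improved_scores)
        # True negatives: non-improved subjects below the cut-off
        tn = sum(s < c for s in not_improved_scores)
        points.append((c, tp / len(improved_scores), tn / len(not_improved_scores)))
    return points


# Illustrative (made-up) pain reductions in cm
for cut_off, sens, spec in roc_points([3, 2, 1], [0, 1, -1]):
    print(cut_off, round(sens, 2), round(spec, 2))
```

Raising the cut-off trades sensitivity for specificity; plotting sensitivity against 1 − specificity over all cut-offs gives the ROC curve.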
For each outcome, two independent raters selected the cut-off that maximized both sensitivity and specificity, so that both values were high and similar to each other. The chosen cut-off should also maximize the positive likelihood ratio (LR), minimize the negative LR, and maximize the percentage of correctly classified patients; higher positive LRs and lower negative LRs indicate better discrimination ( ). Values higher than 2 for the positive LR and lower than 0.5 for the negative LR are recommended in the literature ( ; ). A third rater was consulted when the two raters disagreed, to make a final decision on the best cut-off value.
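Given the sensitivity and specificity at a candidate cut-off, the likelihood ratios and the percentage of correctly classified patients follow from standard formulas: +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity. The sketch below computes them and checks the recommended thresholds (+LR > 2, −LR < 0.5). The function names are hypothetical, and the final choice among acceptable cut-offs was made by the raters, which the code does not attempt to reproduce.

```python
import math


def _safe_ratio(num, den):
    # Infinite when only the denominator is 0; undefined (nan) when both are
    if den > 0:
        return num / den
    return math.inf if num > 0 else math.nan


def cut_off_metrics(sensitivity, specificity, n_improved, n_not_improved):
    """Likelihood ratios and % correctly classified at one cut-off."""
    lr_pos = _safe_ratio(sensitivity, 1 - specificity)
    lr_neg = _safe_ratio(1 - sensitivity, specificity)
    # Correctly classified = true positives + true negatives
    correct = sensitivity * n_improved + specificity * n_not_improved
    pct_correct = 100 * correct / (n_improved + n_not_improved)
    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "pct_correct": pct_correct}


def meets_lr_criteria(metrics, lr_pos_min=2.0, lr_neg_max=0.5):
    """Check the recommended thresholds (+LR > 2 and -LR < 0.5)."""
    return metrics["lr_pos"] > lr_pos_min and metrics["lr_neg"] < lr_neg_max


# Illustrative (made-up) values for one candidate cut-off
m = cut_off_metrics(0.8, 0.75, n_improved=10, n_not_improved=10)
print(m, meets_lr_criteria(m))
```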
Logistic regression was used for the secondary aim. First, simple (univariable) logistic regression was performed to examine the association between each independent variable (e.g. VAS, PPT, HIT-6, MFIQ, MMO, and CCFT) and group membership (patients who improved vs. those who did not) as the dependent variable. Independent variables that were significant at p ≤ 0.20 in the univariable analysis were entered into a multiple logistic regression model in a forward stepwise fashion. This threshold has been suggested as a conservative criterion for including all potential variables that could be significant in a multivariable regression model, since more traditional alpha levels can fail to identify variables that could be important ( ).
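The univariable screening step can be sketched in pure Python as below: each candidate predictor is fitted alone by maximum likelihood (Newton-Raphson), and it is retained when the two-sided Wald p-value for its coefficient is ≤ 0.20. The Wald test, the solver, all names, and the example data are assumptions for illustration; the study's analyses were run in STATA, and the forward stepwise multivariable step is not shown. The fit assumes the groups are not perfectly separated by the predictor, since the maximum-likelihood estimate does not exist in that case.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def univariable_logit(x, y, max_iter=100, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x; return (b1, two-sided Wald p-value).

    y holds 1 for 'improved' and 0 for 'not improved'.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: inverse information matrix times gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    # Standard error of b1 from the information matrix at the final estimate
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))


def screen_predictors(predictors, y, alpha=0.20):
    """Names of predictors whose univariable p-value is <= alpha."""
    return [name for name, x in predictors.items()
            if univariable_logit(x, y)[1] <= alpha]


# Illustrative (made-up) data: 1 = improved, 0 = not improved
y = [0, 0, 1, 1] * 5
predictors = {"strong": [0, 2, 1, 3] * 5, "null": [0, 1, 0, 1] * 5}
print(screen_predictors(predictors, y))
```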
After each independent variable was added, receiver operating characteristic (ROC) curve analysis was used to determine the discriminative ability of each model for distinguishing between subjects who improved and those who did not ( ). The models were compared, and the model with the largest area under the curve (AUC) was chosen as the best model ( ; ). The AUC was interpreted according to the following recommended guideline ( ): excellent discrimination (AUC = 0.90 to 1.0); good discrimination (AUC = 0.80 to 0.90); fair discrimination (AUC = 0.70 to 0.80); weak discrimination (AUC = 0.60 to 0.70); and discrimination no better than chance (AUC ≤ 0.50). The best predictive model was the one that achieved statistical significance (p < 0.05), presented a larger AUC, and included as few predictor variables as possible. All data analyses were performed using STATA software under the guidance of a statistical expert.
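The AUC of an empirical ROC curve equals the probability that a randomly chosen subject who improved has a higher score than a randomly chosen subject who did not (ties counted as one half), which gives a compact way to compute it. The sketch below does that and maps the result to the interpretation bands listed above. Because the quoted guideline gives no label for AUC values between 0.50 and 0.60, the sketch reports that range separately; where two bands share an endpoint, the value is assigned to the higher band, which is an assumption. Function names and example scores are illustrative.

```python
def empirical_auc(improved_scores, not_improved_scores):
    """AUC as the probability that an improved subject outscores a
    non-improved one (ties count one half)."""
    wins = 0.0
    for a in improved_scores:
        for b in not_improved_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved_scores) * len(not_improved_scores))


def interpret_auc(auc):
    """Label an AUC using the bands quoted in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "weak"
    if auc <= 0.50:
        return "no better than chance"
    return "not covered by the quoted bands (0.50-0.60)"


# Illustrative (made-up) scores; higher means more improvement
auc = empirical_auc([3, 2, 1], [0, 1, -1])
print(round(auc, 3), interpret_auc(auc))
```

For a multivariable logistic model, the scores passed in would be the model's predicted probabilities of improvement.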
At baseline, participants presented maximum pain of 6.1 (±1.9) cm, minimum pain of 1.7 (±1.6) cm, and current pain of 3.5 (±2.7) cm. They scored 20.3 (±9.4) points on the MFIQ, 61.2 (±6.8) points on the HIT-6, and 51.2 (±43.8) points on the CCFT. The maximum mouth opening was 34.7 (±9.1) mm, masseter PPT was 1.1 (±0.5) kg/cm² and temporalis PPT was 1.2 (±0.5) kg/cm². There were no significant differences between the groups in any variable at baseline. According to the GRCS, 30 subjects had no improvement (responses between −7 and 0), 18 subjects had moderate improvement (between 1 and 3), and 13 subjects had large improvement (between 4 and 7) after the treatment ( Table 1 ).
| Outcome | No improvement: n | Mean (SD) | Min | Max | Moderately improved: n | Mean (SD) | Min | Max | Largely improved: n | Mean (SD) | Min | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Max. pain (cm) | 30 | −0.2 (2.2) | −5.5 | 4.6 | 18 | −1.3 (1.8) | −4.8 | 2.7 | 13 | −3.1 (2.4) | −6.5 | 0.9 |
| Min. pain (cm) | 30 | 0.3 (1.5) | −3.8 | 4 | 18 | −0.5 (1.5) | −2.9 | 3.4 | 13 | −1.7 (1.7) | −4.9 | 0 |
| Current pain (cm) | 30 | 0.1 (2.2) | −5.4 | 5.4 | 18 | −0.9 (1.8) | −3.9 | 2.6 | 13 | −3.2 (2.4) | −6.1 | 0.1 |
| MFIQ | 30 | −0.2 (5.5) | −12.0 | 15 | 18 | −2.4 (6.1) | −12.0 | 13 | 13 | −4.5 (7.6) | −15 | 12 |
| HIT-6 | 30 | −1.9 (7.5) | −24.0 | 11 | 18 | −6.9 (9.2) | −33.0 | 8 | 13 | −11.0 (6.9) | −21 | 3 |
| PPT temporalis (kg/cm²) | 30 | 0.2 (0.3) | −0.5 | 1.4 | 18 | −0.1 (1.3) | −1.3 | 0.3 | 13 | 0.2 (0.2) | 0.1 | 0.6 |
| PPT masseter (kg/cm²) | 30 | 0.1 (1.3) | −0.5 | 0.6 | 18 | 0.0 (1.3) | −0.9 | 0.5 | 13 | 0.3 (0.2) | −1 | 0.8 |
| MMO (mm) | 30 | 0.9 (6.1) | −10.8 | 12.7 | 18 | −0.3 (6.3) | −14.0 | 12.9 | 13 | 4.5 (7.3) | −9.7 | 20.5 |
| CCFT | 30 | 28.00 (100.5) | −230.66 | 288 | 18 | 71.74 (98.1) | −134 | 260 | 13 | 52.00 (52.7) | −42 | 132 |
Minimal clinically important differences
The MCID results for all outcomes of interest (cut-offs, sensitivity, specificity, percentage of correctly classified subjects, and positive and negative likelihood ratios) for subjects who moderately and largely improved after the treatment are shown in Tables 2 and 3 . Cut-offs were higher when subjects who did not improve were compared with subjects who largely improved than when they were compared with subjects who moderately improved; the CCFT showed the opposite pattern. The VAS and HIT-6 outcomes correctly classified more than 75% of the subjects when subjects who did not improve were compared with those who largely improved.
|Outcome||Subjects (n)||Cut-off||Sensitivity||Specificity||CC (%)||+LR||-LR|