Goal Attainment Scaling in rehabilitation: A literature-based update




Abstract


Goal Attainment Scaling (GAS) is a method for quantifying progress towards personal goals. Turner-Stokes’s guide and the use of Kiresuk’s T-score are the most widely used GAS-based approaches in rehabilitation. However, the literature describes a number of other approaches and emphasizes the need for caution when using the T-score. This article presents the literature debates on GAS, variations of GAS (in terms of the score level assigned to the patient’s initial status and description of the scale’s different levels), the precautions to be taken to produce valid GAS scales and the various ways of analyzing GAS results. Our objective is to (i) provide clinical teams with a critical view of GAS (the application of which is not limited to a single research group’s practices) and (ii) present the most useful resources and guidelines on writing GAS scales. According to the literature, it appears to be preferable to set the patient’s initial level to –2 (even when worsening is a possible outcome) and to describe all five GAS levels in detail. The use of medians and rank tests appears to be appropriate, given the ordinal nature of GAS.


Résumé


Goal Attainment Scaling (GAS) is a method for writing personalized evaluation scales. Turner-Stokes’s version of GAS and the use of Kiresuk’s T-score are the most widely used approaches in rehabilitation, whereas the literature recommends different approaches and calls the use of the T-score into question. This article presents these debates, the variants in writing GAS scales (in terms of the score assigned to the patient’s initial status and the description of the scale’s levels), the precautions to be taken to ensure that the scales written are valid, and the different ways of analysing GAS results. The objective is to give teams a critical view of GAS that is not limited to one group’s practice and to present the most useful articles for becoming familiar with the method. In view of the literature, it appears more judicious to set a patient’s initial level to –2 (even when worsening is possible) and to describe the five GAS levels precisely. Analysing GAS results with medians and rank tests respects the ordinal nature of GAS.



English version



Introduction


Goal Attainment Scaling (GAS) is a method for writing personalized evaluation scales in order to quantify progress toward defined goals. This approach is attracting growing interest in clinical practice because it enables assessment of a treatment’s efficacy in terms of goals set by the patient him/herself (rather than on generic scales, which may not always include the problem that most severely bothers the patient). GAS is used in many fields, including medicine, and especially in psychiatry, geriatrics and physical and rehabilitation medicine (PRM), fields in which setting precise goals is a fundamental part of treatment planning. In fact, GAS can be used to cover all the fields of the International Classification of Functioning, Disability and Health (ICF) by choosing goals that cover activity, participation, quality of life and environmental factors. Involving the patient and his/her family and carers in the choice of treatment goals may enable better integration of these goals into activities of daily living by transforming goals related to the ICF activity domain into participation goals in the patient’s usual context. Patients undergoing rehabilitation are more motivated when their goals are clearly defined and consistent with their life project. Rehabilitation outcomes are better when the patient is involved in setting his/her goals. In PRM departments, GAS helps to:




  • plan rehabilitation programmes by setting priorities;



  • structure team meetings and multidisciplinary consultations around precise objectives;



  • better quantify a patient’s progress;



  • better communicate with the patient, his/her family and rehabilitation funding bodies.



Lastly, GAS can also be used to address ethical issues (resuscitation orders) or to assess health care system functioning.


Furthermore, GAS is being increasingly used as an outcome measure in research on rehabilitation programmes or treatments for disabled people (e.g. prosthetics, occupational therapy, neuro-orthopaedics, paediatric rehabilitation, locomotor rehabilitation and special education).


Several literature reviews on GAS have been published. The most recent of these (by Vu and Law) cited 17 articles in the field of rehabilitation. These various reviews have covered the psychometric qualities of GAS but do not provide any concrete guidance on its application in clinical practice or research. Furthermore, they do not show how methodological differences may influence the scientific validity of GAS and they barely address the interpretation of GAS results. Hence, guides on using GAS tend to reflect the practices adopted by a small number of research groups. Indeed, Turner-Stokes’s work has attracted so much attention that some researchers may even gain the impression that it is the only guide to GAS. In fact, Turner-Stokes’s guide does not feature a number of important aspects of GAS published by other groups (particularly those by Tennant and Steenbeek).


The objective of the present article is to review the literature on GAS in a pragmatic way, so that interested practitioners may use the method in their clinical practice and/or research. In particular, we shall review methodological variations, the latter’s influence on interpretation of the scale’s psychometric qualities, the T-score’s properties and debates prompted by this method.



Writing Goal Attainment Scales


Here, we shall not provide a detailed description of the procedure for using GAS in patients in rehabilitation departments described by Turner-Stokes et al. because it has been widely disseminated in recent years (including a French-language version presented at the Ipsen symposium during the French PRM Society’s annual congress).


Overall, GAS methodology consists in:




  • defining a rehabilitation goal;



  • choosing an observable behaviour that reflects the degree of goal attainment;



  • defining the patient’s initial (i.e. pretreatment) level with respect to the goal;



  • defining five goal attainment levels (ranging from “no change” to a “much better than expected” outcome);



  • setting a time interval for patient evaluation;



  • evaluating the patient after the defined time interval;



  • calculating the overall attainment score for all the rehabilitation goals.



Optional extensions of this method consist in dividing long-term goals into short-term sub-goals with corresponding GAS sub-scales and giving more weight to some goals than to others.


A five-point scale is generally used: “–2” is the initial pretreatment (baseline) level, “–1” represents progression towards the goal without goal attainment, “0” is the expected (and therefore the “most likely”) level after treatment, “+1” represents a better outcome than expected, and “+2” is the best possible outcome that could have been expected for this goal. Since there may be several rehabilitation goals for a given patient, each goal will have its own GAS scale. Determining the rehabilitation goal is relatively easy in routine PRM practice, inasmuch as GAS is a formalization of the therapeutic objectives discussed on a daily basis with patients and their families. However, it is more difficult to draft a full goal attainment scale, i.e. to precisely describe the five attainment levels. Bovend’Eerdt’s and Steenbeek’s groups have focussed on how to choose the GAS levels.
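The five-point structure described above can be pictured as a small record holding one description per level. The following is only a minimal sketch: the `GoalScale` class, its field names and the validation rules are hypothetical choices made for illustration and do not correspond to any published data format. The example descriptions are shortened from the school-bag goal in Table 1.

```python
from dataclasses import dataclass

GAS_LEVELS = (-2, -1, 0, 1, 2)  # the conventional five-point scale


@dataclass
class GoalScale:
    """One goal attainment scale; a hypothetical structure, not a published format."""

    goal: str
    target_behaviour: str
    levels: dict        # level (-2..+2) -> description of the observable behaviour
    weight: int = 1     # optional weighting of this goal
    baseline: int = -2  # level assigned to the pretreatment status

    def __post_init__(self):
        # Every one of the five levels needs a non-empty description.
        missing = [lv for lv in GAS_LEVELS if not self.levels.get(lv)]
        unknown = [lv for lv in self.levels if lv not in GAS_LEVELS]
        if missing or unknown:
            raise ValueError(f"missing levels {missing}, unknown levels {unknown}")
        if self.baseline not in GAS_LEVELS:
            raise ValueError("baseline must be one of the five GAS levels")


# Shortened wording of the school-bag example in Table 1.
school_bag = GoalScale(
    goal="being able to prepare his/her school bag",
    target_behaviour="preparing the school bag",
    levels={
        -2: "bag prepared by the parents or the teacher",
        -1: "prepares the bag with constant verbal guidance",
        0: "prepares the bag with a check-list, under supervision",
        1: "prepares the bag alone with a check-list",
        2: "prepares the bag alone without a check-list",
    },
)
```

Rejecting a scale with an undescribed level is one possible design choice; it matches the view that all five levels should be written before treatment, but a looser check would be needed for variants in which only some levels are described in advance.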


Bovend’Eerdt’s group developed a method for easily determining the various GAS levels once the main goal has been defined. The first step consists in identifying the patient’s expectations and the environmental factors influencing the performance of the activity in question (e.g. the patient’s house has two floors and thus the patient needs to walk up and down stairs: Table 1). The second step consists in determining the observable target behaviour corresponding to the target activity (e.g. walking down 10 steps of the stairs). In the third step, the rehabilitation team works with the patient to identify the assistance required to perform this activity: human assistance, technical aids, assistive devices, verbal guidance, cognitive assistance, etc. The fourth step consists in quantifying the initial performance at the target activity in terms of the time required, quantity (e.g. the number of steps) and frequency (e.g. frequency of falls) of the target behaviour. The five attainment levels are then written by adding or changing the “assistance required” and/or “performance quantification” categories. It is important to modify only one characteristic at a time.



Table 1

Examples of GAS scales written for a child with traumatic brain injury, presenting a dysexecutive syndrome, left-side hemiparesis and impairment of the right arm (significant ulnar deviation and spasticity of the elbow) that complicates eating.

| Goal attainment scale | GAS #1 | GAS #2 | GAS #3 |
| --- | --- | --- | --- |
| Main goal | The child’s main goal: walking around at home more easily, including the staircase | The child’s main goal: eating on his/her own more easily | The family’s main goal for the child: being able to prepare his/her school bag |
| Observable target behaviour | Walking down 10 steps of the stairs | Eating a bowl of mashed potatoes with a spoon, unaided | Preparing the school bag |
| Weighting | w = 4 | w = 2 | w = 1 |
| –2 | Walks down the stairs without alternating steps with one hand on the stair rail and the other held by a carer | Starts to eat a bowl of mashed potatoes unaided but cannot finish it | The school bag is prepared by the parents or the teacher. The child is unable to prepare the bag alone |
| –1 | Walks down the stairs without alternating steps and holding the stair rail only, while supervised by a carer | Manages to eat a bowl of mashed potatoes unaided but takes more than 15 minutes | Prepares the school bag but requires constant verbal guidance from the parents or teacher |
| 0 | Walks down the stairs with alternating steps, with one hand on the stair rail and the other held by a carer | Eats a bowl of mashed potatoes in 11 to 15 minutes | The child manages to prepare the school bag thanks to a check-list of the necessary steps and under supervision, so that steps are not forgotten |
| +1 | Walks down the stairs with alternating steps and holding the stair rail only, while supervised by a carer | Eats a bowl of mashed potatoes in 7 to 11 minutes | The child manages to prepare the school bag alone, thanks to a check-list of the necessary steps. No supervision is required and the child only occasionally forgets items |
| +2 | Walks down the stairs unaided, with alternating steps, holding the stair rail and not supervised by a carer | Eats at an essentially normal speed, like the family’s other children | The child manages to prepare the school bag alone and without a check-list and only occasionally forgets items |
| Level attained after 4 months | After rehabilitation with physiotherapy: walks down the stairs with alternating steps, with one hand on the stair rail and the other held by a carer (Score = 0) | After botulinum toxin injection, occupational therapy and use of a splint against ulnar deviation: is able to eat alone and finish a bowl of mashed potatoes but never in less than 15 minutes (Score = –1) | After cognitive rehabilitation with scripts, sequencing exercises and training in following step-wise instructions for tasks: manages to prepare the school bag alone, thanks to a check-list of the necessary steps (Score = +1) |

The weighting corresponds to the child’s priorities (goals 1 and 2) and the parents’ priority (goal 3) at the time of the evaluation; more weight is given to the child’s motor goals than to the parents’ cognitive goal.


This type of formulation appears to be preferable to quantitative evaluation (e.g. use of cognitive compensation less than 10% of the time, 10 to 25% of the time, 26 to 40% of the time and so on). Firstly, it is rarely possible to measure activity on this type of numerical scale. Secondly, patients are not observable all of the time (even by the main carer). Likewise, the use of visual analogue scales (VASs) for quantifying the difficulty of dressing (for instance) is not recommended, since we are not aware of literature reports in which the correlation between the difficulty of dressing rated on a VAS on one hand and according to GAS on the other has been confirmed.


Steenbeek’s group sought to identify objective, observable measurements with a view to writing scales that were as accurate as possible. Their GAS scales cover activities in which performance is measurable and is assumed to reflect attainment of the goal. For instance, the ability to walk on uneven ground is evaluated by a timed walk between the rungs of a ladder that simulates an uneven surface; the ability to handle a joystick is evaluated in terms of the number of spaces coloured during a given time interval in a computer drawing programme.


GAS must meet a series of criteria that have been defined as research in this field has progressed:




  • each GAS level must be described accurately enough to allow a person not involved in the GAS-writing process to easily classify the patient at one of the GAS levels described therein;



  • each scale must represent a single dimension of change;



  • the levels must be measurable and thus defined in terms of observable behaviours;



  • the scales must correspond to goals that are important to the patient;



  • all the levels must be realistic and attainable. In particular, the +2 level must not correspond to an unexpected or miraculous goal;



  • the time scale within which goals must be attained and scales must be scored should be defined in advance;



  • the interlevel differences in difficulty must all be the same, i.e. it must be as difficult to go from –2 to –1, as from –1 to 0 or from 0 to +1, etc.



These criteria are based broadly on the idea that regardless of the GAS scale, all rehabilitation goals must be “SMART”: specific, measurable, acceptable, realistic and defined in time.


Consequently, the most frequent mistakes in writing goal attainment scales are as follows:




  • attainment levels that overlap or, in contrast, are not covered by any of the goals;



  • unequal gaps between levels (although this problem can never be completely eliminated);



  • the use of multidimensional scales (e.g. standing up and walking);



  • over-simple goals, the attainment of which does not correspond to a significant clinical difference;



  • subjective criteria for goal attainment (i.e. based on opinions and interviews, rather than objective, quantifiable observations).



Some GAS training methods have been proposed; they show that well-trained rehabilitation staff are able to draft realistic, pertinent GAS scales for their patients. One of the best ways of learning to write a goal attainment scale is to start from existing scales, such as those published as illustrative examples by experienced research groups (see, in particular, the examples of initially erroneous scales that were corrected after training).



Use of Goal Attainment Scaling in research


It has been suggested that goal attainment scales designed for demonstrating the efficacy of a treatment should follow stricter rules, in order to diminish the level of subjectivity. The most reasonable proposals (which are not necessarily applied in the literature) include:




  • the revision of the goals and of the GAS scales by an independent third party;

  • attainment level scoring by a person who is not part of the team that set the goals at the outset;

  • the use of “control goals” that are not targeted by rehabilitation;

  • evaluation of the patients on two different attainment scales developed by independent research groups (i.e. treatment success must be independent of how the goals were formulated);

  • goal-setting by a group (rather than a single therapist or the patient alone), in order to avoid overly simple or unrealistic goals.




Expressing Goal Attainment Scaling results


Four different ways of expressing the results can be found in the literature:




  • scoring each goal between –2 and +2, resulting in as many raw scores as there are scales and giving a direct result for each goal, which is easily understood by the patient and easy to use in clinical practice;



  • a T-score, which is supposed to enable GAS scores to be normalized and then analysed with parametric statistics (please refer to the section on this topic below);

  • the mean of the raw scores, giving an overall score between –2 and +2 for the goals as a whole;

  • the sum of the differences between the initial level and the attained level for each of the patient’s goals.



The respective advantages and drawbacks of these four methods are summarized in Table 2 and will be discussed in the last part of the article. It is important to bear in mind that the complex calculation of the T-score is not the only method that can be used with GAS: a simple –2 to +2 scale is sufficient in clinical practice because the aim is to see where the patient is with respect to the agreed goal.
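As an illustration, the three simplest expressions (raw scores, sum of the differences and mean of the raw scores) can be computed in a few lines for the child described in Table 1. This is a minimal sketch: the function names are hypothetical, all three goals are assumed to start at –2, and the T-score is left aside because it requires its own equation.

```python
from statistics import mean


def sum_of_differences(baseline, attained):
    """Sum, over all goals, of (level attained - initial level)."""
    if len(baseline) != len(attained):
        raise ValueError("one baseline and one attained level per goal")
    return sum(a - b for a, b in zip(attained, baseline))


def mean_raw_score(attained):
    """Mean of the raw scores (an arithmetic operation on ordinal data)."""
    return mean(attained)


# Table 1 example: three goals, all starting at -2 (assumed here).
baseline = [-2, -2, -2]
attained = [0, -1, 1]          # raw scores for GAS #1, #2 and #3

total_change = sum_of_differences(baseline, attained)   # 2 + 1 + 3 = 6
change_per_goal = total_change / len(attained)          # 2 levels on average
overall_mean = mean_raw_score(attained)                 # 0
```

Dividing the sum of the differences by the number of goals is what makes it comparable between patients who have different numbers of goals.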



Table 2

Advantages and drawbacks of the different ways of expressing Goal Attainment Scaling results and the corresponding scores for the example given in Table 1.

| Method | Advantages | Drawbacks | Expression of the results for the GAS example in Table 1 |
| --- | --- | --- | --- |
| Raw scores ranging from –2 to +2 for each scale | Easily understood by the patient. A direct, rapid result for each goal. Does not affect the ordinal nature of GAS data | Parametric statistics are not applicable. No overall score for the efficacy of the treatment | GAS #1: score = 0; GAS #2: score = –1; GAS #3: score = +1 |
| Sum of the differences between the initial level and the level attained | Enables different initial levels (–1 or –2) to be taken into account | Since the score is dependent on the number of goals, it is only meaningful when divided by the number of goals. Not applicable in group studies if the patients in the study sample have different numbers of goals and thus GAS scores | Sum of the differences: [0 – (–2)] + [–1 – (–2)] + [1 – (–2)] = 6. NB. if one divides this score by the number of scales (three, in this example), one can see that the child has improved by two attainment levels, on average |
| Mean of the raw scores | Rapid and easy to perform during a consultation. Easily understandable for the patient. Independent of the number of scales | The performance of arithmetic operations on ordinal data is problematic | Mean = [0 + (–1) + 1]/3 = 0. On average, the goals have been attained as expected (mean 0) |
| T-score | Most frequently used in the literature. Supposedly enables GAS scores to be normalized | The performance of arithmetic operations on ordinal data is problematic | T-score = 48.2 with the weightings chosen by the parents. T-score = 50 if the same weight is allocated to all the goals |


The T-score is the most frequently used method and expresses all of the patient’s scale results as a single standardized value. Although the T-score and the raw scores are considered to be highly correlated, this statement is based on studies with small sample sizes. It is possible to weight the T-score by giving more weight to certain goals and thus to the scores on the corresponding scales. The T-score is calculated by applying an equation that transforms the raw scores from the individual scales into a single number.


$$T = 50 + \frac{10 \sum W_i X_i}{\sqrt{(1 - \rho) \sum W_i^2 + \rho \left(\sum W_i\right)^2}}$$
where: X_i = the GAS score on each scale, W_i = the weighting of each goal attainment scale, ρ = the correlation coefficient between the various scales.
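The equation can be sketched as a short function. This is an illustrative implementation written directly from the formula above, not a reference one; the function name `t_score` and its argument names are hypothetical, and the default ρ = 0.3 is the value attributed to Kiresuk later in the text.

```python
from math import sqrt


def t_score(scores, weights=None, rho=0.3):
    """Kiresuk T-score for one patient.

    scores  -- raw GAS scores X_i (one per scale, usually -2..+2)
    weights -- weightings W_i (defaults to equal weights of 1)
    rho     -- assumed correlation between the scales
    """
    if weights is None:
        weights = [1] * len(scores)
    if not scores or len(scores) != len(weights):
        raise ValueError("one weight per score, at least one score")
    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    sum_w_squared = sum(w * w for w in weights)      # sum of W_i^2
    squared_sum_w = sum(weights) ** 2                # (sum of W_i)^2
    return 50 + 10 * weighted_sum / sqrt(
        (1 - rho) * sum_w_squared + rho * squared_sum_w
    )


# Table 1 example: scores 0, -1, +1 with weights 4, 2, 1.
weighted = t_score([0, -1, 1], [4, 2, 1])   # about 48.2
unweighted = t_score([0, -1, 1])            # 50.0, because the scores sum to 0
```

With the weights from Table 1 and ρ = 0.3, the sketch gives the 48.2 reported in Table 2, and 50 when all goals have the same weight.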


If all the goals have the same weight, this equation simplifies to:


$$T = 50 + C \times \sum X_i$$
where C is a coefficient that depends on the patient’s number of scales (and thus the number of scores). C is 10 for one scale, 6.2 for two scales, 4.56 for three scales, 3.63 for four scales and 3.01 for five scales.
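The coefficient C follows from the general equation when every weight equals 1: the denominator becomes the square root of (1 − ρ)n + ρn², where n is the number of scales. The sketch below assumes ρ = 0.3 (the value attributed to Kiresuk later in the text); the function names are hypothetical.

```python
from math import sqrt


def gas_coefficient(n_scales, rho=0.3):
    """Coefficient C of the simplified equation for n equally weighted scales."""
    if n_scales < 1:
        raise ValueError("at least one scale is required")
    return 10 / sqrt((1 - rho) * n_scales + rho * n_scales ** 2)


def t_score_equal_weights(scores, rho=0.3):
    """Simplified T-score: 50 + C * (sum of the raw scores)."""
    return 50 + gas_coefficient(len(scores), rho) * sum(scores)


# Coefficients for one to five scales, to compare with the values quoted above.
coefficients = {n: gas_coefficient(n) for n in range(1, 6)}
```

Under this assumption the computed coefficients agree with the quoted values to within rounding (for example, about 4.564 for three scales).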



Weighting and properties of the T-score


Various weighting methods have been suggested in the literature: weighting as a function of the importance and difficulty of the goal and according to the probability of attaining the goal in question. Although the weighting is supposed to influence the T-score, weighted scores and scores in which all the goals have the same weight are correlated.


If one decides to use the T-score, one must bear in mind that four of its characteristics will influence the final result. Firstly, in the T-score equation, ρ represents the coefficient for the correlation between a patient’s various scales. Kiresuk suggested using a ρ value of 0.3. In principle, this value produces a standard deviation of 10 for the T-score values. In reality, ρ is often lower than 0.3 because the goals and thus the goal attainment scales can belong to as many fields as the patient wishes and are thus poorly interrelated. In contrast, some GAS scales from the same field can have higher ρ values. In the field of cognitive rehabilitation for example, where one expects use of a memory book to be correlated with better organizational abilities and perhaps better job-seeking abilities, Malec found a value of ρ = 0.44. Although ρ should, in practice, be adjusted on a case-by-case basis, it hardly changes the value of the T-score (Table 3).



Table 3

Illustration of the slight variations in the T-score caused by changing (i) the weight allocated to each scale and (ii) the correlation coefficient ρ, using the example from Table 1 where GAS #1: score = 0; GAS #2: score = –1; GAS #3: score = +1.

| Weighting | T-score with ρ = 0.3 | T-score with ρ = 0.44 |
| --- | --- | --- |
| No weighting, i.e. with equal weights: GAS #1 w = 1; GAS #2 w = 1; GAS #3 w = 1 | 50.0 | 50.0 |
| With weighting: GAS #1 w = 4; GAS #2 w = 2; GAS #3 w = 1 | 48.2 | 48.1 |
| Simulation with other weightings: GAS #1 w = 40; GAS #2 w = 20; GAS #3 w = 10 | 48.2 | 48.1 |
| Simulation with other weightings: GAS #1 w = 10; GAS #2 w = 9; GAS #3 w = 2 | 45.7 | 45.4 |


Secondly, Kiresuk and Sherman postulated that the T-scores should be distributed around a value of 50 with a standard deviation of 10; this was confirmed by several studies in the 1970s. Since that time, many authors have considered that use of the T-score is equivalent to normalization. However, the use of the T-score does not guarantee that the score data are normally distributed.


Thirdly, the T-score equation is built so that the initial T-score varies according to the number of goals and scales set at the outset – even though the patients all start from the –2 level for all the goals (e.g. an initial T-score of 23 for a patient with three goals and a value of 30 for a patient with just one goal).


Fourthly and lastly, multiplying all the weights by a common factor does not alter the T-score; for example, weightings of 1, 2 and 3 will result in the same T-score as weightings of 10, 20 and 30. More generally, the score does not vary greatly as a function of the weighting (Table 3).
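The third and fourth characteristics can be checked numerically. The sketch below is illustrative only: it re-implements the T-score equation in a hypothetical `t_score` function, assumes ρ = 0.3, and uses the raw scores from Table 1 for the weighting comparison.

```python
from math import sqrt


def t_score(scores, weights, rho=0.3):
    """T-score from raw scores X_i and weights W_i (illustrative sketch)."""
    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denominator = sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + 10 * weighted_sum / denominator


# Third characteristic: every goal starts at -2, yet the initial T-score
# depends on how many goals were set.
initial_one_goal = t_score([-2], [1])                     # about 30
initial_three_goals = t_score([-2, -2, -2], [1, 1, 1])    # about 22.6

# Fourth characteristic: multiplying all the weights by 10 changes nothing,
# because the factor cancels between numerator and denominator.
scores = [0, -1, 1]
t_small_weights = t_score(scores, [1, 2, 3])
t_large_weights = t_score(scores, [10, 20, 30])
```

The initial value for three goals (about 22.6) rounds to the 23 quoted above, which is why groups with different numbers of goals do not share the same starting T-score.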


Consequently, these four characteristics imply that:




  • since the true value of ρ is unknown, one can use other values;



  • use of a T-score does not dispense with the need to check for a normal data distribution before applying parametric statistics;



  • to compare two groups before and after treatment using T-scores, the practitioner must check that the groups have a similar number of goals;



  • since weighting is subjective, it is rarely used in rehabilitation and so the T-score can be calculated using the simplified equation:


$$T = 50 + C \times \sum X_i$$



Variants of the Goal Attainment Scaling methodology



Who sets the goal and writes the Goal Attainment Scaling scales?


All scenarios have been described in the literature, with scales written by the patient, by the therapist, by the rehabilitation team (with or without the patient’s involvement), by an independent “goal selector” or even an external “goal selection committee”. However, it appears that goals are more likely to be attained if the patient is involved in selecting them.


Setting a goal before the start of treatment, defining the goal precisely and agreeing on the different attainment levels helps to transfer information and to negotiate realistic goals. In Turner-Stokes’s work, patients are encouraged to set their goals themselves. To help them do this, Turner-Stokes’s group has developed a “menu” of prestated goals in the most common fields in rehabilitation (walking, pain, dressing, etc.), which can help the patient and the care team to formulate their own objectives.


In anosognosic patients or patients who are not greatly aware of their difficulties, goal-setting is indeed more complicated but becomes a therapeutic process per se.


In paediatric rehabilitation, it appears to be essential to involve the family in goal-setting because the literature data show that children, parents, therapists and physicians have differing concerns and priorities. Hence, there is a growing body of literature in favour of rehabilitation based on goals set by the family and the child.



What is the initial level? How can worsening be expressed?


In Turner-Stokes’s method, the value corresponding to the starting level is chosen according to whether or not worsening is possible. The patient’s initial status will be –2 if worsening is not possible. If worsening after treatment is plausible, the initial level will be set to –1 so that any worsening can be rated as –2. Although this method has the advantage of enabling aggravation to be scored, it has several disadvantages. By setting the initial level at –1, three different levels of goal attainment are defined but none corresponds to progress without attaining the goal, a situation that is frequently encountered in clinical practice. A patient who has not progressed and a patient who has progressed (but not enough to attain the expected goal at level 0) will both be scored as –1, even though their respective clinical responses to treatment differed. Furthermore, progress is measured on three or four levels depending on whether the initial level is set to –1 or –2, respectively; this makes it difficult to compare scales.


A growing number of researchers set the initial level to –2 for all patients, in order to obtain comparable scales. This usefully enables one to measure improvement in the absence of goal attainment, although the floor effect associated with this method makes it impossible to score aggravation. Steenbeek suggested adding a –3 level for expressing aggravation. However, this would prevent the calculation of a T-score with a Gaussian distribution centred on 0. In contrast, this approach may be appropriate if the T-score is not used.



How many levels should be described?


According to Turner-Stokes et al. and other clinicians and researchers, only two levels need to be described in detail: the initial level (the patient’s current status) and the expected level (the goal). The other levels are set afterwards and can be expressed as follows: the goal has been attained, as expected: 0; the patient’s status has not changed: –2; the patient has improved and progressed towards the goal but has not attained it: –1; the patient has marginally surpassed (+1) or greatly surpassed (+2) the expected outcome. The disadvantage of this scoring method is subjectivity, notably when deciding between +1 and +2. Turner-Stokes et al. recommended reserving this method for clinical practice but wrote that all five levels should be precisely defined if GAS is used as an efficacy criterion in research. However, this latter recommendation is not always applied. The advantage of this method relates to the fact that describing two levels is less time-consuming.


Nevertheless, most authors consider that it is essential to describe each of the five levels with a great degree of precision and to define on which task goal attainment will be evaluated and in which context (see Steenbeek et al. for examples of GAS scales with detailed descriptions of these aspects). The major disadvantage of this method relates to the time needed to accurately describe the five attainment levels: an average of 45 minutes, according to Steenbeek et al., but just 10 to 12 minutes per scale according to other authors. All the literature data suggest that the time required to define rehabilitation goals decreases as the rehabilitation team gains experience, including situations in which the goals have to be defined with the family.


The “three-milestone GAS” falls between Turner-Stokes’s method and that of Steenbeek. The objective is to be less subjective (notably when differentiating between +1 and +2) while ensuring that a scale can be written relatively quickly. Although this variant is still a conventional, five-level goal attainment scale, only the –2, 0 and +2 levels are precisely described prior to treatment; these levels then act as “milestones” within a continuum of possible scores from –2 to +2. If the patient’s status corresponds exactly to the description of the –2, 0 or +2 level, scoring is easy. If not, it is easy to score the patient between two “milestones”; for example, if the patient’s status is better than the description for 0 but not good enough for +2, it will be scored as +1.
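The scoring rule of the three-milestone variant can be written as a small decision function. This is only a sketch of the rule as described above: the function name and the way the rater’s judgement is encoded (the highest milestone met, plus whether the patient clearly exceeds it) are assumptions made for illustration.

```python
MILESTONES = (-2, 0, 2)  # the only levels described before treatment


def three_milestone_score(highest_milestone_met, exceeds_it):
    """Score a patient on a three-milestone scale.

    highest_milestone_met -- the best described level (-2, 0 or +2) that the
                             patient's status reaches
    exceeds_it            -- True if the status is better than that description
                             without reaching the next milestone
    """
    if highest_milestone_met not in MILESTONES:
        raise ValueError("the milestone must be -2, 0 or +2")
    if exceeds_it and highest_milestone_met < 2:
        # Between two milestones: -1 (between -2 and 0) or +1 (between 0 and +2).
        return highest_milestone_met + 1
    return highest_milestone_met


score_between_0_and_2 = three_milestone_score(0, exceeds_it=True)      # +1
score_exactly_expected = three_milestone_score(0, exceeds_it=False)    # 0
score_partial_progress = three_milestone_score(-2, exceeds_it=True)    # -1
```

In this encoding a patient who surpasses the +2 description still scores +2, since the scale has no higher level.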



Other personalised goal attainment measures


It is important to differentiate between conventional GAS (which, despite the above-mentioned variations, remains a precise, codified methodology), other personalized goal attainment measures (notably Cusick’s seven-level GAS, Weigl’s Modified GAS, Treatment Goal Attainment and the Global Clinical Impression) and approaches that have been referred to as GAS but use neither Kiresuk’s methodology nor a scale from –2 to +2. To the best of our knowledge, these methods have not been studied in terms of their psychometric qualities. In contrast, the Canadian Occupational Performance Measure (COPM) is a well-studied method that is more structured than GAS. It is based on a semi-structured interview with three sections: self-care (personal care, functional mobility and community management), productivity (paid/unpaid work, household management, play/school) and leisure (quiet recreation, active recreation and socialization), during which the patient identifies his/her problems and sets treatment goals accordingly.



Precautions regarding the psychometric qualities of GAS methods described in the literature


Given that GAS variants do not all use the same initial level or do not describe their levels with the same degree of precision, the scales’ psychometric qualities cannot be compared in a valid way.



Interrater reliability


The interrater reliability (IRR) is described as good in literature reviews but does appear to vary according to:




  • the precision with which the levels are described;



  • the person writing the scale;

  • the person scoring the scale;

  • the field in question.



Goal attainment scales written by speech therapists may have greater IRR (κ = 0.92) than those written by physiotherapists (κ = 0.73). Greater IRR is obtained when GAS scales are written by the therapist treating the patient than when they are written by an outside therapist after examining the child for just an hour. This is even more true for cognitive domains (κ = 0.85 versus 0.63, respectively) than for motor domains (κ = 0.76 versus 0.65, respectively). The IRR is moderate when one rater observes the patient directly and the other views video recordings (κ = 0.61–0.66). Table 4 summarizes the interrater reliability in various rehabilitation areas when all five GAS levels are described in detail. To the best of our knowledge, the IRR of Turner-Stokes’s method (in which only two levels are described and the others are “deduced”) has never been reported.



Table 4

Interrater reliability in various rehabilitation studies.

| Study | Field of rehabilitation | ICC/kappa |
| Rockwood | Cognitive | 0.97 |
| Palisano | Paediatric | 0.89 (prestudy); 0.75 (study) |
| Steenbeek | Paediatric | 0.63 |
| Bovend’Eerdt | Neurological | 0.48 |
| Rushton | Amputees | 0.67 |
| Joyce | Neurological | 0.92–0.94 |
| Steenbeek | Paediatric | 0.65–0.92 |

Interpretation of the IRR, expressed as kappa (κ) or the intraclass correlation coefficient (ICC): ≥ 0.9, excellent; 0.71–0.9, good; 0.51–0.7, moderate; 0.31–0.5, poor; ≤ 0.3, very poor.
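The interpretation bands given under Table 4 can be applied mechanically when comparing studies. The short Python sketch below does so for four of the values in the table. The function name `interpret_irr` is hypothetical, and the handling of values that fall between two published bands (e.g. 0.705) is an assumption, since the footnote does not specify it.

```python
def interpret_irr(coefficient):
    """Map a kappa/ICC value to the interpretation bands given under Table 4.

    Assumption: a value falling between two published bands (e.g. 0.705)
    is assigned to the lower band, because the source does not say.
    """
    if coefficient >= 0.9:
        return "excellent"
    if coefficient >= 0.71:
        return "good"
    if coefficient >= 0.51:
        return "moderate"
    if coefficient >= 0.31:
        return "poor"
    return "very poor"


# Values taken from Table 4 (single-value rows only).
table_4 = {
    "Rockwood": 0.97,
    "Steenbeek": 0.63,
    "Bovend'Eerdt": 0.48,
    "Rushton": 0.67,
}

for study, value in table_4.items():
    print(f"{study}: {value} -> {interpret_irr(value)}")
```

Applied to these rows, the bands range from excellent (Rockwood) to poor (Bovend’Eerdt), which illustrates how widely the reported IRR varies between studies.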



Content validity


Palisano showed that it was feasible to write GAS scales with good content validity, as long as the team writing them had sufficient experience and had thought the attainment levels through carefully (notably in terms of the need for a single dimension per scale). However, the validity of a goal attainment scale depends on the way it is written, and one cannot extrapolate from literature data on other scales. The scale’s validity will depend on the setter’s objectivity and ability to anticipate the range of possible outcomes based on their knowledge of the pathology, of the patient’s potential and of the available therapeutic resources. Consequently, GAS results may sometimes reflect the setter’s knowledge and ability rather than the treatment’s true efficacy.



Criterion concurrent validity


GAS scores are poorly or not at all correlated with the standard scales used in routine rehabilitation practice (such as the Barthel Index) in the fields of geriatrics, cognition, neurological disease, orthoses and paediatrics.


GAS and the Global Clinical Impression are strongly correlated. In contrast, Cusick et al. found that GAS was poorly correlated with the COPM, which nevertheless also measures the attainment of personalized objectives. The researchers suggested that this was due to the fact that the GAS took account of two parameters that were not addressed by the COPM in their study (specific arm function and behaviour).



Sensitivity to change


GAS scales present excellent sensitivity to change, as has been demonstrated in different populations and contexts. In rehabilitation, GAS is more sensitive to change than the Barthel Index and the Functional Independence Measure. In some studies, GAS is the only method capable of detecting a change after treatment. Standard scales sometimes fail to detect a change when the goal has been attained. The main reason for this is that the goals and attainment levels often do not correspond to any of the standard scales’ items.



Unresolved issues concerning GAS methodology: the problem of treating ordinal goal attainment scales as interval data


One of the criticisms of GAS scores concerns the non-interval nature of the data generated. Despite the care taken to ensure that the extent of progression from one level to the next is regular, goal attainment scales are ordinal scales, i.e. the distances between levels cannot be assumed to be equal. Each level to which a number is assigned (–2, –1, 0, etc.) can only be considered as “more than” or “less than” the adjacent levels. In fact, each “level” represents a category of possible values; one knows only that the “–2” level is below the “–1” level, which in turn is below the “0” level. Consequently, arithmetic operations (e.g. the calculation of means and T-scores) are not applicable to this type of data, and non-parametric statistics (such as rank tests) must be used. Debates concerning the processing of interval versus ordinal data are not limited to the field of GAS: many commonly used ordinal scales assign numerical values to categories, and interpretation of the results can be significantly flawed because the intervals between categories are irregular. Although this issue has been emphasized in literature reviews, the T-score continues to be used in both clinical practice and research.
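A small numerical illustration may help to show why means are problematic for ordinal levels. In the Python sketch below, the two groups of scores and the alternative numeric coding are entirely hypothetical. The recoding keeps the order of the five levels but assumes that the step from –1 to 0 (attaining the goal) is much larger than the other steps. The comparison of means reverses under the recoding, whereas the comparison of medians does not.

```python
from statistics import mean, median_low

# Hypothetical GAS scores for two groups of four goals.
group_a = [0, 0, 0, 0]
group_b = [-1, -1, 2, 2]

# Hypothetical order-preserving recoding: same ranking of the five levels,
# but with unequal distances between them.
recode = {-2: 0, -1: 1, 0: 5, 1: 6, 2: 7}


def summarize(scores):
    """Return (mean, median) of a list of scores.

    median_low is used so that the median is always an actual scale level.
    """
    return mean(scores), median_low(scores)


mean_a, median_a = summarize(group_a)
mean_b, median_b = summarize(group_b)
mean_a_re, median_a_re = summarize([recode[s] for s in group_a])
mean_b_re, median_b_re = summarize([recode[s] for s in group_b])

print("Original coding: means", mean_a, mean_b, "medians", median_a, median_b)
print("Recoded levels:  means", mean_a_re, mean_b_re, "medians", median_a_re, median_b_re)
```

With the –2 to +2 coding, group B has the higher mean; with the recoded levels, group A does. The median of group A is above that of group B in both cases, because the median depends only on the order of the levels.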


The problem applies to all ordinal measures to which arithmetic operations are applied but is more significant when calculating the GAS T-score, for several reasons. Firstly, several ordinal variables are multiplied together in the Kiresuk equation (score × weighting, etc.). Secondly, the clinical difference between the different goal attainment levels is not constant. For example, when walking down stairs, is letting go of the carer’s hand as difficult as letting go of the handrail? Thirdly, the scales used for the T-score may not be from the same dimension (e.g. one scale concerns walking and another concerns sleep). When one combines these sources of variability with the T-score’s dependence on the number of goals, the weightings and the choice of ρ (see above), the T-score appears to be a particularly poor metric for assessing outcome.
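To make the dependence on the number of goals concrete, here is a minimal Python sketch of the T-score as it is usually written in the GAS literature following Kiresuk: T = 50 + 10·Σ(wᵢxᵢ) / √[(1 – ρ)·Σwᵢ² + ρ·(Σwᵢ)²], with ρ conventionally set to 0.3. The function name `gas_t_score` and the example patients are hypothetical.

```python
from math import sqrt


def gas_t_score(scores, weights=None, rho=0.3):
    """Kiresuk-style T-score for one patient.

    scores  -- attained levels (-2 to +2), one per goal
    weights -- one weight per goal (assumed equal to 1 if omitted)
    rho     -- assumed correlation between goal scores (0.3 by convention)
    """
    if weights is None:
        weights = [1] * len(scores)
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    numerator = 10 * sum(w * x for w, x in zip(weights, scores))
    denominator = sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + numerator / denominator


# Two hypothetical patients who both reach +2 on every goal they set.
one_goal = gas_t_score([2])
two_goals = gas_t_score([2, 2])
print(round(one_goal, 1), round(two_goals, 1))
```

Both hypothetical patients achieved the best possible outcome on all of their goals, yet the patient with one goal obtains T = 70 and the patient with two equally weighted goals obtains about 74.8. The T-score therefore reflects the number of goals set as well as the levels attained.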


Recently, use of the Rasch model has improved the psychometric qualities of scales and questionnaires used in rehabilitation. The model linearizes ordinal scales and thus enables the use of parametric statistics. This type of process is only applicable to a well-defined scale and cannot be used with personalized scales. However, Tennant simulated GAS scales in order to test them in the Rasch model. The T-scores of 300 subjects (tested on ten GAS scales and with scores weighted by the goals’ importance and difficulty) were compared with T-scores in which the scores from the same GAS had been linearized for three variables: the raw score on each scale, the goal’s importance and the goal’s difficulty. Tennant found that 14.7% of the T-scores differed by over 10 points when comparing the non-linearized scales (wrongly considered as linear scores for calculation of the T-score, despite being ordinal) and the scales truly linearized with the Rasch model. Importantly, a 10-point difference is clinically significant. Hence, Tennant’s study demonstrated that GAS results are inaccurate when the ordinal nature of the data is not taken into account.


In contrast, Malec reported that parametric and non-parametric statistics gave similar results, but his statement is based on a less rigorous methodology: he compared correlation coefficients between GAS T-scores and the scores of other ordinal scales (used in brain injury rehabilitation) obtained by either a parametric method (yielding Pearson’s correlation coefficient) or a non-parametric method (yielding Spearman’s correlation coefficient). He showed that Pearson’s and Spearman’s correlation coefficients were indeed quite similar. Although Malec’s study is cited as justification for the use of parametric statistics, it does not enable one to affirm that the T-score is a valid measure.


Given that GAS is not linear, what other options are there for analyzing GAS results? Tennant suggested establishing “item banks” of precalibrated goal attainment scales as a single-dimensional measure via Rasch’s Differential Item Functioning, which enables use of the T-score. However, the disadvantage of this method relates to the loss of the truly personalized nature of GAS.


The best solution may be that initially suggested by MacKay and applied by Steenbeek’s group: the median of the raw scores (–2 to +2) is analyzed with rank tests and non-parametric statistics. The ordinal nature of the scales is therefore taken into account. For example, in a study on botulinum toxin, Steenbeek used Wilcoxon’s two-tailed signed-rank test to compare two groups in terms of the median score before and after treatment.
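As an illustration of this kind of analysis, the Python sketch below reports the median raw score and runs an exact two-tailed Wilcoxon signed-rank test on paired before/after scores, using only the standard library. The data are hypothetical, the helper names are our own, and the sketch assumes a simple paired comparison within one small group; it is not a reproduction of Steenbeek’s analysis. Zero differences are dropped and tied differences receive mid-ranks.

```python
from itertools import product
from statistics import median_low


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(before, after):
    """Return (W+, exact two-tailed p-value) for paired ordinal scores.

    The p-value is obtained by enumerating every possible assignment of
    signs to the observed ranks, so this is only practical for small samples.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - centre) >= observed:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical data: eight goals, all starting at the initial level of -2.
before = [-2, -2, -2, -2, -2, -2, -2, -2]
after = [0, -1, 0, 1, -2, 0, -1, 2]

print("median before:", median_low(before), "median after:", median_low(after))
w_plus, p_value = wilcoxon_signed_rank_exact(before, after)
print("W+ =", w_plus, "p =", p_value)
```

For real analyses, an established statistical package (for example `scipy.stats.wilcoxon` in Python) would normally be preferred to a hand-written enumeration.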



Conclusion and recommendations


Setting precise goals, describing the patient’s initial status, defining the possible attainment levels and agreeing on how the goal can be attained: these steps themselves constitute a pedagogic process that makes it possible to:

  • negotiate realistic goals;
  • discuss what is most important for the patient and the patient’s family;
  • obtain truly informed consent for the proposed rehabilitation plan;
  • actively involve the patient and his/her family in the rehabilitation project.



In this sense, GAS is above all a tool for dialogue, patient education and formalization of the patient-caregiver contract, rather than just another dataset in the patient’s already voluminous medical records.


Most studies of the psychometric qualities of GAS have analyzed attainment scales within which the five levels are described very precisely (i.e. an approach that differs from that proposed by Turner-Stokes). Accordingly, we propose that the term GAS should be applied solely to precisely described, five-level scales. The validity and IRR of the Turner-Stokes method must be explored further before it can be applied. It appears more judicious to set the initial level to –2, in order to obtain comparable scales and to detect progression towards a goal in the absence of attainment (despite the issue of the floor effect). In view of the debates on the ordinal character of the goal attainment scales and the erroneous use of arithmetic calculations (such as the T-score) to interpret GAS ordinal data, it appears more sensible to perform non-parametric analyses of the data (using Steenbeek’s method) and to abandon the T-score. GAS must assess observable behaviours lest the ambiguity of the ratings add to the imprecision of the scales, whose construct and content validity can never be definitively evaluated due to their idiosyncratic nature.


GAS scales used in research may sometimes prompt erroneous conclusions. Literature proposals of additional rules for GAS used in research should be studied in more detail and applied more rigorously. In particular, the use of control GAS scales (i.e. scales concerning goals that are not targeted by treatment and are unlikely to be attained through generalization) can be useful in multiple baseline protocols, by monitoring the respective changes over time in GAS scores for control and target goals. Bearing in mind the wide range of IRR values reported in the literature, clinical trials using GAS as an outcome measure should evaluate their own IRR and explain which variant of GAS is being used (i.e. who sets the goals and writes the GAS scale, whether or not the scale is proofread/corrected by a different person, the number of attainment levels described, the choice of the initial level, etc.). Rehabilitation teams wishing to learn more about GAS can follow published training modules and the guides issued by Bovend’Eerdt et al. and King et al.


Disclosure of interest


The authors declare that they have no conflicts of interest concerning this article.


This work forms part of a broader project that won the SOFMER-Allergan Prize in October 2012.





French version



Introduction


Goal Attainment Scaling (GAS) is a method for writing personalized evaluation scales in order to quantify the attainment of set goals. Interest in GAS is growing in clinical practice because it makes it possible to judge the effectiveness of care against the goals chosen by the patient rather than against generic scales, which may sometimes omit the problem that matters most to the patient. It is used in many fields, including medicine, and particularly in psychiatry, geriatrics and physical medicine and rehabilitation (PMR), where setting precise goals is fundamental to planning care. Scales written with the GAS method can cover every domain of the International Classification of Functioning (ICF), with goals bearing on the patient’s activity as well as on participation, quality of life or environment. Involving the patient and family in the choice of goals may help to embed the goals in daily life, by transforming goals in the ICF activity domain into participation goals in the patient’s usual context. Patients’ motivation in rehabilitation is greater if the goals are clearly defined and if they match the patient’s life plans. The results of rehabilitation are better if patients take part in choosing the goals. In physical medicine and rehabilitation departments, writing GAS scales makes it possible to:

  • plan rehabilitation by setting priorities;
  • structure team meetings (case conferences) and multidisciplinary consultations around precise goals;
  • better quantify patients’ progress;
  • communicate better with the patient, the patient’s family and the bodies funding the rehabilitation.



Lastly, GAS scales can also be written for ethical issues (e.g. the wish to be resuscitated) or to evaluate the functioning of a care department.


GAS is also increasingly used in research as an outcome measure in studies seeking to demonstrate the efficacy of rehabilitation or of a treatment for people with disabilities: in orthotics and prosthetics, occupational therapy, neuro-orthopaedics, paediatric rehabilitation, musculoskeletal rehabilitation and special education.


Several literature reviews on GAS methodology have been published; the most recent, by Vu and Law, included 17 articles in the field of rehabilitation. These reviews deal with the psychometric qualities of GAS methodology but do not give concrete recommendations for its use in clinical practice or research, do not show how variants of the methodology affect its scientific validity, and place little emphasis on how to interpret the results of GAS scales. The user guides to GAS, for their part, reflect the practices of particular teams, especially that of Turner-Stokes, so that GAS is often equated with that publication. Yet Turner-Stokes’s guide does not incorporate important aspects of GAS published by other teams, in particular those of Tennant and Steenbeek.


The aim of this article is to present the literature data on GAS in a pragmatic way, so that teams wishing to do so can use it in clinical practice and in research. In particular, we present an analysis of the variants of the methodology, their influence on the interpretation of the psychometric qualities of GAS, the properties of the T-score and the debates that this methodology raises.



Writing GAS scales


The procedure for using GAS with inpatients in rehabilitation units described by Turner-Stokes et al. will not be restated here, because it has been widely disseminated in recent years, notably in a French version at the IPSEN symposia held during SOFMER congresses.


In general terms, GAS methodology consists of:

  • defining a rehabilitation goal;
  • choosing an observable behaviour that reflects the degree of attainment of this goal;
  • defining the patient’s initial (pretreatment) level with respect to this goal;
  • defining five levels of attainment of this goal, corresponding to a progression from “no change” to “best expected outcome”;
  • setting a time frame for evaluating the patient on this goal;
  • evaluating the patient after the set time frame and calculating an overall score for attainment of the rehabilitation goals.



The optional steps of the methodology consist of:

  • dividing long-term goals into sub-goals attainable in the short term (with GAS subscales corresponding to the sub-goals);
  • weighting the goals by giving more weight to some of them.



The scale is conventionally written with five points: “–2” is the initial (pretreatment) level and “–1” represents progression towards the goal without the goal being attained. “0” is the expected level after treatment and therefore the “most likely” level after treatment. “+1” represents a better-than-expected outcome and “+2” the best outcome that could be hoped for with respect to this goal. A GAS scale measures the attainment of one goal. Since rehabilitation goals may be multiple, as many GAS scales as there are goals are written for each patient. Determining rehabilitation goals is relatively easy in routine rehabilitation practice. In this sense, writing GAS scales is in fact a formalization of the contracts discussed every day with patients and their families when drawing up a treatment plan. However, it is more difficult to write a complete GAS scale, i.e. to describe the five levels of attainment of the goal precisely. The teams of Bovend’Eerdt and Steenbeek have looked particularly closely at how to choose GAS levels:


Bovend’Eerdt’s team developed a method for easily finding the different levels of a GAS scale once the main goal has been defined. The first step is to identify the patient’s expectations and the environmental factors that influence performance of the activity targeted by the goal (e.g. the patient’s home is on two floors, hence the need to be able to go down stairs, Table 1). The second step is to determine the observable target behaviour corresponding to the target activity (e.g. going down ten stairs). In the third step, the team identifies with the patient the support needed to perform this activity: human assistance, assistive devices, verbal guidance, cognitive aids, compensations (e.g. use of a handrail, help from an adult), etc. The fourth step is to quantify initial performance of the target activity in terms of the time needed, “quantity” (e.g. number of stairs) or frequency (e.g. frequency of falls) of the target behaviour. The five levels of the scale are then written by removing or modifying the “support needed” and/or “quantification of performance” categories. It is important to modify only one characteristic at a time.



Table 1

Examples of GAS scales written for a child with traumatic brain injury presenting with dysexecutive syndrome and left hemiparesis with involvement of the right upper limb (marked ulnar deviation and elbow spasticity) that hampers him when eating.

| GAS level | Child’s main goal: to move around his home more easily, including on the stairs. Observable target behaviour: going down 10 stairs. Weighting: w = 4 | Child’s main goal: to eat unaided more easily. Observable target behaviour: eating a bowl of mashed potato with a spoon without help. Weighting: w = 2 | Family’s main goal: for him to have everything he needs at school without their constantly having to pack his things for him. Observable target behaviour: packing the school bag. Weighting: w = 1 |
| –2 | Goes down one step at a time without alternating feet, one hand on the handrail and one hand held by a carer | Starts eating a bowl of mashed potato unaided but cannot finish it | The school bag is packed by his parents or teacher. He is unable to pack it on his own |
| –1 | Goes down one step at a time without alternating feet, using the handrail only, under a carer’s supervision | Manages to eat and finish a bowl of mashed potato unaided but needs more than 15 minutes | Packs his school bag himself but with constant verbal guidance from his parents or teacher |
| 0 | Goes down one step at a time, alternating feet, one hand on the handrail and one hand held by a carer | Eats the bowl of mashed potato in 11 to 15 minutes | Manages to pack his school bag using a checklist of the necessary steps, under supervision so that no step is omitted |
| +1 | Goes down one step at a time, alternating feet, using the handrail only, under a carer’s supervision | Eats the bowl of mashed potato in 7 to 11 minutes | Manages to pack his school bag on his own using a checklist of the necessary steps. No supervision is needed and items are missing only occasionally |
| +2 | Goes down independently, alternating feet, using the handrail only, without a carer’s supervision | Eats at an almost normal pace, like his siblings | Independent: packs his school bag on his own without a checklist, and items are missing only occasionally |
| Level attained after 4 months | After physiotherapy: goes down one step at a time, alternating feet, one hand on the handrail and one hand held by a carer (score = 0) | After botulinum toxin injection, occupational therapy and fitting of an ulnar alignment splint: manages to eat and finish a bowl of mashed potato unaided but needs more than 15 minutes (score = –1) | After cognitive rehabilitation with script exercises, sequencing and following step-by-step instructions on various tasks: manages to pack his school bag on his own using a checklist of the necessary steps (score = +1) |


Apr 23, 2017 | Posted by in PHYSICAL MEDICINE & REHABILITATION | Comments Off on Goal Attainment Scaling in rehabilitation: A literature-based update
