“Rule #1: When you get a bizarre finding, first question the test.” †
† Weinstein, S. 50 Years of somatosensory research. J Hand Ther. 1993;6(1):113.
Standardized functional tests are statistically proven to measure accurately and appropriately when proper equipment and procedures are used.
In order for a test to be an acceptable measurement instrument, it must include all of the following elements: reliability, validity, statement of purpose, equipment criteria and administration, and scoring and interpretation instructions.
Hand and upper extremity assessment tools fall at varying levels along the reliability and validity continuum and therefore must be selected based on satisfying as many of the required elements as possible.
Objective measurements of function provide a foundation for hand rehabilitation efforts by delineating baseline pathology against which patient progress and treatment methods may be assessed. A thorough and unbiased assessment procedure furnishes information that helps match patients to interventions, predicts rehabilitation potential, provides data from which subsequent measurements may be compared, and allows medical specialists to plan and evaluate treatment programs and techniques. Conclusions gained from functional evaluation procedures guide treatment priorities, motivate staff and patients, and define functional capacity at the termination of treatment. Assessment of function through analysis and integration of data also serves as the vehicle for professional communication, eventually influencing the body of knowledge of the profession.
The quality of assessment information depends on the accuracy, authority, objectivity, sophistication, predictability, sensitivity, and selectivity of the tools used to gather data. It is of utmost importance to choose functional assessment instruments wisely. Dependable, precise tools allow clinicians to reach conclusions that are minimally skewed by extraneous factors or biases, thus diminishing the chances of subjective error and facilitating more accurate understanding. Functional instruments that measure diffusely produce nonspecific results. Conversely, instruments with proven accuracy of measurement yield precise and selective data.
Communication is the underlying rationale for requiring good assessment procedures. The acquisition and transmission of knowledge, both of which are fundamental to patient treatment and professional growth, are enhanced through development and use of a common professional language based on strict criteria for functional assessment instrument selection. The use of “home-brewed” functional evaluation tools that are inaccurate or not validated is never appropriate because their baseless data may misdirect or delay therapy intervention. The purpose of this chapter is twofold: (1) to define functional measurement terminology and criteria and (2) to review current upper extremity functional assessment instruments in relation to accepted measurement criteria. It is not within the scope of this chapter to recommend specific test instruments. Instead, readers are encouraged to evaluate the instruments used in their practices according to accepted instrument selection criteria, keeping those that best meet the criteria and discarding those that do not.
Standardized functional tests, the most sophisticated of assessment tools, are statistically proven to measure accurately and appropriately when proper equipment and procedures are used. The few truly standardized tests available in hand/upper extremity rehabilitation are limited to instruments that evaluate hand coordination, dexterity, and work tolerance. Unfortunately, not all functional tests meet all of the requirements of standardization.
For a test to be an acceptable measurement instrument, it must include all of the following crucial, non-negotiable elements:
Reliability defines the accuracy or repeatability of a functional test. In other words, does the test measure consistently between like instruments; within and between trials; and within and between examiners? Statistical proof of reliability is defined through correlation coefficients. Describing the “parallelness” between two sets of data, correlation coefficients may range from +1.0 to −1.0. Devices that follow National Institute of Standards and Technology (NIST) standards, for example, a dynamometer, usually have higher reliability correlation coefficients than do tests for which there are no governing standards. When prevailing standards such as those from NIST exist for a test, use of human performance to establish reliability is unacceptable. For example, you would not check the accuracy of your watch by timing how long it takes five people to run a mile and then computing the average of their times. Yet, in the rehabilitation arena, this is essentially how reliability of many test instruments has been documented.
Once a test’s instrument reliability is established, inter-rater and intrarater reliability are the next steps that must be confirmed. Although instrument reliability is a non-negotiable prerequisite to defining rater reliability, researchers and commercial developers often ignore this critical step, opting instead to move straight to establishing rater reliability with its less stringent, human performance-based paradigms. The fallacy of this approach seems obvious: if a test instrument measures consistently in its inaccuracy, it can produce misleadingly high rater reliability scores that are completely meaningless. For example, if four researchers independently measure the length of the same table using the same grossly inaccurate yardstick, their resultant scores will have high inter-rater and intrarater reliability so long as the yardstick consistently maintains its inherent inaccuracies and does not change (Fig. 12-1). Unfortunately, this scenario has occurred repeatedly with clinical and research assessment tools, involving mechanical devices and paper-and-pencil tests alike.
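The yardstick argument can be illustrated numerically. The following is a minimal sketch, not a description of any published study: the table lengths, the 6-inch yardstick defect, and the small reading differences between raters are all hypothetical values chosen for illustration. Two raters and five tables are used, rather than four researchers and one table, because a correlation coefficient requires a spread of measured values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical true lengths (inches) of five tables, as a reference standard would report them.
true_lengths = [48.0, 60.0, 72.0, 84.0, 96.0]

def faulty_yardstick(length, offset=6.0):
    """Assumed defect: the yardstick consistently reads 6 inches too long."""
    return length + offset

# Two raters use the same faulty yardstick; the small differences are hypothetical reading noise.
rater_a = [faulty_yardstick(t) + d for t, d in zip(true_lengths, [0.1, -0.1, 0.0, 0.1, -0.1])]
rater_b = [faulty_yardstick(t) + d for t, d in zip(true_lengths, [-0.1, 0.1, 0.1, 0.0, 0.0])]

inter_rater_r = pearson(rater_a, rater_b)
mean_error = sum(a - t for a, t in zip(rater_a, true_lengths)) / len(true_lengths)

print(f"Inter-rater correlation: {inter_rater_r:.4f}")
print(f"Mean error against the reference: {mean_error:.1f} inches")
```

Under these assumptions the raters agree almost perfectly with each other while every measurement remains about 6 inches wrong, which is why rater agreement alone says nothing about the accuracy of the instrument.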
Validity defines a test’s ability to measure the thing it was designed to measure. Proof of test validity is described through correlation coefficients ranging from +1.0 to −1.0. Reliability is a prerequisite to validity. It makes no sense to have a test that measures authentically (valid) but inaccurately (unreliable). Validity correlation coefficients usually are not as high as are reliability correlation coefficients. Like reliability, validity is established through comparison to a standard that possesses similar properties. When no standard exists, and the test measures something new and unique, the test may be said to have “face validity.” An example of an instrument that has face validity is the volumeter that is based on Archimedes’ principle of water displacement. It is important to remember that volumeters must first be reliable before they may be considered to have face validity. A new functional test may be compared with another similar functional test whose validity was previously established. However, establishing validity through comparison of two new, unknown, tests produces fatally flawed results. In other words, “Two times zero is still zero.” Unfortunately, it is not unusual to find this type of error in functional tests employed in the rehabilitation arena.
Statement of purpose defines the conceptual rationale, principles, and intended use of a test. Occasionally test limitations are also included in a purpose statement. Purpose statements may range from one or two sentences to multiple paragraphs in length depending on the complexity of a test.
Equipment criteria are essential to the reliability and validity of a functional assessment test instrument. Unless absolutely identical in every way, the paraphernalia constituting a standardized test must not be substituted for or altered, no matter how similar the substituted pieces may be. Reliability and validity of a test are determined using explicit equipment. When equipment original to the test is changed, the test’s reliability and validity are rendered meaningless and must be reestablished. For example, if the wooden checkers in the Jebsen Taylor Hand Function Test are replaced with plastic checkers, the test is invalidated.
Administration, scoring, and interpretation instructions provide procedural rules and guidelines to ensure that testing processes are exactingly conducted and that grading methods are fair and accurate. The manner in which functional assessment tools are employed is crucial to accurate and honest assessment outcomes. Test procedure and sequence must not vary from that described in the administration instructions. Deviations in recommended equipment, procedure, or sequence invalidate test results. A cardinal rule is that assessment instruments must not be used as therapy practice tools for patients. Information obtained from tools that have been used in patient training is radically skewed, rendering it invalid and meaningless. Patient fatigue, physiologic adaptation, test difficulty, and length of test time may also influence results. Clinically, this means that sensory testing is done before assessing grip or pinch; rest periods are provided appropriately; and, if possible, more difficult procedures are not scheduled early in testing sessions. Good assessment technique should reflect both test protocol and instrumentation requirements. Additionally, directions for test interpretation are essential. Functional tests have specific application boundaries. Straying beyond these clearly defined limits leads to exaggeration or minimization of inherent capacities of tests, generating misguided expectations for staff and patients alike. For example, goniometric measurements pertain to joint angles and arcs of motion. They are not, however, measures of joint flexibility or strength.
Although not a primary instrumentation requisite, a bibliography of associated literature is often included in standardized test manuals. These references contribute to clinicians’ better appreciation and understanding of test development, purpose, and usage.
Once all of the above criteria are met, data collection may be initiated to further substantiate a test’s application and usefulness.
Normative data are drawn from large population samples that are divided, with statistically suitable numbers of subjects in each category, according to appropriate variables such as hand dominance, age, sex, occupation, and so on. Many currently available tests have associated so-called normative data, but they lack some or, more often, all of the primary instrumentation requisites, including reliability; validity; purpose statement; equipment criteria; and administration, scoring, and interpretation instructions. Regardless of how extensive a test’s associated normative information may be, if the test does not meet the primary instrumentation requisites, it is useless as a measurement instrument.
Assuming a test meets the primary instrumentation requisites, other statistical measures may be applied to the data gleaned from using the test. The optional measures of sensitivity and specificity assist clinicians in deciding whether evidence for applying the test is appropriate for an individual patient’s diagnosis.
Sensitivity, a statistical measure, defines the proportion of correctly identified positive responses in a subject population. In other words, sensitivity tells, in terms of percentages, how good a test is at properly identifying those who actually have the diagnosis (Fig. 12-2). The mnemonic “SnNout” is often associated with Sensitivity, indicating that a Negative test result rules out the diagnosis. As an example, out of a population of 50, if a test correctly identifies 15 of the 20 people who have the diagnosis and who test positive (TP), the sensitivity of the instrument is 75.0% (TP/[TP + FN] = sensitivity %).*
* TP, number of true positives; FN, number of false negatives.
Specificity, also a statistical measure, defines the proportion of negative results that are correctly identified in a subject population. Specificity, in the form of a percentage, tells how good a test is at correctly identifying those who do not have the diagnosis (see Fig. 12-2). “SpPin,” the mnemonic associated with Specificity, designates that a Positive test result rules in the diagnosis. If, out of the same example population of 50 subjects, the test correctly identifies 12 of the 30 people who do not have the diagnosis and who test negative (TN), the specificity of the test is 40.0% (TN/[TN + FP] = specificity %).†
† TN, number of true negatives; FP, number of false positives.
Predictive values define the chances of whether positive or negative test results will be correct. A positive predictive value looks at the true positives and false positives a test generates and defines the odds of a positive test being correct in terms of a percentage. Using the examples previously mentioned, the test produced 15 true positives and 18 false positives, so 15 of 33 positive test results were true positives (pink), resulting in a positive predictive value of 45.5%. Conversely, a negative predictive value specifies, in terms of percentage, the chances of a negative test result being correct by looking at the true negatives and false negatives it generates. In the previous example, 12 of 17 negative test results were true negatives (blue), for a 70.6% negative predictive value. The advantage of predictive values is that they change as the prevalence of the diagnosis changes. Loong’s 2003 article, “Understanding sensitivity and specificity with the right side of the brain,” is an excellent reference for students and those who are new to these concepts.
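The four measures can be computed from a single two-by-two table. The sketch below takes its counts from the sensitivity and specificity examples in this section (population of 50: TP = 15, FN = 5, TN = 12, FP = 18). The function names are illustrative only. The second function uses the standard Bayes relationship to show how the positive predictive value shifts with prevalence, with the prevalence values chosen arbitrarily for illustration.

```python
def diagnostic_stats(tp, fn, tn, fp):
    """Return sensitivity, specificity, and predictive values as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # TP / (TP + FN)
        "specificity": tn / (tn + fp),  # TN / (TN + FP)
        "ppv": tp / (tp + fp),          # true positives among all positive results
        "npv": tn / (tn + fn),          # true negatives among all negative results
    }

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value for a given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Counts from the worked example: 20 people with the diagnosis, 30 without.
stats = diagnostic_stats(tp=15, fn=5, tn=12, fp=18)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")

# Same test, different (hypothetical) prevalence levels.
for prevalence in (0.10, 0.40, 0.70):
    ppv = ppv_at_prevalence(stats["sensitivity"], stats["specificity"], prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

At a prevalence of 40% (20 of 50), the Bayes calculation reproduces the predictive value obtained directly from the counts, which is a useful check that the two routes agree.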
It is important to remember that sensitivity, specificity, and predictive values have higher accuracy when test instruments meet the primary instrumentation requisites addressed earlier in this section. If a test measures inconsistently or inappropriately, or if instrumentation equipment or procedural guides are disregarded, the test’s respective sensitivity, specificity, and predictive value percentages are compromised.
Scale and range of test instruments are also important when choosing measurement tools.
Scale refers to the basic measurement unit of an instrument. Scale should be suitably matched to the intended clinical application. For example, if a dynamometer measures in 10-pound increments and a patient’s grip strength increases by half a pound per week, it would take 20 weeks before the dynamometer would register the patient’s improvement. The dynamometer in this example is an inappropriate measurement tool for this particular clinical circumstance because its scale is too gross to measure the patient’s progress adequately.
Range involves the scope or breadth of a measurement tool, in other words, the distance between the instrument’s beginning and end values. The range of an instrument must be appropriate to the clinical circumstance in which it will be used. For example, the majority of dynamometers used in clinical practice have a range from 0 to 200 pounds, yet grip scores of many patients in rehabilitation settings routinely measure less than 30 pounds. Furthermore, the accuracy of most test instruments is diminished in the lowest and highest value ranges. This means that clinicians are assessing clients’ grip strength using the least accurate, lower 15% of available dynamometer range. With 85% of their potential ranges infrequently used, current dynamometers are ill suited to acute rehabilitation clinic needs. However, these same dynamometers are well matched to work-hardening situations where grip strengths more closely approximate the more accurate, midrange values of commercially available dynamometers.
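The arithmetic behind the scale and range examples is simple enough to check directly. The sketch below uses the figures given in the text (10-pound increments, half a pound of improvement per week, a 0 to 200 pound range, and patient scores under 30 pounds). The function names are hypothetical and serve only to label the two calculations.

```python
def weeks_until_detectable(scale_increment_lb, weekly_gain_lb):
    """Weeks of steady improvement needed before the reading advances one scale unit."""
    return scale_increment_lb / weekly_gain_lb

def fraction_of_range_used(max_patient_score_lb, range_min_lb, range_max_lb):
    """Portion of the instrument's range that typical patient scores occupy."""
    return (max_patient_score_lb - range_min_lb) / (range_max_lb - range_min_lb)

weeks = weeks_until_detectable(scale_increment_lb=10, weekly_gain_lb=0.5)
used = fraction_of_range_used(max_patient_score_lb=30, range_min_lb=0, range_max_lb=200)

print(f"Weeks before improvement registers: {weeks:.0f}")
print(f"Share of dynamometer range in use: {used:.0%}")
```

Running the same two calculations with a candidate instrument's actual increment and range gives a quick, if rough, indication of whether its scale and range suit a particular patient population.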
Standardized Tests Versus Observational Tests
Through interpretation, standardized tests provide information that may be used to predict how a patient may perform in normal daily tasks. For example, if a patient achieves x score on a standardized test, he may be predicted to perform at an equivalent of the “75th percentile of normal assembly line workers.” Standardized tests allow deduction of anticipated achievement based on narrower performance parameters as defined by the test.
In contrast, observational tests assess performance through comparison of subsequent test trials and are limited to like-item-to-like-item comparisons. Observational tests are often scored according to how patients perform specific test items; that is, independently, independently with equipment, needs assistance, and so forth. “The patient is able to pick up a full 12-ounce beer can with his injured hand without assistance.” Progress is based on the fact that he could not accomplish this task 3 weeks ago. Observational information, however, cannot be used to predict whether the patient will be able to dress himself or run a given machine at work. Assumptions beyond the test item trial-to-trial performance comparisons are invalid and irrelevant. Observational tests may be included in an upper extremity assessment battery so long as they are used appropriately.
Computerized Assessment Instruments
Computerized assessment tools must meet the same primary measurement requisites as noncomputerized instruments. Unfortunately, both patients and medical personnel tend to assume that computer-based equipment is more trustworthy than noncomputerized counterparts. This naive assumption is erroneous and predisposed to producing misleading information. In hand rehabilitation, some of the most commonly used noncomputerized evaluation tools have been or are being studied for instrument reliability and validity (the two most fundamental instrumentation criteria). However, at the time of this writing, none of the computerized hand evaluation instruments have been statistically proven to have intrainstrument and interinstrument reliability compared with NIST criteria. Some have “human performance” reliability statements, but these are based on the fatally flawed premise that human normative performance is equivalent to gold-standard NIST calibration criteria. Who would accept the accuracy of a weight set that had been “calibrated” by averaging 20 “normal” individuals’ abilities to lift the weights? Human performance is not an acceptable criterion for defining the reliability (calibration) of mechanical devices, including those used in upper extremity rehabilitation clinics.
Furthermore, one cannot assume that a computerized version of an instrument is reliable and valid because its noncomputerized counterpart has established reliability. For example, although some computerized dynamometers have identical external components to those of their manual counterparts, internally they have been “gutted” and no longer function on hydraulic systems. Reliability and validity statements for the manual hydraulic dynamometer are not applicable to the “gutted” computer version. Even if both dynamometers were hydraulic, separate reliability and validity data would be required for the computerized instrument.
The inherent complexity of computerized assessment equipment makes it difficult to determine instrument reliability without the assistance of qualified engineers, computer experts, and statisticians. Compounding the problem, stringent federal regulation often does not apply to “therapy devices.” Without sophisticated technical assistance, medical specialists and their patients have no way of knowing the true accuracy of the data produced by computerized therapy equipment.
Although many measurement instruments are touted as being “standardized,” most lack even the rudimentary elements of statistical reliability and validity, relying instead on normative statements such as means or averages. These norm-based tests lack even the barest of instrumentation requisites, meaning they cannot substantiate their consistency of measurement nor their ability to measure the entity for which they were designed. Because relatively few evaluation tools fully meet standardization criteria, instrument selection must be predicated on satisfying as many of the previously mentioned requisites as possible. Hand/upper extremity assessment tools vary in their levels of reliability and validity according to how closely their inherent properties match the primary instrumentation requisites.
As consumers, medical specialists must require that all assessment tools have appropriate documentation of reliability and validity at the very least. Furthermore, “data regarding reliability [and validity] ‡ should be available and should not be taken at face value alone; just because a manufacturer states reliability studies have been done, or a paper concludes an instrument is reliable, does not mean the instrument or testing protocol meets the requirements for scientific design.” Purchasing and using assessment tools that do not meet fundamental measurement requisites limits potential at all levels, from individual patients to the scope of the profession.
Functional Assessment Instruments
Handedness is an essential component of upper extremity function. Traditionally, the patient self-report is the most common method of defining hand preference in the upper extremity rehabilitation arena. Although hand preference tests with established reliability and validity have been used by psychologists to delineate cortical dominance for several decades, knowledge of these tests by surgeons and therapists is relatively limited. Recent studies show that the Waterloo Handedness Questionnaire (WHQ), a 32-item function-based survey with high reliability and validity, more accurately and more extensively defines hand dominance (Fig. 12-3) than does the patient self-report. The WHQ is inexpensive, simple and fast to administer, and easy to score, and patients respond positively to it, welcoming its user-friendly format and explicitness of individualized results. Better definition of handedness is important to clinicians and researchers alike, in that it improves treatment focus and outcomes on a day-to-day basis, and, through more precise research studies, it eventually enhances the professional body of knowledge. (Also see the Grip Assessment section of this chapter.)
Grip strength is most often measured with a commercially available hydraulic Jamar dynamometer (Fig. 12-4), although other dynamometer designs are available. Developed by Bechtol and recommended by professional societies, the Jamar dynamometer has been shown to be a reliable test instrument, provided calibration is maintained and standard positioning of test subjects is followed.
In an ongoing instrument reliability study, of over 200 used Jamar and Jamar design dynamometers evaluated by the author, 51% passed the requisite +0.9994 correlation criterion compared with NIST F-tolerance test weights (government certified high-caliber test weights). Of these (0.9994 and above), 27% needed minor faceplate adjustments to align their read-out means with the mean readings of the standardized test weights. Of 30 brand new dynamometers, 80% met the correlation criterion of +0.9994. Interestingly, two Jamar dynamometers were tested multiple times over more than 12 years with less than a 0.0004 change in correlation, indicating that these instruments do maintain their calibration if carefully used and stored.
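The two-part check described above, a correlation criterion followed by alignment of means, can be sketched as follows. This is an illustration only and not the author's study procedure: the test-weight loads and dial readings are hypothetical, and the only figure taken from the text is the +0.9994 criterion.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

CRITERION = 0.9994  # correlation criterion cited in the text

# Hypothetical values in pounds, for illustration only.
test_weights = [10.0, 30.0, 50.0, 70.0, 90.0]  # certified reference loads
readings = [12.0, 32.0, 51.0, 72.0, 92.0]      # dynamometer dial readings

r = pearson(test_weights, readings)
passes = r >= CRITERION

# A high correlation does not rule out a constant offset, so the means are compared too.
mean_offset = sum(readings) / len(readings) - sum(test_weights) / len(test_weights)

print(f"Correlation with test weights: {r:.4f} (passes: {passes})")
print(f"Mean read-out offset: {mean_offset:+.1f} lb")
```

In this hypothetical case the instrument meets the correlation criterion yet reads nearly 2 pounds high on average, the kind of discrepancy that a faceplate adjustment is meant to remove.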
Test procedure is important. In 1978 and 1983, the American Society for Surgery of the Hand recommended that the second handle position be used in determining grip strength and that the average of three trials be recorded. In contrast to the recommended mean of three grip trials, one maximal grip is reported to be equally reliable in a small cohort study. Deviation from recommended protocol should be undertaken with caution in regard to the mean-of-three-trials criterion, which is commonly utilized in research studies. More information is needed before adopting the one-trial method.
Of importance is the concept that grip changes according to size of object grasped. Normal adult grip values for the five consecutive handle positions consistently create a bell-shaped curve, with the first position (smallest) being the least advantageous for strong grip, followed by the fifth and fourth positions; strongest grip values occur at the third and second handle positions. If inconsistent handle positions are used to assess patient progress, normal alterations in grip scores may be erroneously interpreted as advances or declines in progress. Fatigue is not an issue for the three-trial test procedure, but may become a factor when recording grip strengths using all five handle positions (total of 15 trials with 3 trials at each position). A 4-minute rest period between handle positions helps control potential fatigue effect. Although a 1-minute rest between sets was reported to be sufficient to avoid fatigue, this study was conducted using a dynamometer design instrument whose configuration is different from that of the Jamar dynamometer. Percent of maximal voluntary contraction (MVC) required is also important in understanding normal grip strength and fatigue. For example, it is possible to sustain isometric contraction at 10% MVC for 65 minutes without signs of muscle fatigue. Although Young reported no significant difference in grip scores between morning and night, his data collection times were shorter compared to those of other investigators who recommend that time of day should be consistent from trial to trial.
Better definition of handedness directly influences grip strength. Using the WHQ, Lui and Fess found a consistent polarization pattern with greater differences between dominant and nondominant grip strengths in normal subjects with WHQ classifications of predominantly left or right preference versus those who were ambidextrous or with slight left or right preferences. This polarization pattern was especially apparent in the second Jamar handle position.
Norms for grip strength are available, but several of these studies involve altered Jamar dynamometers or other types of dynamometers. Independent studies refute the often cited 10% rule for normal subjects, with reports finding that the minor hand is equal to or stronger than the major hand in up to 31% of the normal population. The “10% rule” also is not substantiated when the WHQ is used to define handedness. Grip has been reported to correlate with height, weight, and age, and socioeconomic variables such as participation in specific sports or occupations also influence normal grip. Grip strength values lower than normal are predictive of deterioration and disability in elderly populations. It is important to note that the Mathiowetz normative data reported for older adults may be “up to 10 pounds lower than they should be.” A 2005 meta-analysis by Bohannon and colleagues that combined normative values from 12 studies that used Jamar dynamometers and followed American Society of Hand Therapists (ASHT) recommended testing protocols may be the most useful reference for normative data to date. In a 2006 study, Bohannon and coworkers also conducted a meta-analysis for normative grip values of adults 20 to 49 years of age.
Although grip strength is often used clinically to determine sincerity of voluntary effort, validity of its use in identifying submaximal effort is controversial, with studies both supporting and refuting its appropriateness. Niebuhr recommends use of surface electromyography in conjunction with grip testing to more accurately determine sincerity of effort. The rapid exchange grip test, a popular test for insincere effort, has been shown to have problems with procedure and reliability; and even with a carefully standardized administration protocol, its validity is disputed due to low sensitivity and specificity.
Hand grip strength is often an indicator of poor nutritional status. Unfortunately, the majority of these studies have been conducted using grip instruments other than the Jamar or Jamar-like dynamometers. Furthermore, very few of these studies address calibration methods, rendering their results uncertain.
Bowman and associates reported that the presence or absence of the fifth finger flexor digitorum superficialis (FDS) significantly altered grip strength in normal subjects, with grip strength of the FDS-absent group being nearly 7 pounds less than the FDS-common group, and slightly more than 8 pounds less than the FDS-independent group.
The Jamar’s capacity as an evaluation instrument, the effects of protocol, and the ramification of its use have been analyzed by many investigators over the years with mixed, and sometimes conflicting, results. Confusion is due in large part to the fact that the vast majority of studies reported have relied on nonexistent, incomplete, or inappropriate methods for checking instrument accuracy of the dynamometers used in data collection. A second and more recent development is the ability to better define handedness using the WHQ. Scientific inquiry is both ongoing and progressive as new information is available. Although past studies provide springboards and directions, it is important to understand that all grip strength studies need to be reevaluated using carefully calibrated instruments and in the context of the more accurate definition of handedness provided by the WHQ.
Other grip strength assessment tools need to be ranked according to stringent instrumentation criteria, including longitudinal effects of use and time. Although spring-load instruments or rubber bulb/bladder instruments (Fig. 12-5) may demonstrate good instrument reliability when compared with corresponding NIST criteria, both these categories of instruments exhibit deterioration with time and use, rendering them inaccurate as assessment tools.
Reliability of commercially available pinchometers needs thorough investigation. Generally speaking, hydraulic pinch instruments are more accurate than spring-loaded pinchometers (Fig. 12-6). A frequently used pinchometer in the shape of an elongated C with a plunger dial on top is, mechanically speaking, a single large spring, in that its two ends are compressed toward each other against the counter force of the single center C spring. This design has inherent problems in terms of instrument reliability.
Three types of pinch are usually assessed: (1) prehension of the thumb pulp to the lateral aspect of the index middle phalanx (key, lateral, or pulp to side); (2) pulp of the thumb to pulps of the index and long fingers (three-jaw chuck, three-point chuck); and (3) thumb tip to the tip of the index finger (tip to tip). Lateral is the strongest of the three types of pinch, followed by three-jaw chuck. Tip to tip is a positioning pinch used in activities requiring fine coordination rather than power. As with grip measurements, the mean of three trials is recorded, and comparisons are made with the opposite hand. Better definition of handedness via the WHQ improves understanding of the relative value of dominant and nondominant pinch strength. Cassanova and Grunert describe an excellent method of classifying pinch patterns based on anatomic areas of contact. In an extensive literature review, they found more than 300 distinct terms for prehension. Their method of classification avoids colloquial usage and eliminates confusion when describing pinch function.
Of the five hierarchical levels of sensibility testing (autonomic/sympathetic response, detection, discrimination, quantification, and identification), only the final two levels include functional assessments. (See Chapter 11 for details regarding testing in the initial three levels.) At this time no standardized tests are available for these two categories, although their concepts are used frequently in sensory reeducation treatment programs. However, several observation-based assessments are used clinically.
Quantification is the fourth hierarchical level of sensory capacity. This level involves organizing tactile stimuli according to degree. A patient may be asked to rank several object variations according to tactile properties, including, but not limited to, roughness, irregularity, thickness, weight, or temperature. An example of a quantification level functional sensibility assessment, Barber’s series of dowel rods covered with increasingly rougher sandpapers requires patients to rank the dowel rods from smoothest to roughest with vision occluded. For more detail, the reader is referred to Barber’s archived chapter on the companion Web site of this text.
Identification, the final and most complicated sensibility level, involves the ability to recognize objects through touch alone.
The Moberg Picking-Up Test is useful both as an observational test of gross median nerve function and as an identification test. Individuals with median nerve sensory impairment tend to ignore or avoid using their impaired radial digits, switching instead to the ulnar innervated digits with intact or less impaired sensory input, as they pick up and place small objects in a can. This test is frequently adapted to assess sensibility identification capacity by asking patients, without using visual cues, to identify the small objects as they are picked up. A commercial version of the Moberg Picking-Up Test is available. Currently, the Moberg is an observational test only. It meets none of the primary instrumentation requisites, including proof of reliability and validity; and for most of the noncommercial versions that are put together with a random assortment of commonly found small objects, it lacks even the simplest of equipment standards.
Daily Life Skills
Traditionally, the extent to which daily life skills (DLS) are assessed has depended on the type of clientele treated by various rehabilitation centers. For example, facilities oriented toward treatment of trauma injury patients required less extensive DLS evaluation and training than centers specializing in treatment of arthritis patients. However, with current emphasis on patient satisfaction reporting, it is apparent that more extensive DLS evaluation is needed to identify specific factors that are individual and distinct to each patient and patient population.
The Flinn Performance Screening Tool (FPST) is important because of its excellent test–retest reliability (92% of the items: Kendall’s τ > 0.8 and 97% agreement) and because it continues to be tested and upgraded over time (Fig. 12-7). The FPST allows patients to work independently of the evaluator in deciding what tasks they can and cannot perform. The fact that this test is not influenced by the immediate presence of an administrator is both important and unusual. Busy clinicians may prejudge or assume patient abilities based on previous experience with similar diagnoses, and some patients hesitate to disclose personal issues. With the FPST, the potential for therapist bias is eliminated and issues such as patient lack of disclosure due to social unease are reduced. Patients peruse and rank the cards on their own without extraneous influence. The FPST consists of three volumes of over 300 laminated daily activity photographs that have been tested and retested for specificity and sensitivity of task. Volume 1 assesses self-care tasks; volume 2 evaluates home and outside activities; volume 3 relates to work activities. This test represents a major step toward defining function in a scientific manner.