Decision Making and Flag Systems

Gustavo Zanoli


Back pain has become a standard indication for a broad group of conditions in the biomedical literature, as if it were a diagnosis rather than a simple descriptive symptom. This has many advantages from a general medicine and epidemiological perspective, but sometimes it makes it difficult for the specialist to correctly identify and treat different types of back pain. Moreover, this difficulty is reflected in clinical research, where very often the effects of treatments cannot be clearly established due to a great number of confounding factors in a sometimes very inhomogeneous sample of patients.

For these reasons, it is very important for the clinician to undertake a thorough anamnestic interview and to follow defined clinical pathways from the early steps of the decision-making process. As an aid to “navigate” the complexity of these syndromes, several flag systems have been introduced, which aim to quickly identify risk factors for more serious conditions or prognostic factors that might interfere with the recovery from a back pain episode.

This chapter reviews the literature regarding the use of flag systems and history-taking algorithms, and offers some practical advice for the clinician.

A Brief History of the Use of Flags in the Assessment of Low Back Pain

In the early 17th century, a solid red flag was used by military forces to indicate that they intended to start a battle, as seen in poems and paintings that can be easily found on Wikipedia.1

According to Welch,2 in medical practice, the term red flag was originally associated with back pain. The first catalogue of red flags for back pain appeared in the literature in the early 1980s, and since then numerous lists have been compiled. At the same time, Waddell introduced the study of nonorganic signs3 and the biopsychosocial model4 in low back pain (LBP), and the great research interest in psychosocial risk and prognostic factors started. The term yellow flag was introduced by Kendall et al5 in 1997 to cover psychological, social, and environmental risk factors (in contrast to red flags indicating physical risk factors) for long-term disability and work loss and was then adopted in some guidelines. The original list of yellow flags contained many domains, including attitudes and beliefs about back pain and work. Subsequently, in 2002, the system was refined by Chris Main6 so that workplace factors that were originally considered as yellow flags were reclassified in two separate categories: black flags for workplace organizational (objective) conditions, and blue flags for individual (subjective) perceptions about work issues. In 2005, Main et al7 suggested that orange flags should also be added to the toolbox, to identify signs of a more serious mental disorder that requires referral to a psychiatric treatment center (Table 2.1).

Red Flags

Silvano Boccardi, one of the fathers of Italian rehabilitation medicine, is credited with creating a list of more than 800 different causes of back pain. (I learned this from Francesco Greco, head of the school of orthopedics that I attended, and the list can be found in several Italian Web sites and publications, even though I could not find the original reference; it was probably just a paradoxical speculation, calculated by multiplying the possible anatomic sites of pain in 33 vertebrae with the different underlying pathologies, such as traumatic, neurologic, inflammatory, degenerative, neoplastic, infective, and others.) Of course, the intent of Boccardi’s estimate of 800 causes was to be provocative, to emphasize the difficulties of a merely physiological model of conditions such as LBP. In fact, specific back pain, that is, back pain that can be the presenting symptom of a serious underlying pathology, accounts for a minority of cases (less than 10 to 20% according to different reports in specialized settings,8 and 1 to 5% in primary care9). However, given the possibly severe consequences of missing or delaying the diagnosis of an etiologic condition with unfavorable outcomes, it would be highly advisable to exclude those few patients from the start. Furthermore, the vast majority of studies in the literature focus on aspecific LBP, which can be defined only after ruling out other known relevant pathologies. Finally, it can never be assumed by any health care provider that others have already excluded the presence of these conditions, because even the most skilled practitioner can miss them, and clinical situations might change over time. For all these reasons it would be ideal to have a simple and reproducible tool to rapidly (and repeatedly, if necessary) detect the few cases requiring a separate diagnostic workup. This explains the success of the red flag system and its subsequent impact on clinical reasoning and on other sites of musculoskeletal pain.

The consensus on the importance of assessing red flags is not reflected in its practical implementation. In a retrospective review of clinic charts for 160 patients with LBP seen at six outpatient physical therapy clinics, Leerar et al10 found that only seven of the 11 red flag items investigated were documented over 98% of the time. They also noted that experts had provided varied opinions as to what constitutes a red flag finding for patients with LBP, with differences varying from item formulation (e.g., duration of symptoms, specified as over 1, 1.5 or 3 months in the different sources) to including or not including a specific item such as history of trauma, or even to using a cluster of red flags to identify high-risk patients to refer for medical evaluation. Major conditions investigated by red flags can be summarized in five areas: fracture, tumor, infection, inflammatory disease, and cauda equina syndrome.

In a review performed by Zanoli et al11 for the Italian Orthopedic Society in 2011, all six international LBP guidelines reviewed recommended using red flags in the initial assessment, but this recommendation seemed to be opinion based, and the red flag lists differed in the various guidelines. A previous systematic review with a slightly different methodology analyzed 10 different guidelines and found that the number of red flags specified in each guideline ranged from seven to 17, with a mean of 11.12 Overall, 22 red flags were identified, only three of which were included in nine of 10 guidelines: age > 50 years, history of cancer, and steroid use. Eight red flags were potentially associated with spinal cancer, six with cauda equina syndrome, five with spinal fracture, and five with spinal infection, for a total of 24, but two red flags (age > 50 years and urinary retention) were each associated with both cancer and fracture; thus, 22 red flags were identified overall.
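The count reported in that systematic review can be checked with one line of arithmetic. The sketch below assumes that each of the two shared red flags was counted twice in the per-condition totals (once under cancer and once under fracture), which is how the paragraph above describes them:

```latex
\[
\underbrace{8}_{\text{cancer}}
+ \underbrace{6}_{\text{cauda equina}}
+ \underbrace{5}_{\text{fracture}}
+ \underbrace{5}_{\text{infection}}
= 24,
\qquad
24 - 2 = 22 \ \text{distinct red flags}.
\]
```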

Two high-quality diagnostic reviews from the Cochrane Collaboration systematically addressed the evidence on the usefulness of red flag assessment to specifically screen for vertebral fracture9 and malignancy13 in patients presenting with LBP. In the first review, the available evidence did not support the use of many red flags to specifically screen for vertebral fracture in patients presenting with LBP. When combinations of red flags were used, the screening performance appeared to improve. Authors of the second review concluded that there was insufficient evidence to provide recommendations regarding their diagnostic accuracy or usefulness for detecting spinal malignancy, and that indications should not be based on the results of a single red flag question. Spinal malignancy is a rather rare finding in LBP patients (less than 1% in primary care studies as compared with 3 to 11% for fractures9,13), and this low prevalence probably reduces the possibility of reaching statistically relevant and homogeneous conclusions until larger amounts of data are collected.

Despite these unsatisfactory evidence reviews, I believe it is still a good idea to assess for red flags when examining a new patient with LBP, and reassess them periodically, especially if the condition is not self-limiting, as occurs in most cases. Most of these items are part of a common medical anamnesis anyway, and may help in getting to know the patient and establishing a good physician-patient alliance. It should be noted, however, that many red flags have high false-positive rates, and if used without any critical clinical judgment, the result could be the referral for many inappropriate examinations with consequences for the cost of management and the outcomes of patients with LBP. The choice of an age to be considered at risk (over 50, 64, 70, 74?) largely influences sensitivity and specificity, and should be considered with caution, as not all 70-year-old people are similar. Probably a combination of a limited number of factors (osteoporosis, history of trauma, corticosteroid use, older age, and female gender) with a threshold of at least two positive findings is the best choice at present when screening for a vertebral fracture, whereas a history of cancer (possibly combined with age) is the most important item to assess when screening for malignancy. A history of infection or immunodeficiency and inflammatory disease can help to diagnose some otherwise unexplained nonmechanical back pain, but this does not necessarily mean a different (and more expensive) pathway in stable patients with an established diagnosis who experience a “normal” mechanical LBP. Clinical features of cauda equina syndrome (saddle anesthesia, sphincteric dysfunctions) should also be investigated when neurologic involvement is suspected.
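The combination rule suggested above for vertebral fracture (five factors, at least two positive) can be written down as a short sketch. This is only an illustration of the counting logic, not a validated instrument: the class and function names are hypothetical, and because the text deliberately leaves the age cutoff open, the cutoff is a required parameter rather than a built-in default.

```python
from dataclasses import dataclass


@dataclass
class FractureRiskHistory:
    """History items named in the text; field names are illustrative."""
    osteoporosis: bool
    history_of_trauma: bool
    corticosteroid_use: bool
    age_years: int
    female: bool


def count_fracture_flags(history: FractureRiskHistory, *, older_age_cutoff: int) -> int:
    """Count how many of the five factors are positive.

    `older_age_cutoff` is an assumption supplied by the caller; the chapter
    lists several candidate cutoffs (50, 64, 70, 74) without choosing one.
    """
    flags = [
        history.osteoporosis,
        history.history_of_trauma,
        history.corticosteroid_use,
        history.age_years > older_age_cutoff,
        history.female,
    ]
    return sum(flags)


def meets_fracture_screen_threshold(
    history: FractureRiskHistory, *, older_age_cutoff: int, min_positive: int = 2
) -> bool:
    """True when at least `min_positive` factors are present (two in the text)."""
    return count_fracture_flags(history, older_age_cutoff=older_age_cutoff) >= min_positive


if __name__ == "__main__":
    patient = FractureRiskHistory(
        osteoporosis=True,
        history_of_trauma=False,
        corticosteroid_use=False,
        age_years=72,
        female=False,
    )
    # The same patient crosses or misses the threshold depending on the age cutoff.
    for cutoff in (50, 64, 70, 74):
        print(cutoff, meets_fracture_screen_threshold(patient, older_age_cutoff=cutoff))
```

Running the loop with different cutoffs shows how strongly the choice of age threshold alone can change the screening result for the same patient. A positive result here would only signal the need for further questions and clinical examination, not an automatic referral.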

More generally, positive findings should not necessarily prompt a referral for expensive diagnostic testing, but they may indicate the need to ask further questions, even using some of the discarded red flags (e.g., unexplained weight loss, nonmechanical pain, etc.) to confirm or exclude suspicions, or to search for “reassurance” in other possible sources of a mechanical back pain (workload changes, new mattresses, recent increase in household duties, etc.). Finally, clinicians should remember that the clinical examination is a powerful and rather inexpensive way to confirm some of the red flag suspicions, and all elements should be taken into account before proceeding to the prescription of expensive diagnostic imaging and laboratory tests.
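Why a single positive red flag rarely justifies expensive testing on its own can be illustrated with the standard positive predictive value formula. In the sketch below, the 1% prevalence is taken as an upper bound from the primary care figure for spinal malignancy quoted earlier; the sensitivity and specificity values are purely hypothetical placeholders, not estimates from any of the cited reviews.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a patient with a positive finding truly has the condition.

    Standard Bayes calculation:
        PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    for name, value in (
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


if __name__ == "__main__":
    # Hypothetical red flag: sensitivity 0.80 and specificity 0.70 are assumptions.
    ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.80, specificity=0.70)
    print(f"Share of positive findings that are true positives: {ppv:.1%}")
```

With a rare condition, even a reasonably accurate question yields mostly false positives, which is consistent with the advice to treat a positive flag as a prompt for further history and examination.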

Yellow Flags and Orange Flags

Having excluded specific causes of LBP with red flags, the vast majority of patients with aspecific LBP still remain to be assessed, and in some cases treated or helped to find a way back to their usual activities without pain or disability. This occurs in most cases, even without any treatment, but in some people the pain lasts longer, and may become a frequent or constant presence, affecting quality of life and return to work. The introduction of the biopsychosocial model in LBP opened new perspectives in recognizing patients at risk of heading toward less favorable outcomes. In their appendix to the New Zealand Acute Low Back Pain Guide, Kendall et al5 started from the assumption that most of the known risk factors for long-term disability, inactivity, and work loss are psychosocial, and they proposed a long list of factors (yellow flags) that increase the risk of these problems developing, so that health professionals could subsequently act to prevent these problems, starting in the early stages of management. After this first formulation, other guidelines have proposed their own lists, and other theoretical models described the development and prolongation of LBP with different mechanisms. This led to a great number of psychosocial yellow flags to be assessed. Even after the distinction of work-related blue and black flags (see below), the list of potential influencing factors became long, but the evidence is sparse.

Can yellow flags influence outcomes in people with acute or subacute LBP? Can yellow flags be targeted in interventions to produce better outcomes? Systematic reviews yielded conflicting results, and have even been criticized from a methodological standpoint.14 There might also be a geographic or culturally determined variation in the influence of some factors. For example, fear avoidance beliefs have been shown not to influence the disability and quality of life of Spanish LBP patients.15 A randomized controlled trial comparing the assessment and modification of psychosocial prognostic factors with standard care in the treatment of (sub)acute LBP in general practice found no evidence that general practitioners should adopt a treatment strategy aimed at psychosocial prognostic factors in these patients.16 In an interesting systematic review of risk and prognostic factors for nonspecific musculoskeletal pain classified into International Classification of Functioning (ICF) dimensions, the authors found that for LBP, there is high evidence that fear avoidance and poor social support at work are not prognostic factors for LBP, and that poor social support at work and poor job content are not risk factors for LBP.17 The authors suggested that fear-avoidance beliefs and poor social support at work perhaps should be removed as yellow flags.

More recently, another color has been added to the flag system: orange.7 Orange flags indicate the presence of psychopathology, that is, more severe mental health and psychological problems than those indicated by yellow flags, alerting the clinician to serious psychiatric problems that could require referral to a mental health specialist or psychiatrist, rather than following the normal course of management for mild mental health conditions such as anxiety. Orange flags can indicate excessively high levels of distress, major personality disorders, posttraumatic stress disorders, drug and alcohol abuse/addictions, or clinical depression. It has been recommended that all practitioners should screen for the presence of orange flags, particularly in patients who have been off from work due to illness for more than 4 weeks. At this stage, there is only experts’ opinion on this subject, and orange flags have been defined only insofar as they may be mistaken for yellow flags, and they have not been well studied. However, it does seem to be common sense that clinicians suspecting a previously undiagnosed psychiatric problem should triage patients to a mental health specialist. This does not imply that these patients’ back problem should be left untreated in the end, and reassessment after specialist treatment should be recommended, at least.

Thus, similarly to what has been said about red flags, taking into account yellow flags, particularly when they are at high levels, does seem a better idea than either ignoring them or providing interventions to people regardless of psychological risk factors. This does not necessarily imply a recommendation to use a formal set of items, which would be done in a comprehensive clinical evaluation. In general, practitioners should not expect that a strict protocol and a “cookbook” solution will simply help them avoid patients with a bad prognosis, but they could use some of these factors to try to assemble a more complete picture and to help establish a good physician-patient relationship. In more specialized settings, with more severe patients, assessment of yellow flags is probably part of the treatment itself anyway, as some of the multidisciplinary interventions specifically address psychological issues.

Blue Flags and Black Flags

It is well known that LBP continues to have a great impact on sick leave and work disability in the industrialized world. Although most working-age adults are able to recover from acute back pain or manage to cope with a recurrent or chronic condition with few work absences, others experience significant periods of work disability or even need to change their job or professional path.

Because there seems to be no anatomic or physiopathological feature that can help identify those at risk of having a poor work-related outcome, a considerable amount of research has been devoted to studying nonmedical prognostic factors that could predict such an unfavorable evolution. We have already discussed the importance of psychosocial factors in the discussion about yellow flags. Blue flags have been subsequently defined to help clinicians address specific workplace factors that might influence outcomes in LBP patients.6 Originally many of these factors were considered as yellow flags, because they implied a subjective perception about work issues, such as negative expectations of return to work, job dissatisfaction, stress at work, work-related fear-avoidance beliefs (i.e., belief that work is harmful or fear of reinjury), perceptions of physical job demands, and poor colleague or supervisor relationships. On the other hand, to balance these subjective factors, black flags indicate actual objective workplace conditions that can affect disability, including on one side employer and insurance system organizational characteristics (category I) and on the other side measures of physical workload and job features (category II).18

Systematic reviews have addressed several blue or black flags as prognostic factors, with negative (job stress), conflicting (job dissatisfaction, work fear avoidance), or at least wavering and inconsistent (recovery expectations, physical job demands, low social support at work) findings. Several review authors recommend that standardized, psychometrically robust instruments should be used in future studies to enable deriving reproducible measurements; however, a systematic review of these instruments for the assessment of blue flags in individuals with nonspecific LBP found that none of the identified instruments, in their current stage of development, could be recommended as blue flag assessment tools.19

Consequently, we have even less robust data on the usefulness of blue flags and black flags in clinical decision making, and these measures have seen limited dissemination in clinical practice, not only because the predictive performance of some tools has not been sufficiently demonstrated, but more importantly, because screening results have rarely been linked to appropriate early intervention strategies.

As stated by the authors of the “Decade of the Flags” Working Group,18 other problems include errors in classifying patients, the time and effort required to administer and score assessment measures and discuss results with patients, and limited treatment options (or effective power) for addressing workplace and psychosocial concerns. Some providers may feel reluctant or unprepared to explore these nonmedical domains, despite their prominence in published medical guidelines for the treatment of LBP.

Despite these limitations, many guidelines still recommend assessing all flags in aspecific LBP patients. Although this probably reflects the attitude of guideline working groups to rely more on anecdotal evidence or expert opinion when making recommendations on issues different from therapeutic interventions, some degree of clinical common sense still advises us to at least consider the possibility of investigating blue flags and black flags in clinical practice.

Questions on physical activities at work or at leisure should be part of the initial assessment, and patients very often introduce the issues addressed by blue flags and black flags themselves, when asked about their job in the context of a medical examination. Clinicians should not ignore the explicit or implicit requests in their patients’ answers; trying to improve pain or disability outcomes in a person who is only (or mainly) there to obtain a certification of temporary or permanent inability to work might prove to be a daunting task. At the same time, despite all the laws on workplace preventive interventions, many workers and employers, especially in small businesses, still ignore some basic steps that can produce good results without requiring expensive structural changes. For those interested in a more formal assessment, while we wait for a valid international standardized recommendation, the 55 questions of the Obstacles to Return To Work Questionnaire (ORTWQ) could be a good starting point to identify items that might be relevant for clinical practice and future research. According to the previously mentioned systematic review,19 the ORTWQ was the only instrument that showed adequate psychometric properties, even though it is not considered clinically feasible in its present format.

Decision Algorithms

Having critically reviewed the literature on the flag system, and having noted the lack of evidence to inform clinical practice, other than a generic recommendation to consider the respective issues as suggested by the different colors, we trust that readers will understand our skepticism toward the extensive use, especially in practice guidelines, of decision algorithms (and the underlying classification systems), which reflect in most cases the expert panel’s personal views rather than an extensive review of the evidence. Although promising to be an easy-to-use guide throughout the decision-making process of complex clinical conditions, these tools instead promote the idea of cookbook medicine, which is the opposite of what the founders of evidence-based medicine had in mind. A systematic review of articles that specifically described a clinical classification system for chronic LBP, reported on the reliability of a classification system, or evaluated the effectiveness of classification-specific interventions,20 identified 28 classification systems that met inclusion criteria: 16 diagnostic systems, seven prognostic systems, and five treatment-based systems. All the systems were directed at nonoperative management. The authors recommended that none of these classification systems be adopted for all purposes, and that future efforts in developing a classification system should take into account multidisciplinary (invasive and noninvasive) treatments.

A brief screening tool, the Keele STarT Back Screening Tool (SBST), designed to identify subgroups of patients to guide the provision of early secondary prevention in primary care and including items that build a psychosocial subscale, has been proposed and subsequently tested in an RCT. The results showed that, at 12 months, stratified care was associated with a mean increase in generic health benefit and cost savings. The SBST (also available online and as an app) stratifies patients into low-, medium-, or high-risk categories, and for each category there is a matched treatment package.21 More recently, the Nijmegen Decision Tool for Chronic Low Back Pain was presented as the first clinical decision tool based on current scientific evidence and formal multidisciplinary consensus that helps in referring the patient for consultation to a spine surgeon or a nonsurgical spine care specialist.22 Despite the extensive preparatory work based on a systematic review of the available evidence and the appropriate methodology applied to reach consensus within a multidisciplinary panel, only a first version of the decision tool was developed, consisting of a Web-based screening questionnaire and a provisional decision algorithm. This decision tool will require further development and testing before it can be widely recommended.
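To make the SBST stratification mentioned above more concrete, the following sketch shows how a total score and a psychosocial subscale can be combined into three risk groups. The specifics are assumptions not stated in this chapter: nine items each scored 0 or 1, a psychosocial subscale formed by the last five items, and the cutoffs of 3 used below. They follow the scoring as it is commonly described for the published tool and should be verified against the official Keele documentation before any use; the function name is hypothetical.

```python
from typing import Sequence


def sbst_risk_group(item_scores: Sequence[int]) -> str:
    """Return 'low', 'medium', or 'high' from nine dichotomized item scores.

    Assumptions (not given in the chapter, to be checked against the official tool):
      - nine items, each already scored 0 (negative) or 1 (positive);
      - items 5 to 9 form the psychosocial subscale;
      - total <= 3 is low risk; otherwise subscale <= 3 is medium, else high.
    """
    if len(item_scores) != 9:
        raise ValueError("expected exactly nine item scores")
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("each item score must be 0 or 1")

    total = sum(item_scores)
    psychosocial = sum(item_scores[4:])  # items 5 to 9

    if total <= 3:
        return "low"
    if psychosocial <= 3:
        return "medium"
    return "high"


if __name__ == "__main__":
    print(sbst_risk_group([1, 1, 1, 0, 0, 0, 0, 0, 0]))
    print(sbst_risk_group([1, 1, 1, 1, 0, 0, 0, 0, 0]))
    print(sbst_risk_group([0, 0, 0, 0, 1, 1, 1, 1, 0]))
```

The two-step design means that a patient with mainly physical items positive is placed in the medium-risk group, whereas a high psychosocial subscale moves the patient to the high-risk group. The matched treatment packages themselves are outside the scope of this sketch.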

Chapter Summary

The first catalogue of red flags for back pain appeared in the literature in the early 1980s, and since then numerous lists have been compiled. The term yellow flag was introduced in 1997 to cover psychological, social, and environmental risk factors (in contrast to red flags for physical risk factors). Subsequently, in 2002, workplace factors that were originally considered as yellow flags were classified in two separate categories: black flags for workplace objective conditions, and blue flags for subjective perceptions about work issues. In 2005, orange flags that identify signs of a serious mental disorder were added. Two Cochrane reviews evaluated the usefulness of red flag assessment to specifically screen for vertebral fracture and malignancy, respectively. The first did not support the use of many red flags to specifically screen for vertebral fracture in patients presenting with LBP, and the second concluded that there was insufficient evidence to recommend their use when searching for spinal malignancy, whereas some combination of red flags might prove to have some value. The evidence about other flags is even more inconclusive. Despite these limitations, many guidelines still recommend assessing all flags in aspecific LBP patients, but this carries the risk of overemphasizing the use of subsequent diagnostic testing. Decision algorithms have even less data to support their use, and very often simply reflect the panel members’ opinion. It is probably a good idea to assess a limited set of flags in the examination of a new LBP patient, and reassess them periodically, especially if the condition is not self-limiting. Most of the flag items are part of a common medical anamnesis anyway and, when used critically and with clinical good sense, may help in getting to know the patient and establishing a good physician-patient alliance.


Red flags assess physical risk factors for a serious underlying pathology.

Yellow flags assess psychological, social, and environmental risk factors.

Black flags assess workplace objective conditions.

Blue flags assess subjective perceptions about work issues.

Orange flags identify signs of a serious mental disorder.

Evidence from many systematic reviews including two Cochrane reviews is insufficient to recommend the widespread uncritical use of flags.

Decision algorithms have even less data to support their use, and very often reflect simply the panel members’ opinion.

Clinical good sense advises in favor of an anamnestic assessment of some of the flag items as a way to get a general picture, not as a simple checklist to guide the prescription of diagnostic tests.


Use of red flags without any critical clinical judgment could result in referring the patient for many inappropriate examinations, with consequences for the cost of management and the outcomes of patients with LBP.

Use of flags should not be intended as a way to avoid a problematic patient, but as an aid in identifying sources of problems and maximizing treatment outcomes.

Clinicians should not assume that others have already excluded the presence of relevant risk factors, because even the most skilled practitioner can miss them, and clinical situations might change over time.

