Chapter 2 – SBA Writing Process




Abstract




The National Board of Medical Examiners (NBME) item-writing manual is an excellent starting point for a more detailed analysis of the MCQ writing process.1 It is widely referenced in this chapter, being a mainstay of guidance for question writers aiming to produce high-quality questions. The original ‘red book’ was updated to a 4th edition in 2016,2 and it remains the gold-standard guide for improving the quality of multiple choice items.








Paul Banaszkiewicz



Introduction


The National Board of Medical Examiners (NBME) item-writing manual is an excellent starting point for a more detailed analysis of the MCQ writing process.1 It is widely referenced in this chapter, being a mainstay of guidance for question writers aiming to produce high-quality questions. The original ‘red book’ was updated to a 4th edition in 2016,2 and it remains the gold-standard guide for improving the quality of multiple choice items.


Do candidates really need to know the finer details of how to write good-quality SBAs and the processes involved in constructing the section 1 paper? The answer is definitely yes if you experience any major difficulties with this type of summative high-stakes exam. Some candidates perform poorly with MCQ-type questions, and for them any structured guidance is better than none.


For most candidates, some general information for the written paper is always useful, especially if it neatly summarises information from a variety of different sources that may be difficult or time-consuming to find otherwise.



Aims


By the end of this chapter, candidates should have a greater appreciation of the complexity of constructing SBAs to ensure a fair, valid and reliable section 1 exam.


Going through the process of how SBAs are constructed will provide general guidance to a candidate in their overall preparation for section 1.3


Investing extra time working through this chapter may score a candidate the extra couple of marks that pull them over the line as a borderline pass.4


This chapter will also make clear why there are so many poor-quality orthopaedic MCQ books on the market. It is very difficult to construct a good-quality, relevant new SBA; it is much easier to bastardise existing questions or to spend an evening producing poor-quality ones without understanding the sophisticated nuances of SBA construction.


Constructing good-quality SBAs needs considerable examiner training; question writers must first attend workshops for training and advice in SBA construction before being allowed to contribute to the question bank.


Looking ahead, this chapter may prove useful reading if you end up writing MCQ-type questions for exams in the future.


For aspiring TPDs or future examiners, it is important to know the intricacies of how SBAs are written and the processes involved in constructing the section 1 paper. This will allow you to give more specific and useful advice to candidates who have repeatedly failed this section of the exam.


Any detailed lecture on section 1 of the FRCS (Tr & Orth) exam will discuss reliability, content validity and educational theory (Miller’s pyramid, Bloom’s Taxonomy). It is therefore worth going over these terms, as the concepts can be difficult to grasp if unfamiliar.


Lastly, those candidates with an educational slant will find the whole process of constructing the section 1 exam fascinating.



Educational Theory


In 1990, Miller introduced an important framework, presented as a four-tier pyramid, to categorise the different levels at which trainees need to be assessed. Although SBAs can be used to test application of knowledge and higher order thinking, such questions are difficult to construct, and in general SBAs assess the bottom two levels of Miller’s pyramid, ‘knows’ and ‘knows how’ (Figure 2.1).5




  • Knows – Knowledge or information that the candidate has learned



  • Knows how – Application of knowledge to medically relevant situations



  • Shows how – Simulated demonstration of skills in an examination situation



  • Does – Behaviour in real-life situations


Workplace-based assessments (WBAs) were introduced into the postgraduate curriculum because of concerns that high-stakes examinations using formats such as single best answer (SBA) or extended matching item (EMI) questions encouraged rote learning. It is also known that performance in a controlled assessment correlates poorly with actual performance in professional practice.





Figure 2.1 Miller’s pyramid. The different layers represent the different components of clinical competency and how they can be assessed. WBAs attempt to assess how an individual performs in the workplace, i.e. what they actually do.


In 1956, Bloom et al.6 described six levels in the cognitive domain: (1) knowledge recall; (2) comprehension; (3) application; (4) analysis; (5) synthesis; and (6) evaluation. Over the years Bloom’s Taxonomy has been revised and alternative taxonomies created. A substantial revision occurred in 2001, producing a more dynamic classification that uses action verbs to describe the cognitive processes and rearranges the sequence within the taxonomy (Figure 2.2; Table 2.1).





Figure 2.2 Bloom’s Taxonomy




Table 2.1 Bloom’s Taxonomy. Key words to use in questions pitched at each level

























  • Remember: Who, What, When, Define, Identify, Describe, Label, List, Name, State, Match, Recognise, Select, Examine, Locate, Memorise, Quote, Recall, Retrieve, Reproduce, Tabulate, Copy

  • Understand: Demonstrate, Explain, Describe, Interpret, Clarify, Classify, Categorise, Differentiate, Discuss, Distinguish, Infer, Predict, Identify, Report, Select, Outline, Review, Express, Translate

  • Apply: Solve, Illustrate, Calculate, Execute, Carry out, Discover, Show, Examine, Choose, Schedule, Implement, Use, Make use of, Employ, Organise

  • Analyse: Differentiate, Distinguish, Analyse, Compare, Classify, Contrast, Separate, Explain, Select, Categorise, Divide, Order, Prioritise, Inspect, Make assumptions, Draw conclusions

  • Evaluate: Check, Co-ordinate, Reframe, Defend, Rate, Appraise, Critique, Judge, Support, Decide, Recommend, Summarise, Assess, Choose, Estimate, Grade, Find errors, Compare, Measure, Provide opinion

  • Create: Design, Compose, Create, Plan, Formulate, Produce, Construct, Organise, Generate, Hypothesise, Develop, Assemble, Rearrange, Modify, Improve, Adapt, Elaborate

More recently, Bloom’s Taxonomy has been represented not as a pyramid – where a large base of facts supports a tiny peak of creativity, which might be read to mean that we should spend most of our time focusing purely on knowledge – but as a broad wedge that better highlights the value of creating, evaluating and analysing (Figure 2.3).



Remembering: the candidate can remember previously learned material from long-term memory by recalling facts, terms, basic concepts and answers, e.g.


List the causes of …


What are the steps in … ?



Understanding: the candidate can explain ideas or concepts by organising, translating, interpreting, giving descriptions and stating main ideas, e.g.


Discuss the causes of …


Explain the pathophysiology



Applying: the candidate can solve problems by applying acquired knowledge, facts, techniques and rules in a different way, e.g.


Provide a differential diagnosis



Analysing: the candidate can distinguish between the different parts, how they relate to each other and to the overall structure and purpose. This involves examining and breaking information into parts by identifying motives or causes, making comparisons and finding evidence to support generalisations, e.g.


How will your differential diagnosis be altered in the light of investigation findings?



Evaluating: the candidate makes judgements about information, the validity of ideas or the quality of work against a set of criteria, and presents and defends opinions, e.g.


Justify your management of this patient.



Creating: the candidate puts elements together to form a functional whole, or to create a new product or point of view, e.g.


What will be your plan of management?





Figure 2.3 Modification of pyramid shape of Bloom’s Taxonomy into broad wedge to better emphasise the value of creating, evaluating and analysing.


Bloom’s Taxonomy is a hierarchical classification, with the lowest cognitive level being ‘remembering’ and the highest being ‘creating’. The lower three levels can be attained with superficial learning, using so-called Lower Order Thinking Skills (LOTS) such as memorisation. The upper three levels involve Higher Order Thinking Skills (HOTS) and can only be attained by deep learning.


An ongoing development of the examination is the progressive rewriting of questions in the bank that are currently recorded as level 1 questions (factual knowledge) into higher order questions.


In constructing multiple choice items to test higher order thinking, it is helpful to design problems that require multilogical thinking, along with designing alternatives that require a high level of discrimination.



Higher Order Thinking


Higher order thinking involves integration/interpretation (questions that require ‘putting the pieces together’) and problem solving (questions that require ‘clinical judgement’), rather than simple recall (questions that could be answered with a Google search).



Multilogical Thinking


Multilogical thinking is defined as ‘thinking that requires knowledge of more than one fact to logically and systematically apply concepts to a problem’.7 There has been a conscious move to rewrite the question bank with SBAs that require multilogical thinking to answer.



Highly Discriminating Questions


These are questions whose alternatives are all plausible, so that a high degree of discrimination is required to answer them.



SBAs



Advantages of SBAs




  • SBAs can assess a wide sample of curriculum content within a relatively short period of time. This leads to high reliability and improved validity.



  • They are a highly standardised and fair form of assessment: all trainees sit the same exam and are assessed with the same questions.



  • They are easy to administer and mark.



  • SBA marking is mostly automated and hence examiner subjectivity is removed from the assessment process.



Main Disadvantages of SBAs





  • The trainee’s reasons for selecting a particular option/response cannot be assessed.



  • Although a wide sample of curriculum content can be covered, the format does not allow in-depth assessment of any one area.



  • Constructing good SBAs needs considerable examiner training.



Exam boards use a utility model to analyse different assessment tools:



Utility = R × V × A × E × C × P




  • R – Reliability. Can the exam results of a given candidate in a given context be reproduced? To what extent can we trust the results?



  • V – Validity. Does the assessment assess what it purports to assess?



  • A – Acceptability. How comfortable are the different stakeholders (candidates, examiners, examination boards, public, National Health Service) with the examination system?



  • E – Educational impact. Does the exam drive the trainees towards educationally and professionally valuable training?



  • C – Cost effectiveness. Is the expenditure – in terms of money, time and manpower – to develop, run and sustain the examination process worthwhile in relation to what is learned about the candidate?



  • P – Practicability. How ‘doable or workable’ is the assessment instrument, given the circumstances? Are there sufficient resources to mount the exam?


Applying the utility model to SBAs, we get:




  • Reliability: high


The SBA results are highly reliable, as almost identical scores can be obtained if a candidate of similar ability is given the same set of SBAs, regardless of who marks the questions.




  • Validity: high for knowledge recall


SBAs are good at testing factual recall of knowledge. They can also be used to test application of knowledge and higher order thinking, although constructing such SBAs is difficult and requires training.




  • Acceptability: high


SBAs have been used extensively in medical education. Both trainees and examiners have come to accept them. Constructing good SBAs, however, is difficult.




  • Educational impact: moderate


Properly constructed SBAs will drive the learner towards learning important information.


However, SBAs developed to test trivial knowledge will lead to rote learning. Fragmentation of knowledge is another criticism.




  • Cost: moderate


The cost of administering an SBA test is low. In contrast, face-to-face peer review meetings of submitted SBAs are expensive to hold, as they involve substantial travel and accommodation costs. However, the quality of scrutiny that can be brought to bear on the question material justifies this outlay and affords considerable confidence in the quality of the product.




  • Practicability: high


SBAs are easy to administer as a computer-based assessment.
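

To make the multiplicative nature of the model concrete, the following sketch maps the qualitative SBA profile above onto 0–1 ratings. The numbers are invented purely for illustration; in practice each component is a judgement, not a measurement.

def utility(r, v, a, e, c, p):
    """Utility = R x V x A x E x C x P."""
    return r * v * a * e * c * p

# SBA profile from the text: reliability high, validity high (for
# knowledge recall), acceptability high, educational impact moderate,
# cost moderate, practicability high. Ratings are illustrative only.
print(f"{utility(r=0.9, v=0.8, a=0.9, e=0.6, c=0.6, p=0.9):.2f}")

# Because the model is multiplicative, a zero on any one component
# (e.g. an exam nobody accepts) drives the overall utility to zero,
# however strong the other components are.
print(utility(r=0.9, v=0.8, a=0.0, e=0.6, c=0.6, p=0.9))  # -> 0.0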



Item Analysis of SBAs


Item analysis output indicates the percentage of candidates in the various subgroups who selected each option of an SBA.


Each SBA is analysed to determine the percentage of candidates from each subgroup answering it correctly. The test group is usually divided into fifths, as this allows more detailed analysis around the pass/fail mark than quartiles would.


The spread of total scores should approximate a Gaussian curve. The exam board members are not especially interested in distinguishing the very best from the very worst candidates. The curve is concentrated in the centre, and the exam board wants to spread this middle area out so that no single question decides whether a candidate passes or fails the exam.
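

As a minimal sketch of the subgroup analysis described above, the following Python snippet ranks a hypothetical cohort by total score, splits it into fifths and reports how each fifth performed on one item. The response matrix is randomly generated purely for illustration.

import numpy as np

# Hypothetical (candidates x items) matrix of 0/1 item scores; real
# exam software would also record which option each candidate chose.
rng = np.random.default_rng(0)
responses = (rng.random((500, 120)) < 0.65).astype(int)

total_scores = responses.sum(axis=1)

# Rank candidates by total score and split into fifths
# (quintile 0 = bottom fifth, quintile 4 = top fifth).
order = np.argsort(total_scores)
quintile = np.empty(len(total_scores), dtype=int)
quintile[order] = np.arange(len(total_scores)) * 5 // len(total_scores)

item = 0  # analyse the first SBA on the paper
for q in range(5):
    pct = responses[quintile == q, item].mean() * 100
    print(f"Fifth {q + 1}: {pct:.0f}% correct")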



Easy Questions


With these questions (Figure 2.4), around 90% of candidates get the correct answer. As such, easy questions do not discriminate between the best and worst performing candidates. More importantly, an easy question does not differentiate between candidates around the level of minimal competence required for a pass. When paper 1 analysis flags up these questions, they are either scrapped or extensively reworked.





Figure 2.4 Easy SBA



Difficult Questions


These questions (Figure 2.5) are just as useless as easy ones. Again, they do not differentiate between good and bad candidates or, more importantly, between borderline candidates – those who can be passed and those who must re-sit. As with easy SBAs, difficult SBAs of poor quality are discarded, or else require very extensive rewriting.





Figure 2.5 Difficult SBA



Poorly Performing Questions


With these questions (Figure 2.6(a) and (b)), the bottom 20% of candidates may answer an SBA mainly correctly while the top 20% answer it mainly incorrectly. This is a poor SBA, as overall performance on it does not follow candidate form. Another example is a random spread of correct answers across the groups.








Figure 2.6 (a) and (b) Poorly performing SBAs


Usually the question is poorly written, the wrong answer has been keyed by the examiners or there is a typographical error.


Poorly performing questions are removed. Questions answered correctly by more than 90% or by fewer than 10% of candidates are also removed.


All questions that score poorly, i.e. where the percentage of candidates choosing the correct alternative is below 30%, are checked. Questions where the top 20% of candidates score significantly lower than average are also reviewed.
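

A minimal sketch of these screening rules follows, assuming 0/1 item scores and quintile assignments of the kind built in the earlier snippet. The 10-point margin used in the top-fifth check is an invented threshold for illustration.

import numpy as np

def review_flags(item_correct, quintile):
    """Screen one SBA. item_correct is a 0/1 array (one entry per
    candidate); quintile assigns each candidate to a fifth by total
    score (4 = top fifth)."""
    reasons = []
    p = item_correct.mean()  # facility: proportion answering correctly
    if p > 0.90:
        reasons.append("too easy (>90% correct): remove")
    elif p < 0.10:
        reasons.append("too difficult (<10% correct): remove")
    if p < 0.30:
        reasons.append("low facility (<30% correct): check")
    # A top fifth scoring clearly below the cohort average suggests a
    # flawed item (wrong key, ambiguity or a typographical error).
    if item_correct[quintile == 4].mean() < p - 0.10:
        reasons.append("top 20% below cohort average: review")
    return reasons

# Hypothetical demonstration data: a too-easy item.
rng = np.random.default_rng(1)
item_correct = (rng.random(500) < 0.95).astype(int)
quintile = rng.integers(0, 5, size=500)
print(review_flags(item_correct, quintile))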



Good Performing Questions


There is a gradation in candidates obtaining the correct answer, from the top fifth mainly answering the question correctly down to the bottom fifth mainly answering it incorrectly (Figure 2.7).





Figure 2.7 Good SBA


This question discriminates. The point discrimination (the point-biserial correlation between item score and total score) is calculated for each question; if it is > 0.3, the question is a good one.
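

A minimal sketch of that calculation on hypothetical data, correlating success on one item with the candidate's score on the remainder of the paper:

import numpy as np

def point_biserial(responses, item):
    """Correlation between success on one item and the score on the
    rest of the paper; responses is a (candidates x items) 0/1 matrix."""
    item_scores = responses[:, item]
    rest = responses.sum(axis=1) - item_scores  # exclude the item itself
    return np.corrcoef(item_scores, rest)[0, 1]

# Hypothetical data purely to show usage.
rng = np.random.default_rng(0)
responses = (rng.random((400, 110)) < 0.6).astype(int)
print(f"discrimination = {point_biserial(responses, 0):.2f}")  # > 0.3 is good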



Ideal SBA


These questions (Figure 2.8) discriminate between candidates at the pass/fail mark. A good-quality question should be answered correctly by 35–85% of just-passing candidates (defined as those scoring an overall mark within 10% of the pass mark).





Figure 2.8 Ideal SBA


There should also be an obvious positive correlation between the performance of the cohort on the individual question and in the examination as a whole (i.e. the question should be answered correctly by appreciably more passing candidates than failing candidates). A reasonable proportion of candidates (especially those who did not pass) should also have chosen each incorrect option.
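

A minimal sketch of the 35–85% check on just-passing candidates follows. The pass mark and data are invented, and ‘within 10% of the pass mark’ is read here as within 10 percentage points.

import numpy as np

# Hypothetical 0/1 item scores for a cohort.
rng = np.random.default_rng(0)
responses = (rng.random((400, 110)) < 0.6).astype(int)
pct_scores = responses.sum(axis=1) / responses.shape[1] * 100

pass_mark = 60.0                                   # invented pass mark (%)
just_passing = np.abs(pct_scores - pass_mark) <= 10

p = responses[just_passing, 0].mean() * 100        # facility in that group
print(f"{p:.0f}% of just-passing candidates answered item 1 correctly "
      "(ideal: 35-85%)")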


Item analysis determines the difficulty index (DIF I) (p-value), the discrimination index (DI) and the distractor efficiency (DE).
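

The following sketch computes these three indices using common definitions from the item-analysis literature (proportion correct; top-group minus bottom-group facility, using the conventional 27% split; share of distractors chosen by at least 5% of candidates). The exam board’s exact formulas may differ.

import numpy as np

def item_indices(choices, key, total_scores):
    """choices: each candidate's selected option for one item;
    key: the correct option; total_scores: overall exam scores."""
    correct = (choices == key).astype(int)

    # Difficulty index (p-value): proportion answering correctly.
    dif = correct.mean()

    # Discrimination index: facility in the top 27% of candidates
    # (by total score) minus facility in the bottom 27%.
    cut = max(1, round(0.27 * len(total_scores)))
    order = np.argsort(total_scores)
    di = correct[order[-cut:]].mean() - correct[order[:cut]].mean()

    # Distractor efficiency: share of distractors that 'function',
    # i.e. are chosen by at least 5% of candidates.
    distractors = [o for o in np.unique(choices) if o != key]
    chosen = [(choices == d).mean() >= 0.05 for d in distractors]
    de = sum(chosen) / len(distractors) if distractors else 0.0

    return dif, di, de

# Hypothetical usage:
rng = np.random.default_rng(0)
choices = rng.choice(list("ABCDE"), size=400)
total_scores = rng.integers(40, 100, size=400)
print(item_indices(choices, "C", total_scores))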
