Embedding DEI in the Test Development Process
Organizations are increasingly looking for ways to improve their diversity, equity and inclusion (DEI) efforts. For organizations involved in testing and assessment, this can involve both internal and external initiatives. This article focuses on the examination development process and how to embed DEI considerations into its various stages.
To understand how credentialing organizations can embed DEI into their examination programs, it is important to first understand the key components of DEI. Diversity can cover a broad range of federal, state and local legally protected classes, including ethnicity, race, age, creed, gender, gender identity, culture, national origin, religion, genetic information, veteran status and disability. Diversity can also include characteristics such as language proficiency, family structure, life experience and socioeconomic status. Equity promotes fairness, ensuring everyone has access to the resources and opportunities needed to succeed. Inclusion means everyone has a sense of belonging. Incorporating all three of these components into exam development can strengthen the validity of inferences drawn from examination results.
Standard processes for creating licensure and certification examinations can support and embed DEI efforts through the job task analysis (JTA), the use of optional demographic survey questions, consideration of test design and delivery, subject matter expert (SME) recruitment efforts, item writing principles and training, psychometric analyses and listening to stakeholders. What follows is an overview of how your organization can look to embed DEI efforts into each of these steps in the test development process.
Job Task Analysis
The JTA is the foundation on which the exam blueprint is based. It is important that the blueprint results from a systematic process using panels of SMEs who have thoughtfully considered DEI. The demographics of the JTA panel members should be considered in relation to the demographics of the candidate population; the exam blueprint should be the product of a diverse group of contributors representing the profession and its stakeholders. Whatever method is used to conduct the JTA, whether a survey or a task force validation study, those surveyed should likewise represent the diversity of both the profession and its stakeholders.
Use of Optional Demographic Questions on Examination Applications
By asking optional demographic questions on the exam application, such as those related to age, ethnicity and gender, organizations have the opportunity to assess whether test takers are being unfairly discriminated against by the test or the test administration process. While answering demographic questions cannot be required, it is still important to ask them. As a starting point, standard demographic categories can be found on the websites of the U.S. Equal Employment Opportunity Commission (EEOC), the U.S. Census Bureau and the National Institutes of Health (NIH).
It is critical to consider the way in which the demographic questions are asked; they should be phrased in a sensitive manner. For instance, asking “With what gender do you identify?” or “With what race or ethnicity do you identify?” is more sensitive than questions phrased as “What is your gender?” or “What is your race?”
The response options for each question should be inclusive, and the demographic questions themselves should not be discriminatory. For example, when asking a gender question, are there only fields for male and female? Your organization can be more inclusive by expanding the list of gender identities, such as adding options for those identifying as transgender or nonbinary. For ethnicity, consider allowing candidates to select multiple ethnicities, or add “multicultural” as an option to be more inclusive of those with multiple backgrounds. Care should be taken, however, to provide options that facilitate the collection and analysis of data.
It is recommended that the collected aggregate demographic information be examined on a regular basis. While these questions are optional and are not answered by all candidates, it is still good practice to review the data to identify trends or missing subgroups.
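As a minimal sketch of what this regular review might look like (the function name and response categories below are illustrative assumptions, not from any particular exam program), an organization could tally optional responses while also tracking the response rate, so that skipped questions and missing subgroups stay visible:

```python
from collections import Counter

def summarize_optional_question(responses):
    """Tally answers to one optional demographic question.

    responses: one entry per applicant; None means the applicant
    skipped the question. Returns the response rate and per-category
    counts so trends and missing subgroups can be spotted over time.
    """
    answered = [r for r in responses if r is not None]
    total = len(responses)
    return {
        "response_rate": len(answered) / total if total else 0.0,
        "counts": dict(Counter(answered)),
    }

# Illustrative application data: two applicants skipped the question.
responses = ["Female", "Male", None, "Nonbinary", "Female", None]
summary = summarize_optional_question(responses)
# A low response_rate, or a category that never appears in counts,
# may itself be worth investigating.
```

Because the questions are optional, reporting the response rate alongside the counts matters: a subgroup that appears small may simply be declining to answer.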
Consideration of Test Design and Delivery
Credentialing organizations should consider the design and delivery of the examination in terms of equity and accessibility.
There should be a strong rationale for the number of items on, and the time allowed for, an examination. Longer examinations require candidates to take more time off from work. This may disadvantage candidates who cannot afford to take a half or full day off or who have personal obligations such as family responsibilities. Is the test's length necessary, or could it be shorter?
The credentialing organization should also make a conscious decision about the delivery modality of the examination. Is the candidate required to travel long distances to sit for the examination in a proctored test center? Is the only option to take the examination on personal equipment at home in a remotely proctored environment? Does this disadvantage candidates who lack the required equipment or a stable internet connection? Equity and accessibility must be considered to provide a fair, standardized process for all candidates.
Subject Matter Expert Recruitment Efforts
It is essential to ensure that the SME panels used throughout the test development process, including JTA, item writing, item review, exam review and standard setting, are representative of the candidate pool and the population as a whole.
Organizations should commit to recruiting, training and engaging diverse SMEs. In selecting SMEs, consideration should be given to characteristics such as geographic location, years and type of experience, practice setting, educational background, gender, race and ethnicity.
Item Writing Principles and Training
To get the greatest value from test development, staff and SMEs working on the exam program should go through DEI training so they can recognize potential bias and stereotyping in the wording of test questions.
By avoiding bias in test questions, organizations ensure the questions do not have characteristics that would result in differential performance for candidates of equal ability but from different groups. Asking questions like the following can help identify potential problems:
- Does the item contain content or language unfamiliar to some candidate populations?
- Will members of one candidate group get an item correct or incorrect for the wrong reason?
- Does the item contain clues that would lead to different performance by one group over another?
By being more sensitive in the wording of test questions, organizations can help ensure that exams are free of material that may be interpreted as offensive, demeaning or emotionally charged. Careful review by a diverse group of SMEs can help answer the following questions:
- Does the item reference stereotypical roles (e.g., referring to the doctor as “he” or the nurse as “she”) or stereotypical situations?
- Would an item offend a candidate such that lower performance may result?
Conducting Psychometric Analyses
If optional demographic data are collected from examination applicants, it is imperative to regularly produce and monitor examination summary reports that include the performance of these groups. This is often done at the overall exam level; for deeper insight at the item level, however, differential item functioning (DIF) analyses help determine whether different subgroups respond differently to particular test questions.
A DIF analysis may be used to assess whether candidates of the same ability level respond differently to a particular item as a function of characteristics such as gender, ethnicity, age group, education or socioeconomic status. DIF analyses may also be applied to evaluate cross-cultural response differences, e.g., by country or by language version of the examination.
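As a rough sketch of one widely used DIF method, the Mantel-Haenszel procedure (the data and function below are illustrative, not a production psychometric tool), candidates can be matched on total test score and a common odds ratio computed for the studied item across the score strata; the ETS delta transformation puts the result on a familiar scale, where values near zero suggest little DIF and, roughly, absolute values above about 1.5 are often treated as large DIF:

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(scores, groups, item_correct):
    """Mantel-Haenszel DIF statistic for a single item.

    scores:       total test score per candidate (the matching variable)
    groups:       "ref" or "focal" per candidate
    item_correct: 1 (correct) or 0 (incorrect) per candidate on the item

    Returns (alpha, delta): the common odds ratio and its ETS delta
    transform. alpha near 1 (delta near 0) suggests little DIF.
    """
    # One 2x2 table per score stratum:
    # [ref-correct, ref-incorrect, focal-correct, focal-incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for s, g, r in zip(scores, groups, item_correct):
        offset = 0 if g == "ref" else 2
        strata[s][offset + (0 if r else 1)] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        return None, None  # degenerate data; no stable estimate
    alpha = num / den                 # common odds ratio
    delta = -2.35 * math.log(alpha)   # ETS delta scale
    return alpha, delta
```

In practice, organizations typically rely on established psychometric software and add significance tests and sample-size safeguards; the sketch only shows the shape of the computation on matched groups.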
Once items are flagged by the DIF analysis, they should be reviewed with SMEs for bias and revised if needed. Items should also be monitored over time for repeated DIF flags or recurring issues. DIF feedback can improve item writing by providing real-world examples of problematic items and can guide revisions that decrease unfair gaps. Revised items should be pretested and reviewed again to determine whether the content changes affected item performance and the DIF results. All results must be interpreted with caution: when evaluating group differences in examination performance, ruling out third-variable explanations, such as differences in training programs, helps strengthen conclusions about the group effect.
Listening to Stakeholders
Above all else, organizations should listen to their stakeholders. By reading exam comments and social media postings, organizations can uncover complaints about bias or stereotyping in the exam program.
Item comments received from test candidates that raise bias concerns should be reviewed by SMEs with cultural competence and sensitivity and by professionals representative of the diverse population. Changes to the item or scoring may be recommended as a result of the review.
Conclusion
Credentialing organizations that follow a systematic DEI best practice model in their examination development process can be confident they are embedding DEI principles into their examination programs and thereby treating candidates equitably. DEI is not a one-time effort; it is continuous and should become ingrained in the examination development process. Fully integrating DEI into the process will ensure more equitable, and therefore more valid, examination programs.