The availability of large-scale biobanks linking electronic health records (EHRs) to biospecimens has created a powerful opportunity for increasing diversity in psychiatric research.
Defining and assigning case status is an important step in conducting research using EHR data. Case status is often defined using phenotypic algorithms that were developed in cohorts of majority-White individuals.
However, bias in how diagnosis codes are assigned across groups could cause these algorithms to perform worse in underrepresented populations and exacerbate disparities in research.
We propose to illustrate these issues by examining how the prevalence of the features used by psychiatric algorithms differs across biobanks and racial groups. Less-biased algorithms will improve our ability to conduct cross-population research.