We aim to use the clinical characteristics of individuals diagnosed with bipolar disorder (BD) to identify undiagnosed cases and individuals at high predicted risk of BD. At three participating sites (VUMC, MGB, GHS), we will derive EHR-based risk prediction models for BD using three machine-learning approaches: a Naïve Bayes classifier, Random Forest, and eXtreme Gradient Boosting (XGBoost). At each site, the performance of these models will be systematically compared, and the models will then be combined into an ensemble. The models will be evaluated for timeliness, sensitivity, specificity, and calibration, both at the sites where they are developed and at partner sites.
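To make the ensembling and evaluation steps concrete, the following is a minimal sketch under stated assumptions, not the project's actual pipeline. The source does not specify how the models are ensembled or which calibration measure is used. Here we assume soft voting, which averages each individual's predicted BD probability across the three models, and a fixed classification threshold of 0.5. Calibration is summarized with two common measures: calibration-in-the-large (mean predicted risk minus observed case rate) and the Brier score. All function names are hypothetical, and the probability lists are toy placeholders standing in for Naïve Bayes, Random Forest, and XGBoost outputs. Timeliness is not modeled here.

```python
from statistics import mean


def average_ensemble(prob_lists):
    """Soft-voting ensemble (assumed strategy): average each individual's
    predicted BD probability across all models."""
    return [mean(probs) for probs in zip(*prob_lists)]


def sensitivity_specificity(y_true, probs, threshold=0.5):
    """Classify as a predicted case when probability >= threshold (assumed
    cut-off), then return (sensitivity, specificity)."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


def calibration_in_the_large(y_true, probs):
    """Mean predicted risk minus observed case rate; 0 means the model is
    calibrated on average, negative means it under-predicts risk."""
    return mean(probs) - mean(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, probs))


# Toy, made-up data: 1 = BD case, 0 = non-case.
y_true = [1, 0, 1, 0, 0, 1]
nb_probs = [0.9, 0.2, 0.7, 0.1, 0.3, 0.4]   # placeholder Naive Bayes output
rf_probs = [0.8, 0.3, 0.6, 0.2, 0.6, 0.5]   # placeholder Random Forest output
xgb_probs = [0.7, 0.1, 0.8, 0.3, 0.3, 0.3]  # placeholder XGBoost output

ensemble = average_ensemble([nb_probs, rf_probs, xgb_probs])
sens, spec = sensitivity_specificity(y_true, ensemble)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"calibration-in-the-large={calibration_in_the_large(y_true, ensemble):+.3f}")
print(f"Brier score={brier_score(y_true, ensemble):.4f}")
```

In this toy example, the ensemble misses one of the three cases (sensitivity 2/3) and correctly rejects all non-cases (specificity 1.0). Running the same functions on held-out data from a partner site, without refitting, is one way to express the external-validation step. A weighted average or a stacked meta-model would be an equally plausible ensembling choice.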