Building the Model.

Title: Building the Model.
Publication Type: Journal Article
Year of Publication: 2023
Authors: Yang HS, Rhoads DD, Sepulveda J, Zang C, Chadburn A, Wang F
Journal: Arch Pathol Lab Med
Volume: 147
Issue: 7
Pagination: 826-836
Date Published: 2023 Jul 01
ISSN: 1543-2165
Keywords: Computer Simulation, Humans, Machine Learning
Abstract

CONTEXT: Machine learning (ML) allows for the analysis of massive quantities of high-dimensional clinical laboratory data, thereby revealing complex patterns and trends. Thus, ML can potentially improve the efficiency of clinical data interpretation and the practice of laboratory medicine. However, the risks of generating biased or unrepresentative models, which can lead to misleading clinical conclusions or overestimation of the model performance, should be recognized.

OBJECTIVES: To discuss the major components for creating ML models, including data collection, data preprocessing, model development, and model evaluation. We also highlight many of the challenges and pitfalls in developing ML models, which could result in misleading clinical impressions or inaccurate model performance, and provide suggestions and guidance on how to circumvent these challenges.

DATA SOURCES: The references for this review were identified through searches of the PubMed database, US Food and Drug Administration white papers and guidelines, conference abstracts, and online preprints.

CONCLUSIONS: With the growing interest in developing and implementing ML models in clinical practice, laboratorians and clinicians need to be educated in order to collect sufficiently large and high-quality data, properly report the data set characteristics, and combine data from multiple institutions with proper normalization. They will also need to assess the reasons for missing values, determine the inclusion or exclusion of outliers, and evaluate the completeness of a data set. In addition, they require the necessary knowledge to select a suitable ML model for a specific clinical question and accurately evaluate the performance of the ML model, based on objective criteria. Domain-specific knowledge is critical in the entire workflow of developing ML models.

DOI: 10.5858/arpa.2021-0635-RA
Alternate Journal: Arch Pathol Lab Med
PubMed ID: 36223208
PubMed Central ID: PMC10344421
Grant List: R01 MH124740 / MH / NIMH NIH HHS / United States; RF1 AG072449 / AG / NIA NIH HHS / United States
Related Faculty: He Sarina Yang, M.D., Ph.D.; Amy Chadburn, M.D.
