Preprocessing of Physician Notes by LLMs Improves Clinical Concept Extraction Without Information Loss

Abstract

Clinician notes are a rich source of patient information, but often contain inconsistencies due to varied writing styles, abbreviations, medical jargon, grammatical errors, and non-standard formatting. These inconsistencies hinder their direct use in patient care and degrade the performance of downstream computational applications that rely on these notes as input, such as quality improvement, population health analytics, precision medicine, clinical decision support, and research. We present a large-language-model (LLM) approach to the preprocessing of 1618 neurology notes. The LLM corrected spelling and grammatical errors, expanded acronyms, and standardized terminology and formatting, without altering clinical content. Expert review of randomly sampled notes confirmed that no significant information was lost. To evaluate downstream impact, we applied an ontology-based NLP pipeline (Doc2Hpo) to extract biomedical concepts from the notes before and after editing. F1 scores for Human Phenotype Ontology extraction improved from 0.40 to 0.61, confirming our hypothesis that better inputs yielded better outputs. We conclude that LLM-based preprocessing is an effective error correction strategy that improves data quality at the level of free text in clinical notes. This approach may enhance the performance of a broad class of downstream applications that derive their input from unstructured clinical documentation.
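For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for concept extraction when the extracted and reference concepts for a note are treated as sets of ontology identifiers. It is an illustration only, not the authors' evaluation code: the function name `precision_recall_f1`, the identifiers, and the resulting numbers are hypothetical and are not taken from the study.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for a single note.

    predicted: concept IDs returned by an extraction pipeline
    gold:      concept IDs in the reference (expert) annotation
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical, illustrative identifiers in HPO-style format (not study data).
gold = {"HP:0001250", "HP:0002076", "HP:0001288"}
extracted_before = {"HP:0001250", "HP:0012345"}               # from the raw note
extracted_after = {"HP:0001250", "HP:0002076", "HP:0012345"}  # from the edited note

for label, extracted in [("before", extracted_before), ("after", extracted_after)]:
    p, r, f1 = precision_recall_f1(extracted, gold)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case the edited note yields one additional correct concept, which raises both precision and recall and therefore F1. How the study aggregated scores across notes is not stated in this record.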

Department(s)

Cooperative Engineering Program

Document Type

Article

DOI

10.3390/info16060446

Keywords

concept extraction, data interoperability, Doc2Hpo, electronic health records, human phenotype ontology, large language models, physician notes

Publication Date

6-1-2025

Recommended Citation

Obafemi-Ajayi, Tayo; Hier, Daniel B.; Carrithers, Michael A.; Platt, Steven K.; Nguyen, Anh; and Giannopoulos, Ioannis, "Preprocessing of Physician Notes by LLMs Improves Clinical Concept Extraction Without Information Loss" (2025). Faculty Scholarship. 142.
https://bearworks.missouristate.edu/articles00/142

Journal Title

Information (Switzerland)
