A simplified retriever to improve accuracy of phenotype normalizations by large language models

Abstract

Large language models have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances large language model accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM®), we demonstrate that the normalization accuracy of GPT-4o increases from a baseline of 62% without augmentation to 85% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.
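The retrieval step described in the abstract can be pictured as a nearest-neighbor search by cosine similarity over term embeddings. The following is a minimal sketch of that idea only, not the authors' implementation: the three-dimensional vectors are invented stand-ins for BioBERT embeddings, and the names `cosine_similarity`, `retrieve_candidates`, and `TOY_INDEX` are hypothetical. In practice, the top-ranked candidates would be passed to the language model as suggested normalizations.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def retrieve_candidates(query_vec, index, k=2):
    """Return the k index entries most similar to query_vec, best first."""
    scored = [
        (hpo_id, label, cosine_similarity(query_vec, vec))
        for (hpo_id, label), vec in index.items()
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


# ASSUMPTION: toy 3-dimensional vectors standing in for real embeddings.
# Real contextual embeddings have hundreds of dimensions.
TOY_INDEX = {
    ("HP:0001250", "Seizure"): [0.9, 0.1, 0.0],
    ("HP:0001251", "Ataxia"): [0.1, 0.9, 0.2],
    ("HP:0000252", "Microcephaly"): [0.0, 0.2, 0.9],
}

# ASSUMPTION: an invented embedding for a free-text term such as "convulsions".
query = [0.8, 0.2, 0.1]

candidates = retrieve_candidates(query, TOY_INDEX, k=2)
for hpo_id, label, score in candidates:
    print(f"{hpo_id}\t{label}\t{score:.3f}")
```

Because the ranking depends only on embeddings of the term labels, a retriever of this shape needs no explicit term definitions, which is the simplification the abstract emphasizes.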

Department(s)

Cooperative Engineering Program

Document Type

Article

DOI

10.3389/fdgth.2025.1495040

Keywords

cosine similarity, HPO, large language model, OMIM, phenotype normalization, retrieval-augmented generation, small language model

Publication Date

1-1-2025

Recommended Citation

Do, Thanh Son; Obafemi-Ajayi, Tayo; and Hier, Daniel B., "A simplified retriever to improve accuracy of phenotype normalizations by large language models" (2025). Faculty Scholarship. 240.
https://bearworks.missouristate.edu/articles00/240

Journal Title

Frontiers in Digital Health
