Balanced Benchmarking of Zero-Shot and RAG Approaches for Biomedical Term Normalization
Abstract
Normalization of medical concepts to an ontology is a key aspect of the natural language processing of biomedical text. It enables the mapping of medical expressions to standardized ontology terms and their identifiers, thereby enhancing the interoperability and computability of medical concepts. Although large language models (LLMs) can identify and standardize medical terms, they may struggle to accurately map ontology terms to their corresponding ontology identifiers. These challenges arise from the stochastic nature of LLMs, their limited exposure to uncommon ontology identifiers during training, and their lack of an integrated lookup mechanism. We generated test sets of synthetic terms to assess normalization performance under both zero-shot and retrieval-augmented generation (RAG) prompting across two ontologies (Human Phenotype Ontology and Gene Ontology) and three LLMs (GPT-4o, LLaMA 3.3 70B, and Phi-4). To ensure a calibrated and fair evaluation of normalization, each test set was balanced along two axes: (1) term prevalence in biomedical literature, as estimated by PubMed Central frequency counts, and (2) semantic proximity to ontology terms, as assessed by cosine similarity of BioBERT embeddings. Our results demonstrate that RAG consistently outperforms zero-shot prompting, particularly on low-prevalence terms that are infrequently encountered in the biomedical literature. This highlights the value of RAG in compensating for gaps in model exposure to uncommon medical concepts. We demonstrate that a synthetic test set can be a valuable tool for evaluating biomedical term normalization across LLMs.
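The two balancing axes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tercile binning, the equal-per-cell sampling, and the names `cosine_similarity`, `tercile_labels`, and `balanced_sample` are all assumptions made for illustration. The embedding vectors and prevalence counts are taken as precomputed inputs; in the paper they come from BioBERT and PubMed Central, respectively.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tercile_labels(values):
    """Label each value 0, 1, or 2 (low, middle, high) by its rank among all values.

    Assumption: three rank-based bins; the abstract does not state a binning scheme.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = min(2, rank * 3 // len(values))
    return labels


def balanced_sample(terms, prevalence, similarity, per_cell, seed=0):
    """Draw up to `per_cell` terms from every (prevalence bin, similarity bin) cell.

    `prevalence` holds one literature frequency count per term, and `similarity`
    holds one cosine similarity per term (term embedding vs. ontology-term embedding).
    """
    rng = random.Random(seed)
    prev_bins = tercile_labels(prevalence)
    sim_bins = tercile_labels(similarity)

    cells = defaultdict(list)
    for term, p, s in zip(terms, prev_bins, sim_bins):
        cells[(p, s)].append(term)

    # Sample the same number of terms from each cell so no stratum dominates.
    return {
        cell: rng.sample(members, min(per_cell, len(members)))
        for cell, members in cells.items()
    }
```

Sampling the same number of terms from every cell keeps common, easy-to-match terms from dominating the benchmark, which is the stated motivation for balancing. Cells with fewer than `per_cell` members contribute all of their terms.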
Department(s)
Cooperative Engineering Program
Document Type
Conference Proceeding
DOI
10.1109/CIBCB66090.2025.11177118
Keywords
BioBERT, cosine similarity, Gene Ontology, Human Phenotype Ontology, large language models, normalization, ontology identifiers, ontology mapping
Publication Date
1-1-2025
Recommended Citation
Do, Thanh Son; Obafemi-Ajayi, Tayo; and Hier, Daniel B., "Balanced Benchmarking of Zero-Shot and RAG Approaches for Biomedical Term Normalization" (2025). Faculty Scholarship. 253.
https://bearworks.missouristate.edu/articles00/253
Journal Title
2025 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2025