Predicting Failures of LLMs to Link Biomedical Ontology Terms to Identifiers: Evidence Across Models and Ontologies
Abstract
Large language models (LLMs) often perform well on biomedical NLP tasks but may fail to link ontology terms to their correct identifiers (IDs). We investigate why these failures occur by analyzing predictions across two major ontologies, the Human Phenotype Ontology (HPO) and Gene Ontology-Cellular Component (GO-CC), and two high-performing models, GPT-4o and LLaMa 3.1 405B. We evaluate nine candidate features related to term familiarity, identifier usage, morphology, and ontology structure. Univariate and multivariate analyses show that exposure to ontology identifiers is the strongest predictor of linking success. In contrast, features such as term length and ontology depth contribute little. Two unexpected findings emerged: (1) large "ontology deserts" of unused terms predict near-certain failure, and (2) the presence of leading zeroes in identifiers strongly predicts success in HPO. These results show that LLM linking errors are systematic and driven by limited exposure rather than random variability. Encouraging consistent reporting of ontology terms paired with their identifiers in biomedical literature would reduce linking errors, improve normalization performance across ontologies such as HPO and GO, enhance annotation quality, and provide more reliable inputs for downstream classification and clinical decision-support systems.
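As a rough illustration of what a univariate check on one binary feature could look like, the Python sketch below derives a "leading zero" flag from an identifier and computes an odds ratio for linking success. It is not the authors' analysis: the function names, the 0.5 continuity correction, and the toy outcomes are all assumptions made for this example, and the success/failure labels are invented rather than taken from the paper.

```python
def has_leading_zero(identifier: str) -> bool:
    """True if the local part of an ID such as 'HP:0001250' starts with '0'."""
    _, _, local = identifier.partition(":")
    return local.startswith("0")


def odds_ratio(records, feature) -> float:
    """Odds ratio of linking success for a binary feature.

    records: iterable of (identifier, linked_correctly) pairs.
    Each cell of the 2x2 table starts at 0.5 (an assumed continuity
    correction) so that an empty cell does not cause division by zero.
    """
    a = b = c = d = 0.5  # a: feature & success, b: feature & failure,
                         # c: no feature & success, d: no feature & failure
    for identifier, ok in records:
        if feature(identifier):
            if ok:
                a += 1
            else:
                b += 1
        else:
            if ok:
                c += 1
            else:
                d += 1
    return (a * d) / (b * c)


# Hypothetical toy data: the outcomes are invented purely for illustration.
toy = [
    ("HP:0001250", True),
    ("HP:0000118", True),
    ("HP:0004322", True),
    ("HP:0012345", False),
    ("HP:5200001", False),
    ("HP:4000042", False),
    ("HP:6000531", False),
    ("HP:3000050", True),
]

if __name__ == "__main__":
    print(round(odds_ratio(toy, has_leading_zero), 2))
```

An odds ratio above 1 on real data would indicate that identifiers with the feature are linked correctly more often than those without it. A multivariate analysis would instead fit all nine features jointly, for example with logistic regression.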
Department(s)
Cooperative Engineering Program
Document Type
Conference Proceeding
DOI
10.1109/BHI67747.2025.11269551
Keywords
biomedical term normalization, Gene Ontology, Human Phenotype Ontology, large language models, logistic regression, normalization, ontology
Publication Date
1-1-2025
Recommended Citation
Obafemi-Ajayi, Tayo; Hier, Daniel B.; and Platt, Steven K., "Predicting Failures of LLMs to Link Biomedical Ontology Terms to Identifiers: Evidence Across Models and Ontologies" (2025). Faculty Scholarship. 260.
https://bearworks.missouristate.edu/articles00/260
Journal Title
BHI 2025 IEEE EMBS International Conference on Biomedical and Health Informatics Conference Proceedings