Are LLMs Ready for COLREGs? A Study in Autonomous Maritime Vessel Navigation

Abstract

Autonomous Surface Vehicles (ASVs) operating in shared waterways must comply with the International Regulations for Preventing Collisions at Sea (COLREGs). While traditional rule-based systems often struggle with ambiguity and scalability, Large Language Models (LLMs) offer a promising alternative for flexible decision-making. This paper investigates whether LLMs are ready to support COLREG-compliant navigation by evaluating ChatGPT, DeepSeek, and Claude across five multi-vessel scenarios. To probe their reasoning capabilities, we vary prompt richness across three levels (minimal, moderate, and detailed), designed to simulate real-world variability in available information and communication. Each model is assessed over ten trials per condition using three performance metrics: collision rate, distance traveled, and compliance of actions with COLREGs. Experimental results indicate that all three models exhibit limitations in producing COLREG-consistent decisions; ChatGPT and Claude generally avoid collisions, whereas DeepSeek is sensitive to prompt clarity. Our findings highlight both the potential and the current limitations of LLMs in supporting safe, rule-based autonomous maritime navigation, and underscore the critical role of prompt engineering in maritime AI systems.

Department(s)

Computer Science

Document Type

Conference Proceeding

DOI

10.23919/OCEANS59106.2025.11245090

Keywords

ASV, COLREGs, LLM, prompt engineering

Publication Date

1-1-2025

Journal Title

Oceans Conference Record IEEE
