Preference-based probabilistic planning with partially-ordered temporal goals

Abstract

Partial order preferences arise naturally in human and AI decision-making when the agents are unable or unwilling to express their preferences. We study automatic synthesis of policies in Markov decision processes (MDP) that maximize satisfaction of preferences represented as a partial order on a set of Linear Temporal Logic on Finite Traces (LTLf) formulas. Partial order preferences on LTLf formulas introduce two key challenges to preference-satisfying policy synthesis: unranked outcomes make traditional total-order-based approaches inapplicable, and identifying a non-dominated strategy requires evaluating all probability distributions over finite paths in the MDP, where each path satisfies a subset of LTLf formulas, for which efficient computational models remain unexplored. We present two key contributions: First, we introduce a new computational model called the preference deterministic finite automaton (PDFA) to represent partial orders on subsets of LTLf formulas. We present a procedure to construct a PDFA for a given partial order on a set of LTLf formulas. Second, we identify three types of stochastic orders studied in order theory that transform a partial order on LTLf formulas to a partial order on policies in MDP. For these stochastic orders, we show that synthesizing preference-satisfying policy is equivalent to finding a Pareto optimal policy in a multi-objective MDP, constructed from the original MDP, the PDFA, and the stochastic order. We prove the correctness of our procedure and show its applicability using two case studies motivated by robot motion planning in stochastic environments.

Department(s)

Computer Science

Document Type

Article

DOI

10.1016/j.automatica.2026.112854

Keywords

Formal methods, Markov Decision Process, Planning, Preferences, Temporal logic

Publication Date

5-1-2026

Journal Title

Automatica

Share

COinS