A pedagogical regular-expression engine

Abstract

Most major programming languages support regular expressions, but the features, performance, and matching behavior depend on the engine. Therefore we think it is pedagogically useful for students to see how a regular expression engine works, and for that purpose, the simpler the engine the better. We present a regular-expression engine written in 70 lines of Python 2.7, of which only 44 relate to matching. The engine is driven by the structure of the regular expression and tracks sets of all possible ending positions of a match. We believe it is much simpler than an implementation using backtracking or based on finite automata. The engine runs in polynomial time, always finds the longest possible match, and can be trivially modified to find the shortest possible match. Its expressive power is exactly that of the regular languages; it does not support nonregular features like backreferences. However, it does support general intersection and complement, powerful operators unavailable in most engines. There are a number of obvious extensions to the engine that would make excellent student projects.

Department(s)

Computer Science

Document Type

Article

Publication Date

2013

Journal Title

Journal of Computing Sciences in Colleges

Citation-only

Share

COinS