RegularExpressions (regex in short) is a powerful and flexible language for text searching.are efficient tools for parsing many kind of data, locating and even replacing data patterns. Regex are now available in most popular language either integrated or as an external library.
Regex are studied here both for their syntax to describe data patterns and their ability to efficiently match them against large amount of data. This document is neither an introduction nor a complete guide to regular expressions. We recommend all interested user to read the excellent O'Reilly's book from Jeffrey E. F. Friedl for really.
Regular Expressions is not a recent subject and were initially used in early 1940's to express the algebra of regular sets, a mathematical theory first describe by. Its theory has conduct to several study in the field of finite automata for data processing. The first regular expressions compiler was explain by Ken Thompson in article published in volume 11, issue 6, June 1968 of the Communication of the ACM.
An implementation of such compiler has been used in early release of ed, a well know *nix editor, and their regex functionality "Global Regular Expression Print" has engendered a still useful and well known tool, grep (which its currently most powerful implementation is).
Regular expressions syntaxes and flavors
Beside the mostly unused POSIX standardization effort, their is a large variety of regular expression flavor in use today. Both theand the are different and could lead to slightly different results for the same regex and input data. The integration of regular expression into the language also vary a lot, from full language integration to procedural and/or object libraries. Some tools and languages like perl and egrep has somewhat greatly enhance regex possibilities and are today's models for most emerging solutions.
All implementations of regular expressions usualy proceed in two steps:
- lexical analysis and compilation of the regular expression to an appropriate byte code
- parsing of the input text using an
are the base of the implementation of regular expression matching algorithm. There are several well known implementations, and the Henry Spencer's NFA engine has been the base for most regular expressions engines in use today. To have a pratical experience of the internal processing done by a Traditional NFA engine, read our .
- Perl also provide a sort of compilation of the input text itself. Its casual use may sometimes improve the matching speed.