Since the matching of regex is a dynamic process, we need to provide slideshow mode examples. These are best viewed with a recent browser. You should use navigation buttons for exploring each examples.
In these example we don't use any particular language syntax for regex or input text. We will use \n and the like for special character in input text but we will not escape in quotes or even slashes.
Special symbols are used to mark position:
- Saved options for backtracking will be shown over the text and under the regex using symbols.
- Match tries will be shown under the text using ^ symbols.
- Successful match will be shown under the text using ¯ symbols.
- Part of the regex currently driving the process will be shown over the regex using _ symbols.
Introductory example
|
Text
Regex |
usr/local/bin
(/?.*)/(.*)
|
|
Comments
- In this example, we will separate absolute or relative path and a filename.
- No explicit validation is made on names for the purpose of this example.
Optional quantifier on slash |
|
Text
Regex |
~
usr/local/bin
^
__
(/?.*)/(.*)
~~
|
|
Comments
- Since the optional quantifier is greedy, the choice of skipping the slash is saved
- A match of slash is tried without success.
Backtracking to the "skip it" option |
|
Text
Regex |
usr/local/bin
^
__
(/?.*)/(.*)
|
|
Comments
- Since the previous match has failed, backtracking occurs.
- The option to skip the slash is tried and obviously succeed.
|
Text
Regex |
~~~~~~~~~~~~~
usr/local/bin
¯¯¯¯¯¯¯¯¯¯¯¯¯
__
(/?.*)/(.*)
~~
|
|
Comments
- The match any character is followed by a greedy quantifier, the
choice of skipping any character is saved and a match of any character
is tried and succeed.
- The previous step obviously repeat until the end of the input text.
Fail to match the second slash |
|
Text
Regex |
~~~~~~~~~~~~~
usr/local/bin
^
_
(/?.*)/(.*)
~~
|
|
Comments
- Since there is no more character, a try to match the second slash fails
.* release character n but the match fail again |
|
Text
Regex |
~~~~~~~~~~~~
usr/local/bin
^
_
(/?.*)/(.*)
~~
|
|
Comments
- The previous match has failed, so backtracking occurs
- The .* release character
n
- A retry to match the second slash fails
Repeated try to match second slash |
|
Text
Regex |
~~~~~~~~~~
usr/local/bin
^^
_
(/?.*)/.*
~~
|
|
Comments
- The previous steps is repeated 2 more times with the same results
Finally match second slash |
|
Text
Regex |
~~~~~~~~~
usr/local/bin
¯
_
(/?.*)/(.*)
~~
|
|
Comments
- A backtrack finally release a slash and the retry to match the second slash succeed
The final .* match remaining characters and the whole match succeed |
|
Text
Regex |
~~~~~~~~~ ~~~
usr/local/bin
¯¯¯
__
(/?.*)/(.*)
~~ ~~
|
|
Comments
- Repeated tries to match any character match the remaining character in the text and produce several more saved choice.
- Since the end of the regex has been reached, a traditional NFA engine conclude to a successful match and drop all remaining saved choice.
- A POSIX NFA will continue to tries all the saved options by
continuing backtracking, but will drop all subsequent successful match
which will be obviously not longer than this first successful match.
- The final value of
$1
and $2
are "usr/local"
and "bin"
respectively.
This first example is really simple, but it already shows how complex and inefficient could be backtracking in regular expressions. More illustrations of an NFA engines is available:
- Backtracking in regards to correctness and efficiency
- <.+>: an easy crafted but imprecise regex
- <.+?>: Still imprecise but not as lazy as we could thought
- <(?:/[^<>]|[^/<>])+/>: More precise but still inefficient
- <(?:[^/<>]+|/[^<>])+/>: Still more precise and nearly efficient
- <[^/<>]+(?:/+[^/<>]+)*/>: The unrolling the loop technique