3.9 Selecting the Regexp Matching Engine

Release 5.4.0 of gawk introduced a new regular expression matching engine, named MinRX.

MinRX is fully compliant with the POSIX standard for Extended Regular Expressions (EREs), including the additional features needed by awk and gawk. It is also a little stricter than the original matchers about the regular expression syntax it accepts as valid. (These restrictions apply to corner cases that should not come up in day-to-day use.)

Previously, gawk used GNU regex and dfa from GNULIB. These matchers are fast and generally robust, albeit not fully POSIX compliant. MinRX replaces both of them.

Because regular expression matching is such a fundamental part of what awk programs do, introducing a new regular expression engine carries some risk. To mitigate that risk, for the duration of one major release, gawk continues to provide access to the original regexp matchers, should they be needed.

If the environment variable GAWK_GNU_MATCHERS exists, then gawk switches to using GNU regex and dfa, as previously. Otherwise, gawk uses the MinRX matcher, which is the default.
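
As a rough illustration of the switch described above, the following shell sketch runs the same program once with each matcher and compares the results. It assumes that a gawk of release 5.4.0 or later is installed as gawk. The AWK variable, the sample program, and the sample input are made up for this example. The value 1 is arbitrary, because only the existence of GAWK_GNU_MATCHERS matters.

```shell
#!/bin/sh
# Sketch only: compare the default matcher with the original GNU matchers.
# Assumption: "gawk" on PATH is release 5.4.0 or later.
# AWK, prog, and input are illustrative names, not part of gawk.
AWK=${AWK:-gawk}

prog='/^[[:alpha:]]+[0-9]*$/ { print "match:", $0 }'
input='abc123
123abc
xyz'

# Default matcher (MinRX): make sure the variable does not exist.
minrx_out=$(printf '%s\n' "$input" | (unset GAWK_GNU_MATCHERS; "$AWK" "$prog"))

# Original matchers: the variable only has to exist, for this one command.
gnu_out=$(printf '%s\n' "$input" | GAWK_GNU_MATCHERS=1 "$AWK" "$prog")

if [ "$minrx_out" = "$gnu_out" ]; then
    verdict=same
else
    verdict=differ
fi
echo "matchers $verdict"
```

Setting the variable on the command line, as here, limits it to a single invocation. Using export GAWK_GNU_MATCHERS=1 instead would affect every later gawk run in that shell session, until the variable is removed with unset.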

Should you find a need to switch from MinRX to the original matchers, please submit a bug report describing what did not work (see Reporting Problems and Bugs). Doing so is very important, as it will help the maintainers and the MinRX author fix any issues that are found.

After one major release, the old matchers, and the use of the GAWK_GNU_MATCHERS environment variable, will be removed from gawk.