Benchmark .NET regular expression source generators
If you work with regular expressions and have upgraded to .NET 7 or newer, you may have seen warning SYSLIB1045 ("Use GeneratedRegexAttribute to generate the regular expression implementation at compile time"), which tells you that you can switch to the new regular expression source generator.
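For reference, the change the warning suggests looks roughly like this. It is a minimal sketch that assumes a .NET 7 or newer project and uses a hypothetical class (Patterns) with a simple digit pattern rather than the email pattern benchmarked below. Instead of constructing a Regex from a constant pattern at run time, you declare a static partial method marked with [GeneratedRegex], and the source generator supplies its implementation at compile time.

```csharp
using System;
using System.Text.RegularExpressions;

// Both members match the same pattern; only how the matcher is produced differs.
Console.WriteLine(Patterns.DigitsAtRuntime.IsMatch("order 42"));  // True
Console.WriteLine(Patterns.DigitsGenerated().IsMatch("order 42")); // True

// Hypothetical class used only for this illustration.
static partial class Patterns
{
    // Before: the pattern is parsed when this field is initialized at run time.
    // A Regex constructed from a constant pattern is the kind of code SYSLIB1045 flags.
    public static readonly Regex DigitsAtRuntime = new(@"\d+");

    // After: the source generator emits the matching code at compile time
    // and implements this partial method, which returns a cached instance.
    [GeneratedRegex(@"\d+")]
    public static partial Regex DigitsGenerated();
}
```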
But is it really better? I ran a benchmark to find out, and the short answer is: not really. Here are my method and results.
Benchmark
```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

var sw = new Stopwatch();

// Source generation
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
    RegexGenerated().Match("[email protected]");
}
sw.Stop();
Console.WriteLine("Source generation: {0}ms", sw.ElapsedMilliseconds);

// Compiled
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
    RegexCompiled.Match("[email protected]");
}
sw.Stop();
Console.WriteLine("Compiled: {0}ms", sw.ElapsedMilliseconds);

// Non-compiled
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
    RegexNonCompiled.Match("[email protected]");
}
sw.Stop();
Console.WriteLine("Non-compiled: {0}ms", sw.ElapsedMilliseconds);

partial class Program
{
    // Email pattern shared by all three implementations.
    private const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";

    // Implemented by the regex source generator at compile time.
    [GeneratedRegex(EmailRegex)]
    private static partial Regex RegexGenerated();

    // Pattern compiled to IL when this field is initialized at run time.
    private static readonly Regex RegexCompiled = new Regex(EmailRegex, RegexOptions.Compiled);

    // Standard interpreted regex.
    private static readonly Regex RegexNonCompiled = new Regex(EmailRegex);
}
```
This is the Program.cs of the benchmark, written as a .NET 8 console application. It runs on a Windows machine with a 12th-gen Intel Core i7 processor.
I am benchmarking a standard email regex with three different Regex implementations. RegexGenerated() uses the new source generator. RegexCompiled is a Regex created with RegexOptions.Compiled, which is generally the more optimized option because it compiles the pattern into IL specialized for that specific pattern. RegexNonCompiled is just a standard new Regex(), which is interpreted.
I made all the Regex instances static, so only one instance of each is created and I am purely benchmarking the performance of Match(), ignoring initialization time.
For each Regex, I ran 10 million Match() operations against the same string.
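Timing one pass per implementation with Stopwatch is simple, but the numbers can be skewed by JIT warm-up and by run order, because the first loop in a program can also absorb start-up effects. If you want to rule those out in your own measurements, the following is one possible variation, not the code behind the results below. It warms up each regex, runs several rounds with the order rotated, and counts successful matches so the work is visibly used. The class name RegexBench, the input address and the per-round iteration count are assumptions for illustration; the pattern is the one from the benchmark above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const string Input = "user@example.com"; // hypothetical input address
const int Iterations = 1_000_000;         // assumption: per round, fewer than the 10M above

var candidates = new (string Name, Regex Instance)[]
{
    ("Source generation", RegexBench.Generated()),
    ("Compiled", RegexBench.Compiled),
    ("Non-compiled", RegexBench.NonCompiled),
};

// Warm-up: exercise every implementation before timing anything.
foreach (var (_, regex) in candidates)
    RegexBench.CountMatches(regex, Input, 10_000);

// Rotate the starting implementation each round so none always runs first.
for (int round = 0; round < candidates.Length; round++)
{
    for (int j = 0; j < candidates.Length; j++)
    {
        var (name, regex) = candidates[(round + j) % candidates.Length];
        var sw = Stopwatch.StartNew();
        int hits = RegexBench.CountMatches(regex, Input, Iterations);
        sw.Stop();
        Console.WriteLine($"Round {round}, {name}: {sw.ElapsedMilliseconds}ms ({hits} matches)");
    }
}

// Hypothetical helper class holding the three implementations.
static partial class RegexBench
{
    // Same email pattern as in the benchmark above.
    private const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";

    [GeneratedRegex(EmailRegex)]
    public static partial Regex Generated();

    public static readonly Regex Compiled = new(EmailRegex, RegexOptions.Compiled);
    public static readonly Regex NonCompiled = new(EmailRegex);

    // Calls Match() like the original benchmark and counts successful matches.
    public static int CountMatches(Regex regex, string input, int iterations)
    {
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (regex.Match(input).Success)
                hits++;
        }
        return hits;
    }
}
```

Comparing the later rounds against the first one gives a rough idea of how much warm-up and ordering affect the totals on your machine.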
Result
| | Source generation | Compiled | Non-compiled |
|---|---|---|---|
| Total time for 10M Match() calls (ms) | 6948 | 2853 | 5882 |
As you can see, the new source-generated regex is actually the slowest of the three implementations. Compiled is the fastest, and, surprisingly, even the standard new Regex() beats source generation.
Of course, this does not take Regex initialization time into account. Source generation happens at compile time, so there is essentially no run-time initialization. I ran a quick benchmark: on average, the compiled regex takes around 0.1ms to initialize and the non-compiled one about 0.01ms. That is negligible as long as you only initialize each regex once in your application.
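The initialization benchmark itself is not shown, so here is one way such a measurement could be sketched, assuming an average over many constructions after a single warm-up is what you want. RegexTiming and the run count are hypothetical. Instances created through the Regex constructor are not cached, so each new Regex(...) parses the pattern again and, with RegexOptions.Compiled, also generates IL again; the loop therefore measures the full construction cost every time.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int Runs = 1_000; // assumption: enough runs to average out timer noise

Console.WriteLine($"Compiled: {RegexTiming.AverageConstructionMs(RegexTiming.EmailRegex, RegexOptions.Compiled, Runs):F4}ms");
Console.WriteLine($"Non-compiled: {RegexTiming.AverageConstructionMs(RegexTiming.EmailRegex, RegexOptions.None, Runs):F4}ms");

// Hypothetical helper class for timing Regex construction.
static class RegexTiming
{
    // Same email pattern as in the benchmark above.
    public const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";

    // Average time to construct one Regex instance, in milliseconds.
    public static double AverageConstructionMs(string pattern, RegexOptions options, int runs)
    {
        // Warm-up so one-time JIT costs of the Regex machinery are not averaged in.
        _ = new Regex(pattern, options);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            // Constructor-created instances bypass the static Regex cache,
            // so every iteration parses (and, if Compiled, compiles) the pattern.
            _ = new Regex(pattern, options);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```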
The other notable difference is memory usage, which I benchmarked as well. I used essentially the same benchmark code, but kept only the regex being measured and noted the peak memory used. The compiled and non-compiled regexes both used around 25MB, while the source-generated regex used 22MB. That makes sense, because initializing a regex at run time also means storing the result in memory.
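The tool used to read peak memory is not named. If you prefer to read it from inside the program, one option is Process.PeakWorkingSet64, the peak working set the operating system has recorded for the process. The sketch below assumes that metric is a fair proxy for "peak memory used" and, as described above, keeps a single implementation per run; MemoryProbe and the input address are hypothetical. To measure the source-generated variant, replace the construction line with a [GeneratedRegex] partial method as in the benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Keep only the implementation being measured in each run of the program.
var regex = new Regex(MemoryProbe.EmailRegex, RegexOptions.Compiled);

int hits = 0;
for (int i = 0; i < 10_000_000; i++)
{
    if (regex.Match("user@example.com").Success) // hypothetical input address
        hits++;
}

Console.WriteLine($"Matches: {hits}");
Console.WriteLine($"Peak working set: {MemoryProbe.PeakWorkingSetMb()}MB");

// Hypothetical helper class for reading process memory.
static class MemoryProbe
{
    // Same email pattern as in the benchmark above.
    public const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";

    // Peak physical memory the OS recorded for this process, in megabytes.
    public static long PeakWorkingSetMb()
    {
        using var process = Process.GetCurrentProcess();
        return process.PeakWorkingSet64 / (1024 * 1024);
    }
}
```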
Conclusion
To conclude, if regex is a performance bottleneck in your application, you should consider RegexOptions.Compiled if you are not using it already. If the warning tempts you to switch to GeneratedRegex, don't, or at least benchmark your exact scenario before switching.
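One way to benchmark your own scenario more rigorously is BenchmarkDotNet, a widely used .NET benchmarking library that handles warm-up, repeated iterations and statistics, and can report allocations via [MemoryDiagnoser]. This sketch assumes the BenchmarkDotNet NuGet package is referenced; the class name, method names and input string are hypothetical, so substitute your real inputs. Run it in Release mode, for example with dotnet run -c Release.

```csharp
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<EmailRegexBenchmarks>();

// Hypothetical benchmark class; BenchmarkDotNet requires it to be public and non-sealed.
[MemoryDiagnoser] // adds allocated bytes per operation to the results table
public partial class EmailRegexBenchmarks
{
    // Same email pattern as in the benchmark above.
    private const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";
    private const string Input = "user@example.com"; // hypothetical input

    private static readonly Regex CompiledEmail = new(EmailRegex, RegexOptions.Compiled);
    private static readonly Regex InterpretedEmail = new(EmailRegex);

    [GeneratedRegex(EmailRegex)]
    private static partial Regex GeneratedEmail();

    // Returning the result lets BenchmarkDotNet consume it, so the call is not optimized away.
    [Benchmark(Baseline = true)]
    public bool Compiled() => CompiledEmail.Match(Input).Success;

    [Benchmark]
    public bool SourceGenerated() => GeneratedEmail().Match(Input).Success;

    [Benchmark]
    public bool NonCompiled() => InterpretedEmail.Match(Input).Success;
}
```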
However, I am not saying you should never use GeneratedRegex. There are legitimate reasons to use source generation, such as being able to debug into the generated regex code, support for ahead-of-time compilation, or simply nicer-looking code. For more details, see the official Microsoft documentation.