C#
Sam Lau  

Benchmark .NET regular expression source generators

If you work with regular expressions and have upgraded to .NET 7 or newer, you may have seen the warning "SYSLIB1045: Use GeneratedRegexAttribute to generate the regular expression implementation at compile time." It alerts you that you can use the new regular expression source generation.
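As a minimal sketch of the change the warning suggests, the snippet below shows the same pattern built at run time and through the source generator. The class name DigitPatterns and the pattern are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Both regexes match the same input; only the way they are built differs.
Console.WriteLine(DigitPatterns.Runtime.IsMatch("order 42"));
Console.WriteLine(DigitPatterns.Generated().IsMatch("order 42"));

// Hypothetical container class. It must be partial so the source
// generator can supply the body of Generated().
static partial class DigitPatterns
{
    private const string Pattern = @"\d+";

    // Before: the regex is constructed at run time from a constant pattern.
    public static readonly Regex Runtime = new Regex(Pattern);

    // After: the implementation is emitted at compile time (.NET 7 or newer).
    [GeneratedRegex(Pattern)]
    public static partial Regex Generated();
}
```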

But is it really better? I ran a benchmark to check, and the short answer is: not really. Here are my method and results.

Benchmark

using System.Diagnostics;  
using System.Text.RegularExpressions;  

var sw = new Stopwatch();  

// Source generation  
sw.Restart();  
for (int i = 0; i < 10000000; i++)  
{  
    RegexGenerated().Match("[email protected]");  
}  
sw.Stop();  
Console.WriteLine("Source generation: {0}ms", sw.ElapsedMilliseconds);  

// Compiled  
sw.Restart();  
for (int i = 0; i < 10000000; i++)  
{  
    RegexCompiled.Match("[email protected]");  
}  
sw.Stop();  
Console.WriteLine("Compiled: {0}ms", sw.ElapsedMilliseconds);  

// Non-Compiled  
sw.Restart();  
for (int i = 0; i < 10000000; i++)  
{  
    RegexNonCompiled.Match("[email protected]");  
}  
sw.Stop();  
Console.WriteLine("Non-compiled: {0}ms", sw.ElapsedMilliseconds);  

partial class Program  
{  
    private const string EmailRegex = """^(?(")(".+?(?<!\\)"@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$""";  

    [GeneratedRegex(EmailRegex)]  
    private static partial Regex RegexGenerated();  
    private static readonly Regex RegexCompiled = new Regex(EmailRegex, RegexOptions.Compiled);  
    private static readonly Regex RegexNonCompiled = new Regex(EmailRegex);  
}

This is the Program.cs of the benchmark, written as a .NET 8 console application. I ran it on a Windows machine with a 12th-gen i7 processor.

I am benchmarking a standard email regex with three different Regex implementations. RegexGenerated() is the new source generation. RegexCompiled is a Regex with RegexOptions.Compiled, which is generally the more optimized option because it compiles the regex tree into IL for the specific pattern. RegexNonCompiled is just your standard new Regex().

I have made all the Regex instances static, meaning only one instance of each is created, so I am purely benchmarking the performance of Match() and ignoring the initialization time.

I ran 10M Match() operations against the same string for each Regex.

Result

| | Source generation | Compiled | Non-compiled |
|---|---|---|---|
| Total time (ms) | 6948 | 2853 | 5882 |

As you can see, the new source-generated regex is actually the slowest of the three implementations. Compiled is the fastest, and it is surprising that the standard new Regex() beats source generation.

Of course, this does not take the Regex initialization time into account. Source generation happens at compile time, so there is no run-time initialization. I did a quick benchmark: the compiled regex takes around 0.1ms on average to initialize and the non-compiled one takes 0.01ms. That is pretty negligible as long as you only initialize it once in your application.
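The code for that quick initialization benchmark is not shown here, so the following is only a sketch of one way such a measurement could be made. The InitTiming helper, the simplified pattern, and the run count are all assumptions for illustration, not the setup behind the numbers above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical simplified pattern; substitute the pattern you actually use.
const string Pattern = @"^[a-z0-9]+@[a-z0-9]+\.[a-z]+$";
const int Runs = 100;

Console.WriteLine("Compiled: {0:F4}ms per construction",
    InitTiming.AverageMs(Pattern, RegexOptions.Compiled, Runs));
Console.WriteLine("Non-compiled: {0:F4}ms per construction",
    InitTiming.AverageMs(Pattern, RegexOptions.None, Runs));

// Hypothetical helper that averages the cost of constructing a Regex.
static class InitTiming
{
    public static double AverageMs(string pattern, RegexOptions options, int runs)
    {
        // Construct once untimed so one-off JIT and type-loading costs are excluded.
        _ = new Regex(pattern, options);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            // Each constructor call parses (and, with Compiled, compiles) the pattern again.
            _ = new Regex(pattern, options);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```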

The other notable difference is memory usage. I benchmarked that as well, using basically the same benchmark code but keeping only the regex under test and noting the peak memory used. Both the compiled and non-compiled regexes used around 25MB, while the source-generated regex used 22MB. This makes sense, because initializing a regex at run time also means storing it in memory.
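How the peak memory was read is not stated above. One possible way, sketched here as an assumption, is to print the process's peak working set at the end of the run. The MemoryProbe helper, the pattern, and the input are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Keep only the regex under test in the program, then run the workload.
var regex = new Regex(@"\d+", RegexOptions.Compiled);
for (int i = 0; i < 1_000_000; i++)
{
    regex.Match("order 42");
}

Console.WriteLine("Peak working set: {0:F1}MB", MemoryProbe.PeakWorkingSetMb());

// Hypothetical helper that reads the peak working set of the current process.
static class MemoryProbe
{
    public static double PeakWorkingSetMb()
    {
        using var process = Process.GetCurrentProcess();
        process.Refresh(); // make sure the counters are current
        return process.PeakWorkingSet64 / (1024.0 * 1024.0);
    }
}
```

The working set covers the whole process, including the runtime itself, so only the difference between runs is meaningful.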

Conclusion

To conclude, if regex is a performance bottleneck in your application, you should consider RegexOptions.Compiled if you have not already. If you are tempted by the warning to switch to GeneratedRegex, don’t, or at least benchmark your exact situation before switching.

However, I am not saying you should never use GeneratedRegex. There are legitimate reasons to use source generation, like the ability to debug inside the regex, support for ahead-of-time compilation, or simply looking nicer. For more details, you can read the official Microsoft documentation.
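If you want to benchmark your own situation as suggested in the conclusion, a small reusable timing helper with a warm-up pass may be a reasonable starting point. This is only a sketch: QuickBench, the pattern, and the input are hypothetical, and a dedicated harness such as BenchmarkDotNet would give more rigorous numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical pattern and input; replace them with your own.
const string Input = "order 42";
const int Iterations = 1_000_000;

var compiled = new Regex(@"\d+", RegexOptions.Compiled);
var nonCompiled = new Regex(@"\d+");

Console.WriteLine("Compiled: {0}ms",
    QuickBench.TimeMs(s => compiled.IsMatch(s), Input, Iterations));
Console.WriteLine("Non-compiled: {0}ms",
    QuickBench.TimeMs(s => nonCompiled.IsMatch(s), Input, Iterations));

// Hypothetical helper: times repeated calls of a matcher after a warm-up pass.
static class QuickBench
{
    public static long TimeMs(Func<string, bool> isMatch, string input, int iterations)
    {
        // Warm up so JIT compilation is not counted in the timed loop.
        for (int i = 0; i < 1_000; i++)
        {
            isMatch(input);
        }

        int matches = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (isMatch(input)) matches++;
        }
        sw.Stop();

        GC.KeepAlive(matches); // keep the result observable
        return sw.ElapsedMilliseconds;
    }
}
```

A method marked with [GeneratedRegex] can be passed in the same way. The delegate call adds a small constant overhead to every variant, so compare the results with each other and not as absolute timings.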