Using the GeneratedRegexAttribute for your Regular Expressions

September 09, 2024 #Software Development

Scott DePouw, Senior Consultant

For a long time, I’ve been used to creating regular expressions in C# like this:

using System.Text.RegularExpressions;

public class SomeClass
{
  private static readonly Regex Foo = new("foo|bar|fizz|buzz", RegexOptions.Compiled | RegexOptions.IgnoreCase);
  
  public bool DoesItMatch(string input) => Foo.IsMatch(input);
}

I declare it static readonly because I know it’ll never change or require any dynamic values in the expression. I add RegexOptions.Compiled because I want matching to be as fast as possible. It works well enough, but it has drawbacks. As of .NET 7 there’s a better way to create your regular expressions. Let’s dig into the GeneratedRegexAttribute and discover how to use it.

Unlike RegexOptions.Compiled, which dynamically compiles an expression to Intermediate Language (IL) at runtime, the new GeneratedRegexAttribute drives a source generator that emits C# code for the expression at build time, where it is compiled along with the rest of your application. This means that all the cost of compilation occurs while you’re building or publishing the application, instead of while running. Stephen Toub takes a deep dive into the under-the-hood material, but this post will focus on how to use the new attribute.

Declaring Generated Regular Expressions

Let’s take my regular expression and generate it at build time using GeneratedRegexAttribute!

public partial class SomeClass
{
  [GeneratedRegex("foo|bar|fizz|buzz", RegexOptions.IgnoreCase)]
  private static partial Regex Foo();
  
  public bool DoesItMatch(string input) => Foo().IsMatch(input);
}

This attribute generates a code implementation of our regular expression, and provides a singleton instance that gets returned any time we call SomeClass.Foo(). Nice! Now let’s examine the differences between the old and new ways:

  • We now declare a method instead of a field or property, Foo() instead of Foo
  • The method and the class that contains it must be declared partial, because the source generator supplies the method’s implementation in another part of the class
    • We drop readonly, as it applies to fields rather than methods (and the generated instance is already cached for us)
  • The pattern and options are moved into the attribute (we dropped RegexOptions.Compiled as it’s no longer needed)
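
To make the comparison concrete, here’s a minimal sketch that puts both styles side by side and checks that callers see the same behavior. The names RegexComparison, FooCompiled, FooGenerated, BothAgree, and ReturnsSameInstance are made up for this illustration, and it assumes a project targeting .NET 7 or later so the source generator runs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static partial class RegexComparison
{
    // Old style: the pattern is compiled to IL at runtime, when this field is initialized.
    private static readonly Regex FooCompiled =
        new("foo|bar|fizz|buzz", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // New style: the source generator emits the implementation at build time.
    [GeneratedRegex("foo|bar|fizz|buzz", RegexOptions.IgnoreCase)]
    private static partial Regex FooGenerated();

    // Returns true when both regexes give the same answer for the input.
    public static bool BothAgree(string input) =>
        FooCompiled.IsMatch(input) == FooGenerated().IsMatch(input);

    // Returns true when repeated calls hand back the same cached instance.
    public static bool ReturnsSameInstance() =>
        ReferenceEquals(FooGenerated(), FooGenerated());
}
```

The only thing that changes for the calling code is the pair of parentheses: FooGenerated() instead of FooCompiled.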

Now, about that partial keyword…

The Perils of Partial

Personally, I find the syntax a little clunky and not terribly intuitive. It’s not as simple as just replacing your expression property with a method and putting the attribute on it. You must know how to make it partial. You also must change it to a method, but the compiler helps there, as in .NET 7 and .NET 8 GeneratedRegexAttribute may only be placed on methods. Once you know it, you’re set. That’s the reason this blog post exists (in addition to cementing my own understanding of it).

To stymie partial bleeding into my code, I try to define my regular expressions outside the class they’re being used in.

public static partial class MyRegex
{
  [GeneratedRegex("foo|bar|fizz|buzz", RegexOptions.IgnoreCase)]
  public static partial Regex Foo();
}

// No longer partial!
public class SomeClass
{
  public bool DoesItMatch(string input) => MyRegex.Foo().IsMatch(input);
}

In addition to compartmentalizing partial, I can reach each expression to unit test it individually. Regular expressions themselves are among the easiest things to unit test, as they behave like pure functions (you give them input and check the output; no side effects).
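
As a rough sketch of what testing the expression on its own could look like, here’s a framework-free, table-driven check. MyRegexChecks, Cases, and FailingInputs are hypothetical names, and the inputs are just examples I picked; in a real project these cases would more likely live in whatever test framework you already use.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static partial class MyRegex
{
    [GeneratedRegex("foo|bar|fizz|buzz", RegexOptions.IgnoreCase)]
    public static partial Regex Foo();
}

// Hypothetical checks, for illustration only.
public static class MyRegexChecks
{
    // Input/expected pairs. The pattern is unanchored, so a keyword
    // anywhere in the input counts as a match.
    private static readonly (string Input, bool Expected)[] Cases =
    {
        ("foo", true),
        ("BUZZ", true),          // IgnoreCase makes this match
        ("say fizz now", true),  // keyword in the middle of the input
        ("baz", false),
        ("", false),
    };

    // Returns the inputs whose result differs from the expectation.
    public static string[] FailingInputs() =>
        Cases.Where(c => MyRegex.Foo().IsMatch(c.Input) != c.Expected)
             .Select(c => c.Input)
             .ToArray();
}
```

If the expression behaves as intended, FailingInputs() comes back empty; any input it returns points directly at a case the pattern handles differently than expected.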

Injecting Regular Expressions as a Dependency

If you really wanted to, you could make the regular expressions injectable by making MyRegex non-static and putting it behind an interface, so that unit tests can hand fakes/mocks to the classes using your expressions:

public interface IMyRegex
{
  Regex GetFooRegex();
}

public partial class MyRegex : IMyRegex
{
  [GeneratedRegex("foo|bar|fizz|buzz", RegexOptions.IgnoreCase)]
  private static partial Regex Foo();

  public Regex GetFooRegex() => Foo();
}

public class SomeClass(IMyRegex myRegex)
{
  public bool DoesItMatch(string input) => myRegex.GetFooRegex().IsMatch(input);
}

I haven’t gone this route before, mostly because the expressions I use tend to be simple, and it ends up making unit testing more complex than if I just used the regular expressions themselves. I want them to be evaluated when unit testing the code that consumes them. But the option is there!
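
For the curious, here’s a minimal sketch of what a hand-rolled fake might look like. FakeMyRegex and FakeDemo are hypothetical names, the "^stub$" pattern is an arbitrary stand-in, and a mocking library could do the same job; the primary constructors assume C# 12, as in the snippet above.

```csharp
using System.Text.RegularExpressions;

public interface IMyRegex
{
    Regex GetFooRegex();
}

// Hypothetical test double: hands back whatever Regex the test supplies.
public sealed class FakeMyRegex(Regex regex) : IMyRegex
{
    public Regex GetFooRegex() => regex;
}

public class SomeClass(IMyRegex myRegex)
{
    public bool DoesItMatch(string input) => myRegex.GetFooRegex().IsMatch(input);
}

// Hypothetical usage, for illustration only.
public static class FakeDemo
{
    // Runs SomeClass against a stand-in pattern instead of the real one.
    public static bool MatchesWithFake(string input)
    {
        var sut = new SomeClass(new FakeMyRegex(new Regex("^stub$")));
        return sut.DoesItMatch(input);
    }
}
```

With the fake in place, SomeClass only matches the exact string "stub", which shows the trade-off: the test now exercises the wiring, not the real expression.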

Debugging and Automatic Code Documentation

Because source code is now automatically generated, you can debug and step through your regular expressions! The source generator also provides a very human-readable breakdown of what the regular expression pattern does. My Foo() regular expression generates this documentation (screenshot from JetBrains Rider):

Screenshot: Generated Docs

If we peek behind the curtain, we can see the source code generated, and set breakpoints:

Screenshot: Generated Code

The source code generated is hyper-optimized and, while generally readable, is not going to be the epitome of developer-friendliness. Still, it’s another tool in the toolbelt if you need to debug into a particularly complex expression. (Then after you diagnose the issue, write unit tests to cover it!)

Summary

This new, more efficient method of creating regular expressions does take a moment to become familiar with (syntax-wise), but the benefits outweigh the initial cost and potential code-shuffling. Give it a go!

Copyright © 2024 NimblePros - All Rights Reserved