Tuesday, January 13, 2009

Improve the performance of regular expressions by using compiled regular expressions


In one of our projects, we had to convert a given output file into a specific format. This involved a large number of string operations and many regular expressions executed over many different strings. The project was functionally complete when we started facing serious performance problems: we were using complex regular expressions to find matches and convert them to the required format, and they were consuming a lot of time.


Then we came across an article that mentioned compiled regular expressions. With compiled regular expressions, the running time dropped to about half of the original.
Let's see what compiled regular expressions are and how to use them.


If an expression is not compiled, the regular expression engine converts it into a series of internal codes that the engine recognizes; it is not converted to MSIL. As the expression runs against a string, the engine interprets this series of internal codes. This can be slow, especially as the source string grows larger and the expression becomes more complex.
Compiling a regular expression converts it to MSIL, which the JIT compiler then turns into native code, so the expression runs faster at the cost of some extra time spent compiling it up front.
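To see the difference on your own data, a rough timing comparison like the following sketch can help. It is written as a C# top-level program; the date pattern, the generated input and the `Time` helper are hypothetical and only for illustration. It warms up both `Regex` instances first, because a compiled expression pays its compilation cost up front, and it checks that both produce the same number of matches. Actual timings depend on the runtime, the pattern and the input, so no particular speed-up is implied.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical pattern and input, for illustration only: ISO-style dates (yyyy-mm-dd)
// embedded in a large generated string with 100,000 records.
const string pattern = @"\b\d{4}-\d{2}-\d{2}\b";
var sb = new StringBuilder();
for (int i = 0; i < 100_000; i++)
    sb.Append("record ").Append(i).Append(" updated on 2009-01-13; ");
string input = sb.ToString();

// Interpreted: the pattern becomes internal regex codes that the engine interprets.
var interpreted = new Regex(pattern);
// Compiled: the pattern becomes MSIL, which the JIT then turns into native code.
var compiled = new Regex(pattern, RegexOptions.Compiled);

// Runs the regex over the whole input and reports the match count and elapsed time.
static (int Count, long Ms) Time(Regex regex, string text)
{
    var sw = Stopwatch.StartNew();
    int count = regex.Matches(text).Count; // reading Count forces every match to be found
    sw.Stop();
    return (count, sw.ElapsedMilliseconds);
}

// Warm-up pass so the timed runs below exclude one-time JIT and setup costs.
Time(interpreted, input);
Time(compiled, input);

var (interpretedCount, interpretedMs) = Time(interpreted, input);
var (compiledCount, compiledMs) = Time(compiled, input);
Console.WriteLine($"Interpreted: {interpretedCount} matches in {interpretedMs} ms");
Console.WriteLine($"Compiled:    {compiledCount} matches in {compiledMs} ms");
```

Both lines should report the same match count; only the elapsed times are expected to differ.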


There are two ways to compile regular expressions. The easiest is to pass the RegexOptions.Compiled enumeration value in the options parameter of the Regex constructor or of the static Match or Matches methods of the Regex class. The other is to precompile all of the expressions into their own assembly.


Let's start with the first option:

We can pass the RegexOptions.Compiled enumeration value in the options parameter of the static Match or Matches methods of the Regex class, as shown below:


Match objMatch = Regex.Match(inputString, pattern, RegexOptions.Compiled);
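Static calls such as the `Regex.Match` line above rely on an internal cache of recently used patterns (`Regex.CacheSize`, 15 entries by default) to avoid compiling the expression again on every call; if more patterns than that are in use, they can be evicted and compiled again. When one pattern is applied to many strings, a common alternative is to construct a `Regex` instance with `RegexOptions.Compiled` once and reuse it. The sketch below assumes a hypothetical conversion of `yyyy-mm-dd` dates to `dd/mm/yyyy`; the pattern, group names and input lines are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Built once and reused for every line; constructing it inside the loop
// would pay the compilation cost on every iteration.
var isoDate = new Regex(@"\b(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\b",
                        RegexOptions.Compiled);

// Hypothetical input lines; the target format (dd/mm/yyyy) is only an example.
var lines = new List<string>
{
    "Posted 2009-01-13 by admin",
    "No date on this line",
    "Updated 2009-02-01, revised 2009-03-15",
};

var converted = new List<string>();
foreach (string line in lines)
{
    // ${name} substitutions refer to the named groups in the pattern;
    // Replace rewrites every match in the line.
    converted.Add(isoDate.Replace(line, "${day}/${month}/${year}"));
}

foreach (string line in converted)
    Console.WriteLine(line);
```

Lines without a date pass through unchanged, and a line with several dates has each one converted.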



There is one drawback to this option: an in-memory assembly is generated to contain the IL, and it can never be unloaded. An assembly cannot be unloaded from an AppDomain, so the garbage collector cannot remove it from memory. If a large number of expressions are compiled, a correspondingly large amount of heap memory will be used up and never released. So use this technique wisely.
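One way to use the technique wisely is to make sure each distinct pattern is compiled only once and then shared, instead of constructing a new compiled `Regex` over and over. The sketch below memoizes compiled instances in a `ConcurrentDictionary`; the `GetCompiled` helper is hypothetical. This only bounds memory when the set of patterns is itself fixed: patterns built at run time (for example from user input) are usually better left interpreted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical cache: one compiled Regex per distinct pattern string.
var cache = new ConcurrentDictionary<string, Regex>();

// Returns the cached compiled Regex for the pattern, creating it on first use.
// Under a race the factory may run more than once, but only one instance is stored.
Regex GetCompiled(string pattern) =>
    cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));

string[] inputs = { "abc123", "def456", "ghi789" };
foreach (string s in inputs)
{
    // Same pattern string on every call, so only one Regex is ever compiled.
    Console.WriteLine(GetCompiled(@"\d+").Match(s).Value);
}

Console.WriteLine($"Distinct compiled patterns: {cache.Count}");
```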


The second option is:

Precompiling all of these expressions into their own assembly. Compiling regular expressions into their own assembly gives you two immediate benefits. First, precompiled expressions do not need any extra time to be compiled while your application is running. Second, because they live in their own assembly, they can be used by other applications.

To compile one or more expressions into an assembly, use the static CompileToAssembly method of the Regex class. First, create a RegexCompilationInfo array and fill it with RegexCompilationInfo objects, one per expression. Next, describe the assembly in which the expressions will live: create an instance of the AssemblyName class using its default constructor and give it a name (do not include the .dll file extension in the name; it is added automatically). Finally, call CompileToAssembly with the RegexCompilationInfo array and the AssemblyName object as arguments.
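The steps above can be sketched as follows. The patterns, the class names `IsoDateRegex` and `BlankLineRegex`, and the namespace and assembly name `MyCompany.Regexes` are hypothetical. Note that `Regex.CompileToAssembly` works only on the .NET Framework; on .NET Core and .NET 5 and later it throws `PlatformNotSupportedException` (from .NET 7 the `[GeneratedRegex]` source generator fills a similar role).

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Text.RegularExpressions;

// .NET Framework only: CompileToAssembly is not supported on .NET Core / .NET 5+.

// One RegexCompilationInfo per expression: pattern, options, class name,
// namespace of the generated class, and whether that class is public.
// All names and patterns here are hypothetical.
var expressions = new RegexCompilationInfo[]
{
    new RegexCompilationInfo(@"\b(\d{4})-(\d{2})-(\d{2})\b", RegexOptions.None,
                             "IsoDateRegex", "MyCompany.Regexes", true),
    new RegexCompilationInfo(@"^\s*$", RegexOptions.None,
                             "BlankLineRegex", "MyCompany.Regexes", true),
};

// Default constructor, then the name without the .dll extension.
var assemblyName = new AssemblyName();
assemblyName.Name = "MyCompany.Regexes";

// Generates MyCompany.Regexes.dll, expected in the current working directory.
Regex.CompileToAssembly(expressions, assemblyName);

Console.WriteLine("Compiled {0} expressions into {1}.dll",
                  expressions.Length, assemblyName.Name);
```

Another project that references the generated `MyCompany.Regexes.dll` could then use the expressions as ordinary classes, for example `new MyCompany.Regexes.IsoDateRegex().IsMatch("2009-01-13")`, since each generated class derives from `Regex`.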
