Regular Expressions, or RegEx for short, are sequences of characters that define a search pattern. As UiPath RPA developers, we use RegEx to extract data from sources such as emails and PDFs.
Because RPA is largely about extracting data from one system and placing it in another, learning RegEx is necessary. But don’t worry, it’s straightforward.
What is RegEx
RegEx (short for Regular Expressions) is a short sequence of code (characters) that defines a search pattern. That search pattern is used to search for specific data in, e.g., an email.
Fig. 1.1 shows an online RegEx tester. I’ve pasted in the text (string) from an invoice. Just above it, you’ll find the expression editor. The on-screen help is to the right and is very valuable when you’re learning RegEx.
My RegEx pattern for this task is:
(?<=Invoice Number ).*
This pattern extracts everything that follows “Invoice Number ” up to the end of the line, which gives me the invoice number: “2021-001” (marked in blue).
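If you don’t have an online tester at hand, the same pattern can be tried in a few lines of Python. This is only a sketch outside UiPath: Python’s re module stands in for the .NET engine (both handle this lookbehind the same way), and the sample invoice text is made up, apart from the “Invoice Number 2021-001” line.

```python
import re

# Hypothetical invoice text; only the "Invoice Number 2021-001" line
# comes from the example above.
invoice_text = "ACME Corp\nInvoice Number 2021-001\nTotal 150.00"

# (?<=Invoice Number ) is a lookbehind: the text must appear right before
# the match, but it is not part of the match itself.
# .* then takes every character up to the end of that line.
match = re.search(r"(?<=Invoice Number ).*", invoice_text)
invoice_number = match.group() if match else None

print(invoice_number)
```

Note that the dot does not match a line feed, which is why the match stops at the end of the line and does not continue into the “Total” line.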
RegEx in UiPath
Regex Data Extraction in UiPath is effortless. The extraction can be done either in the built-in activity (Matches) or directly with VB.NET code.
Using the Matches activity
Read your text into a variable of the type String. For example, here (Fig. 1.2), my data comes from the PDF called “Invoice”.
After you’ve dragged the “Matches” activity in from the Activities panel, make sure it is selected (highlighted in light blue). If it isn’t, click it.
In Properties fill in the following:
Use your text String. Here I use the string coming out of the ‘Read PDF Text’ activity.
Use a RegEx pattern. To get the invoice number we can use:
(?<=Invoice Number ).*
Result: Click in the Result field (under Misc) and press Ctrl+K. This creates a new variable of the type IEnumerable<Match>. Before you do anything else, give it a name. I’ll call mine ienMatches (but feel free to call it whatever you want). You can think of an IEnumerable as a list holding zero or more items. Here it holds our matches, if there are any.
Drag in a Write Line and enter your output variable from the Matches activity. Because it’s an IEnumerable, which is a list/collection, we specify 0 in parentheses afterward to get the first match and then add .Value to get its text: ienMatches(0).Value. In a “real” robot we’ll write our result to an Excel sheet, a database, or a system, but for testing and learning purposes a Write Line is great.
Run your workflow, and the Output panel will show your invoice number (see Fig. 1.3).
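Taking item 0 of the result only works when there is at least one match. With an empty result the expression raises an error, so a real robot should check first. The idea is sketched below in Python as a stand-in for the UiPath workflow; first_match is a hypothetical helper name, not a UiPath or .NET function.

```python
import re


def first_match(text, pattern):
    """Return the text of the first match, or None when nothing matched."""
    # Collect all matches, like the collection the Matches activity returns.
    matches = list(re.finditer(pattern, text))
    if not matches:
        # Nothing found: indexing item 0 here would raise an error.
        return None
    # Same idea as taking item 0 of the collection and reading its value.
    return matches[0].group()


print(first_match("Invoice Number 2021-001", r"(?<=Invoice Number ).*"))
print(first_match("No number on this page", r"(?<=Invoice Number ).*"))
```

In UiPath, the equivalent guard would be an If activity that checks whether the collection contains any items before the Write Line.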
Using an Assign
Using the built-in activities ‘Matches’ and ‘Is Match’ is often helpful. However, in many cases there is a more straightforward way to extract data from text with RegEx in UiPath.
Consider the following text: “Amount due $2,000.00, please pay before 04/24/2020”. The format will always be the same, but the amount will change. We want to extract the amount, i.e., “2,000.00”, and store it in a string.
First, create two variables in the variables manager:
- ‘strInput’, a string where we store our input
- ‘strOutput’, a string to hold the value of the extracted amount
Use RegEx to assign a value to our output string
Drag in an ‘Assign’ activity (Fig. 2.2) and set the left side to ‘strOutput’ and the right side to:
System.Text.RegularExpressions.Regex.Match(strInput, "(?<=\$)(\d+,?)+(\.\d+)").Value
Let’s dig into the expression on the right side:
- System.Text.RegularExpressions is the Microsoft .NET namespace
- Regex.Match is the method (covered here https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8)
- strInput is the string to search in
- “(?<=\$)(\d+,?)+(\.\d+)” is the RegEx pattern we use to extract the amount
- (?<=\$) is a positive lookbehind, which says the match must come directly after a ‘$’ (the ‘$’ itself is not included)
- (\d+,?) then we look for one or more digits (\d+) followed by an optional thousands delimiter (,?)
- + one or more of the preceding group (otherwise it would stop after ‘2,’)
- (\.\d+) a decimal delimiter (\.) and one or more digits (\d+)
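The breakdown above can be checked piece by piece. Below is a small sketch in Python, used as a stand-in for .NET’s Regex.Match (for this pattern the two engines behave the same). It also shows why the dot in the last group is escaped: an unescaped dot matches any character, not just a decimal point. The malformed text “$2,000-00” is a made-up input for illustration.

```python
import re

text = "Amount due $2,000.00, please pay before 04/24/2020"
pattern = r"(?<=\$)(\d+,?)+(\.\d+)"

m = re.search(pattern, text)
amount = m.group()     # the whole match, without the '$'
decimals = m.group(2)  # what the (\.\d+) group captured

print(amount)
print(decimals)

# The same pattern with an unescaped dot accepts any character
# in place of the decimal point.
loose_pattern = r"(?<=\$)(\d+,?)+(.\d+)"
odd_text = "Amount due $2,000-00"  # hypothetical malformed amount

loose_hit = re.search(loose_pattern, odd_text)
strict_hit = re.search(pattern, odd_text)

print(loose_hit.group())  # the loose pattern still matches
print(strict_hit)         # the strict pattern finds nothing
```

With the escaped dot, badly formatted amounts are rejected instead of being silently passed on to the next system.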
Write the output out
Now just drag in a ‘Write Line’ and set the value to our Output Variable, ‘strOutput’ (Fig. 2.3).
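And then we’re done (Fig. 2.4). The workflow will also work with other amounts, as long as they follow the same format: a ‘$’ followed by digits, optional thousands delimiters, and a decimal part.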
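Before trusting the pattern in production, it is worth running it against a few different amounts. Here is a minimal sketch in Python, again as a stand-in for the Assign expression. The sample sentences and the helper name extract_amount are assumptions made for this test.

```python
import re

AMOUNT_PATTERN = r"(?<=\$)(\d+,?)+(\.\d+)"


def extract_amount(text):
    """Return the amount after '$' as a string, or None if there is no match."""
    m = re.search(AMOUNT_PATTERN, text)
    return m.group() if m else None


# Hypothetical inputs in the same sentence format as the example.
samples = [
    "Amount due $15.75, please pay before 05/01/2020",
    "Amount due $1,234,567.89, please pay before 05/01/2020",
    "Amount due $500, please pay before 05/01/2020",  # no decimal part
]

for sample in samples:
    print(extract_amount(sample))
```

The last sample returns nothing because the pattern requires a decimal part. If whole-dollar amounts can occur in your data, the decimal group would need to be made optional.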