How to Extract Data With RegEx in UiPath

Regular Expressions, or RegEx for short, are sequences of characters that define a search pattern. As UiPath RPA developers, we use RegEx to extract data from sources such as emails and PDFs.

Because RPA is largely about extracting data from one system and placing it in another, learning RegEx is necessary. But don’t worry, it’s straightforward.

What is RegEx?

A RegEx (short for Regular Expression) is a short sequence of characters that defines a search pattern. That pattern is used to search for specific data in, e.g., an email.

Fig. 1.1 shows an online RegEx tester. I’ve pasted in the text (string) from an invoice. Just above it, you’ll find the expression editor. The on-screen help to the right is very valuable when you’re learning RegEx.

Fig. 1.1: RegEx on an invoice

My RegEx pattern for this task is:

				
					(?<=Invoice Number ).*
				
			

This pattern extracts everything that follows “Invoice Number ” on the same line and gives me the invoice number: “2021-001” (marked in blue).
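To try the lookbehind outside an online tester, here is a minimal VB.NET sketch. Only the pattern comes from the article; the sample invoice text, the module name, and the helper `ExtractInvoiceNumber` are assumptions made up for illustration.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module InvoiceNumberDemo
    ' Hypothetical helper: returns what follows "Invoice Number " on the same line.
    Function ExtractInvoiceNumber(sourceText As String) As String
        ' (?<=Invoice Number ) requires the label right before the match,
        ' without making the label part of the match.
        ' .* then takes the rest of the line (the dot stops at a line feed).
        Dim m As Match = Regex.Match(sourceText, "(?<=Invoice Number ).*")
        ' Trim drops a trailing carriage return if the text uses Windows line endings.
        Return m.Value.Trim()
    End Function

    Sub Main()
        ' Assumed sample text; a real invoice will look different.
        Dim invoiceText As String = "Invoice Date 01/15/2021" & vbLf &
                                    "Invoice Number 2021-001" & vbLf &
                                    "Total $2,000.00"
        Console.WriteLine(ExtractInvoiceNumber(invoiceText))
    End Sub
End Module
```

With the assumed sample text, this prints 2021-001. The `Trim()` call is a small precaution: in .NET the dot matches a carriage return, so text with Windows line endings could otherwise leave one at the end of the match.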

RegEx in UiPath

RegEx data extraction in UiPath is simple. It can be done either with the built-in Matches activity or directly with VB.NET code.

Using the Matches activity

Read your text into a variable of the type String. For example, here (Fig. 1.2), my data comes from the PDF called “Invoice”.

After you’ve dragged the “Matches” activity in from the Activities panel, make sure it is selected (highlighted in light blue). If it isn’t, click it.

Fig. 1.2: Using the Matches activity in UiPath

In Properties fill in the following:

Input:

Use your text String. Here I use the string coming out of the ‘Read PDF Text’ activity.

Pattern: 

Use a RegEx pattern. To get the invoice number we can use:

 

				
					(?<=Invoice Number ).*
				
			

Result: Click in the Result field (under Misc) and press Ctrl+K. This creates a new variable of the type IEnumerable of Match. Before you do anything else, give it a name. I’ll call mine ienMatches (but feel free to call it whatever you want). You can think of an IEnumerable as a list that holds zero or more items. Here it holds our matches, if there are any.

Drag in a Write Line and enter the output variable from the Matches activity. Because it’s an IEnumerable, which is a list/collection, we put 0 in parentheses after it to get the first match and then add .Value, i.e., ienMatches(0).Value. In a “real” robot we would write the result to an Excel sheet, a database, or another system, but for testing and learning purposes a Write Line is great.

Run your workflow, and the Output panel will show your invoice number (see Fig. 1.3).

Fig. 1.3: Output of our RegEx/Matches workflow
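Outside Studio, the same steps can be sketched in plain VB.NET. This is only an illustration under assumptions: `Regex.Matches` stands in for the Matches activity, and the helper `FirstMatchOrFallback` and its fallback text are hypothetical. The sketch also adds a guard for the case where nothing matched, which the Write Line test above does not cover.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module MatchesDemo
    ' Hypothetical helper: returns the first match, or a fallback when nothing matched.
    Function FirstMatchOrFallback(sourceText As String, pattern As String, fallback As String) As String
        ' Regex.Matches finds every match; Cast exposes them as IEnumerable(Of Match).
        Dim ienMatches As IEnumerable(Of Match) = Regex.Matches(sourceText, pattern).Cast(Of Match)()

        ' Guard first: reading the first element of an empty collection would fail.
        If Not ienMatches.Any() Then
            Return fallback
        End If

        ' First() is the element the workflow reads with index 0.
        Return ienMatches.First().Value
    End Function

    Sub Main()
        Dim pattern As String = "(?<=Invoice Number ).*"
        Console.WriteLine(FirstMatchOrFallback("Invoice Number 2021-001", pattern, "not found"))
        Console.WriteLine(FirstMatchOrFallback("No number on this page", pattern, "not found"))
    End Sub
End Module
```

The first call prints 2021-001 and the second prints not found. In a workflow, a comparable check (for example, testing whether the collection has any items before reading the first one) avoids a runtime error on documents that lack the label.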

Using an Assign

Using the built-in activities ‘Matches’ and ‘Is Match’ is often helpful. However, in many cases there is a more straightforward way to extract data from text with RegEx in UiPath.

Consider the following text: “Amount due $2,000.00, please pay before 04/24/2020”, where the format will always be the same, but the amount will change. We want to extract the amount and store it in a string, i.e., “2,000.00”.

Fig. 2.1: Entire workflow that extracts the amount from the input string

Assign Variables

First, create the two variables in the Variables panel:

  • ‘strInput’, a string where we store our input
  • ‘strOutput’, a string to hold the value of the extracted amount

Use RegEx to assign a value to our output string

Drag in an ‘Assign’ activity (Fig. 2.2) and set the left side to ‘strOutput’ and the right side to:

				
					System.Text.RegularExpressions.Regex.Match(strInput, "(?<=\$)(\d+,?)+(\.\d+)").Value
				
			
Fig. 2.2: Using an Assign for our RegEx

Let’s dig into the expression on the right side:

  • System.Text.RegularExpressions is the Microsoft .NET namespace
  • Regex.Match is the method (covered here https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8)
  • strInput is the string to search in
  • “(?<=\$)(\d+,?)+(\.\d+)” is the RegEx pattern we use to extract the amount
    • (?<=\$) is a positive lookbehind; it requires a ‘$’ immediately before the match without including it in the result
    • (\d+,?) matches one or more digits (\d+) followed by an optional thousands separator (,?)
    • + repeats the preceding group one or more times (otherwise it would stop after ‘2,’)
    • (\.\d+) matches a decimal separator (\.) and one or more digits (\d+)
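The role of the repeating + is easiest to see by running the pattern with and without it. The following VB.NET sketch does that on the article's input string; the helper `ExtractWith` and the variant pattern without the + are assumptions added for comparison only.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AmountPatternDemo
    ' Hypothetical helper: applies a pattern and returns the matched text ("" if no match).
    Function ExtractWith(sourceText As String, pattern As String) As String
        Return Regex.Match(sourceText, pattern).Value
    End Function

    Sub Main()
        Dim strInput As String = "Amount due $2,000.00, please pay before 04/24/2020"

        ' Full pattern from the article: the digit group may repeat.
        Dim withRepeat As String = "(?<=\$)(\d+,?)+(\.\d+)"
        ' Same pattern without the + after the digit group, for comparison only.
        Dim withoutRepeat As String = "(?<=\$)(\d+,?)(\.\d+)"

        Console.WriteLine("With +:    [" & ExtractWith(strInput, withRepeat) & "]")
        Console.WriteLine("Without +: [" & ExtractWith(strInput, withoutRepeat) & "]")
    End Sub
End Module
```

The first line prints [2,000.00]. The second prints [], because a single digit group can only cover “2,” and the decimal part cannot follow directly after it, so the whole match fails.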

Write out the output

Now just drag in a ‘Write Line’ and set the value to our Output Variable, ‘strOutput’ (Fig. 2.3).

Fig. 2.3: Write out the output

And then we’re done (Fig. 2.4). The same workflow works for any other amount, as long as the text keeps the same format.

Fig. 2.4: Our extracted value
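To check how the pattern behaves with other amounts, a small VB.NET sketch can run it over a few inputs. The sample strings and the helper `ExtractAmount` are assumptions for illustration. The last sample has no decimals on purpose, to show what happens when the text departs from the expected format.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AmountCheckDemo
    ' Hypothetical helper: returns the amount, or Nothing when no amount matches.
    Function ExtractAmount(strInput As String) As String
        Dim m As Match = Regex.Match(strInput, "(?<=\$)(\d+,?)+(\.\d+)")
        ' Match.Success tells us whether the pattern was found at all.
        If m.Success Then
            Return m.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Assumed sample inputs in the article's format, plus one without decimals.
        Dim samples As String() = {
            "Amount due $15.50, please pay before 04/24/2020",
            "Amount due $1,234,567.89, please pay before 04/24/2020",
            "Amount due $100, please pay before 04/24/2020"
        }
        For Each sample As String In samples
            Dim amount As String = ExtractAmount(sample)
            ' The binary If operator substitutes a placeholder when amount is Nothing.
            Console.WriteLine(If(amount, "(no match)"))
        Next
    End Sub
End Module
```

This prints 15.50, then 1,234,567.89, then (no match). The pattern requires a decimal part, so an amount written as “$100” is not picked up; whether that matters depends on the documents the robot has to read.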

Anders Jensen

RPA DEVELOPER, YOUTUBER & UIPATH MOST VALUED PROFESSIONAL 2021 Anders Jensen is the RPA Lead at Lessor A/S (Part of Paychex Inc) and an advanced certified UiPath RPA Instructor. Using his extensive experience in automating interfaces such as Windows, SAP, and browsers, Anders develops enterprise RPA solutions automating work for customers and colleagues one task at a time. In the evenings and weekends, Anders is passionate about teaching others RPA by making instructional videos on his YouTube channel.
