How to Extract Data With RegEx in UiPath

Regular Expressions, or RegEx, are sequences of characters that define search patterns. As UiPath RPA developers, we use RegEx to extract data from sources such as emails and PDFs.

Learning RegEx is necessary, as RPA is largely about extracting data from one system and placing it in another. But don’t worry, it’s straightforward.

What is RegEx

RegEx (short for Regular Expressions) is a short sequence of code (characters) that defines a search pattern. That search pattern is used to search for specific data in, e.g., an email.

Fig. 1.1 shows an online RegEx tester. I’ve pasted the text (string) from an invoice. Just above, you’ll find the expression editor. The on-screen help is to the right and is very valuable when you’re learning RegEx.

Fig. 1.1: RegEx on an invoice

My RegEx pattern for this task is:

				
    (?<=Invoice Number ).*

This pattern extracts everything that follows “Invoice Number ” on the same line, which gives me the invoice number: “2021-001” (highlighted in blue).
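
If you’d like to try the pattern outside an online tester, here is a minimal sketch in Python. Python is my assumption here (UiPath itself uses .NET), and the sample invoice text and variable names are made up for illustration; only the pattern and the number “2021-001” come from the example above.

```python
import re

# Hypothetical invoice text; only the "Invoice Number 2021-001" line
# mirrors the example in the article.
invoice_text = "ACME Inc.\nInvoice Number 2021-001\nInvoice Date 01/15/2021\n"

# (?<=Invoice Number ) is a lookbehind: the match must be preceded by this
# text, but the text itself is not part of the match.
# .* then takes every character up to the end of that line.
pattern = r"(?<=Invoice Number ).*"

match = re.search(pattern, invoice_text)
invoice_number = match.group(0) if match else ""
print(invoice_number)
```

Because the lookbehind only checks what comes before the match, the label “Invoice Number ” never ends up in the extracted value.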

RegEx in UiPath

RegEx data extraction in UiPath is straightforward. You can do it with the built-in Matches activity or directly with VB.NET code.

Using the Matches activity

Read your text into a variable of the type String. For example, here (Fig. 1.2), my data comes from the PDF called “Invoice”.

After you’ve dragged the “Matches” activity in from Activities, make sure it’s selected (highlighted in light blue); if not, click it.

Fig. 1.2: Using the Matches activity in UiPath

In Properties fill in the following:

Input:

Use your text String. Here I use the string coming out of the ‘Read PDF Text’ activity.

Pattern: 

Use a RegEx pattern. To get the invoice number, we can use:

 

				
    (?<=Invoice Number ).*

Result: Click in the Result field (under Misc) and press Ctrl+K; this creates a new variable of the type IEnumerable of Match. Before you do anything else, give it a name. I’ll call mine ienMatches (but feel free to call it whatever you want). You can think of an IEnumerable as a list that holds zero or more items. Here it holds our matches, if there are any.

Drag in a Write Line and enter the output variable from the Matches activity. Because it’s an IEnumerable, which is a list/collection, we add (0) after the variable to get the first match and .Value to get its text: ienMatches(0).Value. In a “real” robot, we’d write the result to an Excel sheet, a database, or another system, but a Write Line is great for testing and learning purposes.

Run your workflow, and the Output panel will show your invoice number (see Fig. 1.3).

Fig. 1.3: Output of our RegEx/Matches workflow
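
The idea of “a collection of matches, take the first one” can be sketched outside UiPath as well. The Python below is only an analogy to the Matches activity, not UiPath code; the helper name find_matches and the sample text are hypothetical. It also shows a guard for the case where nothing matches, since taking item 0 only makes sense when the collection is not empty.

```python
import re

def find_matches(text, pattern):
    """Return the text of every match, in order (an empty list if none)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Hypothetical text standing in for the string read from the PDF.
pdf_text = "Invoice Number 2021-001\nTotal 100.00\n"

matches = find_matches(pdf_text, r"(?<=Invoice Number ).*")

# Check that there is at least one match before taking the first item.
if matches:
    first_match = matches[0]
else:
    first_match = ""
print(first_match)
```

In a real workflow, a similar check on the result collection before reading the first match keeps the robot from failing on documents that lack an invoice number.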

Using an Assign

Using the built-in activities ‘Matches’ and ‘Is Match’ is often helpful. However, in many cases there is a more direct way to extract data from text with RegEx in UiPath.

Consider the following text: “Amount due $2,000.00, please pay before 04/24/2020”, where the format will always be the same, but the amount will change. We want to extract the amount and store it in a string, i.e., “2,000.00”.

Fig. 2.1: Entire workflow that extracts the amount from the input string

Assign Variables

First, create the two variables in the Variables panel:

  • ‘strInput’, a string where we store our input
  • ‘strOutput’, a string to hold the value of the extracted amount

Use RegEx to assign a value to our output string

Drag in an ‘Assign’ activity (Fig. 2.2) and set the left side to ‘strOutput’ and the right side to:

				
    System.Text.RegularExpressions.Regex.Match(strInput, "(?<=\$)(\d+,?)+(\.\d+)").Value

Fig. 2.2: Using an Assign for our RegEx

Let’s dig into the expression on the right side:

  • System.Text.RegularExpressions is the Microsoft .NET namespace
  • Regex.Match is the method (covered here https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8)
  • strInput is the string to search in
  • “(?<=\$)(\d+,?)+(\.\d+)” is the RegEx pattern we use to extract the amount
    • (?<=\$) is a positive lookbehind: the match must come directly after a ‘$’, and the ‘$’ itself is not included
    • (\d+,?) matches one or more digits (\d+) followed by an optional thousands separator (,?)
    • + repeats the preceding group one or more times (otherwise, it would stop after ‘2,’)
    • (\.\d+) a decimal delimiter (\.) and one or more digits (\d+)
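
To see the parts of the pattern at work, here is a small Python sketch. Python is an assumption on my side; in UiPath you would use the Assign expression shown above. It runs the pattern on the sample text and then drops the ‘+’ after the first group to show why the repetition is needed.

```python
import re

text = "Amount due $2,000.00, please pay before 04/24/2020"

# Same pattern as in the Assign activity.
pattern = r"(?<=\$)(\d+,?)+(\.\d+)"
match = re.search(pattern, text)
amount = match.group(0) if match else ""
print(amount)

# Without the '+', the first group can only cover "2," (or "2"), and the
# next character is never the '.' the pattern requires, so nothing matches.
without_plus = re.search(r"(?<=\$)(\d+,?)(\.\d+)", text)
print(without_plus)
```

The first print shows the full amount, while the second pattern finds no match at all on this text.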

Write out the output

Now just drag in a ‘Write Line’ and set its value to our output variable, ‘strOutput’ (Fig. 2.3).

Fig. 2.3: Write out the output

And then we’re done (Fig. 2.4). The same workflow works for any other amount written in this format.

Fig. 2.4: Our extracted value
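
A quick way to check the pattern against other amounts is a small test loop. The sketch below is Python, not UiPath, and the helper extract_amount and all sample strings except the first are hypothetical. The helper returns an empty string when nothing matches, which mirrors how .Value on a failed .NET match is an empty string.

```python
import re

# Same pattern as in the Assign activity, compiled once for reuse.
AMOUNT_PATTERN = re.compile(r"(?<=\$)(\d+,?)+(\.\d+)")

def extract_amount(text):
    """Return the first dollar amount in text, or "" if there is none."""
    m = AMOUNT_PATTERN.search(text)
    return m.group(0) if m else ""

samples = [
    "Amount due $2,000.00, please pay before 04/24/2020",
    "Amount due $15.50, please pay before 05/01/2020",
    "Amount due $1,234,567.89, please pay before 06/30/2020",
    "No amount stated",
]

for sample in samples:
    print(repr(extract_amount(sample)))
```

Note that the pattern requires a decimal part, so an amount written without decimals (for example “$2,000”) would not be extracted.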

Questions?

If you have any UiPath (RegEx) questions, feel free to join my Discord server, where we (1,600+ RPA developers and counting) help each other solve problems and network around our careers.

Anders Jensen

RPA DEVELOPER, YOUTUBER & TWO-TIME UIPATH MOST VALUED PROFESSIONAL (2021 & 2022)

Anders has been running andersjensenorg full-time since December 2021. The company specializes in teaching RPA via YouTube and tailor-made learning paths. Anders is an experienced RPA developer and teacher with experience in both the public and private sectors. Anders’ YouTube channel has trained more than 100,000 Citizen Developers in just one video. Alongside his YouTube channel, Anders has built a unique global RPA community with developers on all levels. When using Anders as your partner for RPA training, you get direct access to this unique opportunity for problem-solving, opportunities, and networking.
