How to Extract Data With RegEx in UiPath

A regular expression (RegEx) is a sequence of characters that defines a search pattern. As UiPath RPA Developers, we use RegEx to extract data from, e.g., emails and PDFs.

Learning RegEx is essential, since RPA is often about extracting data from one system and placing it in another. But don’t worry, it’s straightforward.

What is RegEx

RegEx (short for Regular Expressions) is a short sequence of code (characters) that defines a search pattern. That search pattern is used to search for specific data in, e.g., an email.

Fig. 1.1 shows an online RegEx tester. I’ve pasted the text (string) from an invoice. Just above, you’ll find the expression editor. The on-screen help is to the right and is very valuable when you’re learning RegEx.

Fig. 1.1: RegEx on an invoice

My RegEx pattern for this task is:

    (?<=Invoice Number ).*

This pattern extracts everything after “Invoice Number ” up to the end of that line and gives me the invoice number: “2021-001” (marked with blue).
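To try the pattern outside UiPath, here is a minimal Python sketch (not UiPath code). It assumes the invoice text looks roughly like the hypothetical sample below; only the “Invoice Number 2021-001” line comes from the example above. Python’s `re` module accepts the same fixed-width lookbehind syntax as .NET, and in both, `.` does not match a line break by default, so `.*` stops at the end of the line.

```python
import re

# Hypothetical invoice text; only the "Invoice Number 2021-001" line is from the article.
invoice_text = "ACME Ltd.\nInvoice Number 2021-001\nDate 01/15/2021\n"

# The lookbehind requires "Invoice Number " (note the trailing space) right before
# the match but does not include it in the result. ".*" then takes the rest of the
# line, because "." does not match "\n" by default.
pattern = r"(?<=Invoice Number ).*"

match = re.search(pattern, invoice_text)
invoice_number = match.group(0) if match else ""
print(invoice_number)  # prints 2021-001
```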

RegEx in UiPath

Extracting data with RegEx in UiPath is effortless. You can do it with the built-in Matches activity or directly with VB.NET code.

Using the Matches activity

Read your text into a variable of the type String. For example, here (Fig. 1.2), my data comes from the PDF called “Invoice”.

After you’ve dragged the Matches activity in from the Activities panel, make sure it is selected (highlighted in light blue); if not, click it.

Fig. 1.2: Using the Matches activity in UiPath

In the Properties panel, fill in the following:

Input:

Use your text String. Here I use the string coming out of the ‘Read PDF Text’ activity.

Pattern:

Use a RegEx pattern. To get the invoice number, we can use:

    (?<=Invoice Number ).*

Result: Click in the Result field (under Misc) and press Ctrl+K; this creates a new variable of the type IEnumerable<Match>. Before you do anything else, give it a name. I’ll call mine ienMatches, but feel free to call it whatever you want. You can think of an IEnumerable as a list that holds zero or more items. Here it holds our matches, if there are any.

Drag in a Write Line and pass it the output variable from the Matches activity. Because it’s an IEnumerable, i.e., a list/collection, we add (0) to get the first match and then .Value to get its text: ienMatches(0).Value. In a “real” robot, we’d write our result to an Excel sheet, a database, or another system, but a Write Line is great for testing and learning.

Run your workflow, and the Output panel will show your invoice number (see Fig. 1.3).

Fig. 1.3: Output of our RegEx/Matches workflow
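The Matches activity returns every match as a collection, and (0) picks the first one. A rough Python analogue (not UiPath code) is `re.finditer`, which yields one match object per hit. The sketch below, using a hypothetical two-line text, collects the matches into a list and only reads the first one if there is at least one, which is worth checking in a real robot too, since the text may not contain the pattern at all.

```python
import re

# Hypothetical text containing two invoice numbers, one per line.
text = "Invoice Number 2021-001\nInvoice Number 2021-002\n"

# Roughly what the Matches activity does: collect every match of the pattern.
matches = list(re.finditer(r"(?<=Invoice Number ).*", text))

# Analogue of taking ienMatches(0) and reading its value, guarded against no matches.
first_value = matches[0].group(0) if matches else ""

print(len(matches))  # prints 2
print(first_value)   # prints 2021-001
```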

Using an Assign

Using the built-in activities ‘Matches’ and ‘Is Match’ is often helpful. However, in many cases there is a more straightforward way to extract data from text with RegEx in UiPath: a single Assign.

Consider the following text: “Amount due $2,000.00, please pay before 04/24/2020”, where the format is always the same but the amount changes. We want to extract the amount and store it in a string, i.e., “2,000.00”.

Fig. 2.1: Entire workflow that extracts the amount from the input string

Assign Variables

First, create the two variables in the Variables panel:

  • ‘strInput’, a string where we store our input
  • ‘strOutput’, a string to hold the value of the extracted amount

Use RegEx to assign a value to our output string

Drag in an ‘Assign’ activity (Fig. 2.2) and set the left side to ‘strOutput’ and the right side to:

    System.Text.RegularExpressions.Regex.Match(strInput, "(?<=\$)(\d+,?)+(\.\d+)").Value

Fig. 2.2: Using an Assign for our RegEx

Let’s dig into the expression on the right side:

  • System.Text.RegularExpressions is the .NET namespace that contains the Regex class
  • Regex.Match is the method that returns the first match of a pattern in a string (covered here https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8)
  • strInput is the string to search in
  • “(?<=\$)(\d+,?)+(\.\d+)” is the RegEx pattern we use to extract the amount:
    • (?<=\$) is a positive lookbehind: the match must be preceded by a ‘$’, which is not itself part of the result
    • (\d+,?) matches one or more digits (\d+) followed by an optional thousands separator (,?)
    • + repeats the preceding group one or more times (without it, the group would only cover ‘2,’ and never reach ‘000’)
    • (\.\d+) matches a decimal separator (\.) followed by one or more digits (\d+)
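The same pattern can be tried in Python, whose `re` module accepts this lookbehind too. This is a sketch for experimenting with the pattern, not the UiPath workflow; `re.search(...).group(0)` plays the role of `Regex.Match(...).Value`, and the variable names simply mirror the ones above.

```python
import re

str_input = "Amount due $2,000.00, please pay before 04/24/2020"

# (?<=\$)  : the match must be preceded by "$" (not included in the result)
# (\d+,?)+ : one or more runs of digits, each optionally followed by a comma
# (\.\d+)  : a literal "." followed by one or more digits
pattern = r"(?<=\$)(\d+,?)+(\.\d+)"

match = re.search(pattern, str_input)
str_output = match.group(0) if match else ""
print(str_output)  # prints 2,000.00
```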

Write the output out

Now drag in a ‘Write Line’ and set its value to our output variable, ‘strOutput’ (Fig. 2.3).

Fig. 2.3: Write out the output

And then we’re done (Fig. 2.4). The workflow also works for other amounts in the same format: a ‘$’ followed by digits with optional comma separators and a decimal part.
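Which other values work depends on the pattern. The Python sketch below runs it over a few hypothetical variants of the input. An amount without decimals does not match, and the helper then returns an empty string, mirroring how `.Value` of a failed .NET `Match` is an empty string rather than an error. `extract_amount` is a hypothetical helper name used only for illustration.

```python
import re

AMOUNT_PATTERN = re.compile(r"(?<=\$)(\d+,?)+(\.\d+)")

def extract_amount(text: str) -> str:
    """Return the first amount after a '$', or '' if the pattern does not match."""
    match = AMOUNT_PATTERN.search(text)
    return match.group(0) if match else ""

samples = [
    "Amount due $150.25, please pay before 05/01/2020",
    "Amount due $1,234,567.89, please pay before 06/30/2020",
    "Amount due $2000, please pay before 04/24/2020",  # no decimal part
]

for sample in samples:
    print(repr(extract_amount(sample)))
# prints '150.25', '1,234,567.89' and '' on separate lines
```

If your amounts can lack decimals, the decimal group would need to be made optional, e.g. (\.\d+)?.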

Fig. 2.4: Our extracted value

Questions?

If you have any UiPath (RegEx) questions, feel free to join my Discord server, where we (+1,600 RPA Developers and counting) are helping each other solve problems and network around our careers.

This Post Has 8 Comments

  1. Sonali

    hi i want to seprate s.no and title from column 1 in excel how to do it???

  2. Jyothi

    Hi Can you show me how to automate captcha using Regex

  3. Sharan

    I need to extract a paragraph from a mail which can have different context every time, but the sub-heading will be the same. e.x: Description – This is an example. When we use the regex command the output is stored as Ienumerable but I want to be either a string or an Integer. Can you help me on this

  4. Ranjit

    Hi,

    I want to make the string output like below, please suggest the best way.

    input=”abcdefgh”
    output I should get= ab c de f gh

    1. Anders Jensen

      Thanks for writing! I’m getting more than 50 messages daily, and while I read all of them, I can’t reply to everyone. But I’ve created an RPA/Automation community where we’re 4900+ RPA Developers helping each other with solutions and our careers. Here’s the video on how to join (the invitation link is in the video description): https://youtu.be/xWFz-S96XGo Kind regards, Anders

  5. J Jacob

    How to extract a particular number from a string of characters
    e.g. 413-62416-WF | Bac ID: 4432672

    From the above string I want to extract the bac ID which is in this case 4432672, the string is always in the same format with “-“, “|” & “:”, but the lengths of each numbers can varry.

    1. Anders Jensen

      Hey J Jacob. Sorry about the late reply. Yes, you can use this RegEx:
      (?<=Bac ID: )\d+
      I hope it helped
      Kind regards, Anders
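      The suggested pattern can be checked quickly in Python, which supports the same fixed-width lookbehind. This sketch uses the example string from J Jacob’s question; in UiPath, the same pattern would go into the Matches activity or a Regex.Match Assign as described in the post.

```python
import re

text = "413-62416-WF | Bac ID: 4432672"

# Digits that directly follow "Bac ID: "; \d+ adapts to IDs of any length.
match = re.search(r"(?<=Bac ID: )\d+", text)
bac_id = match.group(0) if match else ""
print(bac_id)  # prints 4432672
```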