Save Web Data to Excel – Power Automate Desktop Advanced Use Case

Intro Automatically do CVR (Danish VAT numbers) lookups online and scrape the web results back to Excel. This complete tutorial teaches you all the best practices of a Power Automate RPA Developer regarding web and Excel automation. Learning Objectives Description Hints Video Guide Additional Material Code

Intro

Automatically do CVR (Danish VAT numbers) lookups online and scrape the web results back to Excel.

This complete tutorial teaches you all the best practices of a Power Automate RPA Developer regarding web and Excel automation.

Learning Objectives

  • Advanced Web Automation
  • Advanced Excel work
    • DataTables
  • Handling errors in your flows
  • HTTP Requests/REST API and JSON parsing

Description

  • Download the Excel file for the lesson
  • Open the CVR (Danish VAT numbers) page: https://datacvr.virk.dk/
  • We are doing advanced web data extraction. One step at a time:
    • Read sheet '2. Companies 10 Data'
      • Navigate to the CVR page
        • Click the 'English' button in the upper right corner
        • Make the flow robust by only clicking on the 'English' if it's not clicked
      • For each CVR, search the CVR page
      • Scrape the results with an Extract data from web page
      • Write the results back into the correct columns
      • Close the browser
      • Close the Excel book
    • Instead of using the Extract data from web page, create a DataTable and add DataRows with each result to it
      • This enables you to customize the extracted data
    • Do dynamic file naming for the result Excel sheet (e.g. with a DateTime)
    • Read sheet '3. Companies 12 Data' and tune your flow to handle errors and create a log
    • Read sheet '4. Companies 13 Data' and do advanced UI Element work to make your flow handling when the web page data differs slightly
    • Read sheet '5. Companies 10 Data' and create HTTP Requests to make your flow fast and reliable

Hints

  • Create a variable for the path for the Excel book and use it in the Launch Excel (best practice, since the book can move and we're using the path more than once).
  • Use Subflows to structure your process
  • A RowNumber integer variable is great for keeping track of row numbers in the Excel sheets
  • Use a If web page contains action for handling errors

Video Guide

https://youtu.be/7R5EjshDoQU

Additional Material

Code

CALL Initialization
CALL Process
CALL EndProcess
SET ProjectPath TO $'''C:\\Users\\Anders Jensen\\OneDrive - Anders Jensen\\Desktop\\Project'''
SET URL TO $'''https://datacvr.virk.dk/'''
SET ResultDataTable TO { ^['CVR', 'Name', 'Address', 'Postal code', 'City', 'Start date', 'Business type', 'Advertising protection', 'Status', 'Log'] }
DateTime.GetCurrentDateTime.Local DateTimeFormat: DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> CurrentDateTime
Text.ConvertDateTimeToText.FromCustomDateTime DateTime: CurrentDateTime CustomFormat: $'''yyyyMMddHHmmss''' Result=> FormattedDateTime
Excel.LaunchExcel.LaunchAndOpenUnderExistingProcess Path: $'''%ProjectPath%\\Data.xlsx''' Visible: False ReadOnly: False Instance=> ExcelInstance
Excel.SetActiveWorksheet.ActivateWorksheetByName Instance: ExcelInstance Name: $'''6. Companies 10 Data'''
Excel.ReadFromExcel.ReadAllCells Instance: ExcelInstance ReadAsText: False FirstLineIsHeader: True RangeValue=> ExcelData
DISABLE WebAutomation.LaunchChrome.LaunchChrome Url: URL WindowState: WebAutomation.BrowserWindowState.Normal ClearCache: False ClearCookies: False WaitForPageToLoadTimeout: 60 Timeout: 60 BrowserInstance=> Browser
LOOP FOREACH CurrentItem IN ExcelData
    Web.InvokeWebService.InvokeWebService Url: $'''https://cvrapi.dk/api?search=%CurrentItem['CVR']%&country=dk''' Method: Web.Method.Get Accept: $'''application/xml''' ContentType: $'''application/xml''' ConnectionTimeout: 30 FollowRedirection: True ClearCookies: False FailOnErrorStatus: False EncodeRequestBody: True UserAgent: $'''Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.21) Gecko/20100312 Firefox/3.6''' Encoding: Web.Encoding.AutoDetect AcceptUntrustedCertificates: False ResponseHeaders=> WebServiceResponseHeaders Response=> WebServiceResponse StatusCode=> StatusCode
    Variables.ConvertJsonToCustomObject Json: WebServiceResponse CustomObject=> JsonAsCustomObject
    SET ResultDataTable TO ResultDataTable + [CurrentItem['CVR'], JsonAsCustomObject['name'], JsonAsCustomObject['address'], JsonAsCustomObject['zipcode'], JsonAsCustomObject['city'], JsonAsCustomObject['startdate'], JsonAsCustomObject['companydesc'], JsonAsCustomObject['protected'], '', 'Succesfull']
END
Excel.WriteToExcel.WriteCell Instance: ExcelInstance Value: ResultDataTable Column: $'''A''' Row: 2
Excel.CloseExcel.CloseAndSaveAs Instance: ExcelInstance DocumentFormat: Excel.ExcelFormat.FromExtension DocumentPath: $'''%ProjectPath%\\%FormattedDateTime%_Data.xlsx'''
DISABLE WebAutomation.CloseWebBrowser BrowserInstance: Browser

Write a comment