Intro
Automatically do CVR (Danish VAT numbers) lookups online and scrape the web results back to Excel.
This complete tutorial teaches you all the best practices of a Power Automate RPA Developer regarding web and Excel automation.
Learning Objectives
- Advanced Web Automation
- Advanced Excel work
- DataTables
- Handling errors in your flows
- HTTP Requests/REST API and JSON parsing
Description
- Download the Excel file for the lesson
- Open the CVR (Danish VAT numbers) page: https://datacvr.virk.dk/
- We are doing advanced web data extraction. One step at a time:
- Read sheet ‘2. Companies 10 Data’
- Navigate to the CVR page
- Click the ‘English’ button in the upper right corner
- Make the flow robust by only clicking on the ‘English’ if it’s not clicked
- For each CVR, search the CVR page
- Scrape the results with an Extract data from web page
- Write the results back into the correct columns
- Close the browser
- Close the Excel book
- Navigate to the CVR page
- Instead of using the Extract data from web page, create a DataTable and add DataRows with each result to it
- This enables you to customize the extracted data
- Do dynamic file naming for the result Excel sheet (e.g. with a DateTime)
- Read sheet ‘3. Companies 12 Data’ and tune your flow to handle errors and create a log
- Read sheet ‘4. Companies 13 Data’ and do advanced UI Element work to make your flow handling when the web page data differs slightly
- Read sheet ‘5. Companies 10 Data’ and create HTTP Requests to make your flow fast and reliable
- API Documentation: https://cvrapi.dk
- Read sheet ‘2. Companies 10 Data’
Hints
- Create a variable for the path for the Excel book and use it in the Launch Excel (best practice, since the book can move and we’re using the path more than once).
- Use Subflows to structure your process
- A RowNumber integer variable is great for keeping track of row numbers in the Excel sheets
- Use a If web page contains action for handling errors
Video Guide
Additional Material
Code
CALL Initialization
CALL Process
CALL EndProcess
SET ProjectPath TO $'''C:\\Users\\Anders Jensen\\OneDrive - Anders Jensen\\Desktop\\Project'''
SET URL TO $'''https://datacvr.virk.dk/'''
SET ResultDataTable TO { ^['CVR', 'Name', 'Address', 'Postal code', 'City', 'Start date', 'Business type', 'Advertising protection', 'Status', 'Log'] }
DateTime.GetCurrentDateTime.Local DateTimeFormat: DateTime.DateTimeFormat.DateAndTime CurrentDateTime=> CurrentDateTime
Text.ConvertDateTimeToText.FromCustomDateTime DateTime: CurrentDateTime CustomFormat: $'''yyyyMMddHHmmss''' Result=> FormattedDateTime
Excel.LaunchExcel.LaunchAndOpenUnderExistingProcess Path: $'''%ProjectPath%\\Data.xlsx''' Visible: False ReadOnly: False Instance=> ExcelInstance
Excel.SetActiveWorksheet.ActivateWorksheetByName Instance: ExcelInstance Name: $'''6. Companies 10 Data'''
Excel.ReadFromExcel.ReadAllCells Instance: ExcelInstance ReadAsText: False FirstLineIsHeader: True RangeValue=> ExcelData
DISABLE WebAutomation.LaunchChrome.LaunchChrome Url: URL WindowState: WebAutomation.BrowserWindowState.Normal ClearCache: False ClearCookies: False WaitForPageToLoadTimeout: 60 Timeout: 60 BrowserInstance=> Browser
LOOP FOREACH CurrentItem IN ExcelData
Web.InvokeWebService.InvokeWebService Url: $'''https://cvrapi.dk/api?search=%CurrentItem['CVR']%&country=dk''' Method: Web.Method.Get Accept: $'''application/xml''' ContentType: $'''application/xml''' ConnectionTimeout: 30 FollowRedirection: True ClearCookies: False FailOnErrorStatus: False EncodeRequestBody: True UserAgent: $'''Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.21) Gecko/20100312 Firefox/3.6''' Encoding: Web.Encoding.AutoDetect AcceptUntrustedCertificates: False ResponseHeaders=> WebServiceResponseHeaders Response=> WebServiceResponse StatusCode=> StatusCode
Variables.ConvertJsonToCustomObject Json: WebServiceResponse CustomObject=> JsonAsCustomObject
SET ResultDataTable TO ResultDataTable + [CurrentItem['CVR'], JsonAsCustomObject['name'], JsonAsCustomObject['address'], JsonAsCustomObject['zipcode'], JsonAsCustomObject['city'], JsonAsCustomObject['startdate'], JsonAsCustomObject['companydesc'], JsonAsCustomObject['protected'], '', 'Succesfull']
END
Excel.WriteToExcel.WriteCell Instance: ExcelInstance Value: ResultDataTable Column: $'''A''' Row: 2
Excel.CloseExcel.CloseAndSaveAs Instance: ExcelInstance DocumentFormat: Excel.ExcelFormat.FromExtension DocumentPath: $'''%ProjectPath%\\%FormattedDateTime%_Data.xlsx'''
DISABLE WebAutomation.CloseWebBrowser BrowserInstance: Browser