Intro

The Small Bookstore wants your help to scrape the Amazon book prices, so they better can compete.

What you will learn

  • Primarily:
    • Web scraping
  • Secondarily:
    • Dynamic web scraping
    • Excel
    • Regex

Client Name

The Small Bookstore

Process Name

WebScrapeAmazon

Description

The local book store wants to battle might Amazon, and you’re helping them automatically compare prices. So here we scrape Amazon book prices to use them in the comparison later.

  • Download the course material here
  • Read the Excel book called data
  • Set the active Excel worksheet to Topics
  • Do an Amazon search for each of the topics in the book section (pick ‘Books’ from the drop-down)
  • Scrape the results (Title, Author, and Price) into a new Excel sheet with the topic name as the sheet name
  • Close (and save) Excel
  • Advanced:
    • Create a loop that scrapes the first three pages of search search
    • Can we tune the Excel results? Hint: Regex

Hints

  • Launch Excel
    • Open the Data Excel book
  • Launch new Chome
  • Set active Excel worksheet
    • Topics
  • Get first free column/row from Excel worksheet
  • Read from Excel worksheet
  • For each
    • Go to web page
    • Add new worksheet
      • Name it %Topic%
    • Set drop-down list value on the web page
      • Books
    • Populate text field on web page
      • %Topic%
    • Click link on web page
    • Extract data from web page
    • Write to Excel worksheet
  • Close excel and save
  • Advanced:
    • Add headers
      • Before the For each
        • Set active Excel worksheet
          • Headers
        • Copy cells from Excel worksheet
          • We’re copying the headers
      • In  the start of the For each
        • Add new worksheet
          • Topic
        • Paste cells to Excel worksheet
    • First three pages
      • Loop (1 to 3)
        • Get first free column/row from Excel worksheet
        • Move the Extract data and the Write to Excel in the loop since we want to do it three times
        • Click link on web page
          • The next button
    • Regex
      •  Use Regex in the Extract Data to extract the author
 
				
					((?<=^by ).*(?= \|))|((?<=\| by ).*(?= \|))