Intro to web scraping

What is scraping?

Web scraping is the process of extracting information from a source on the web. The source can be a data endpoint, such as a JSON or CSV file, or part of a webpage, such as a table or a list.

Web scraping | Wikiwand

How does it work?

Web scraping works by mimicking a web browser: code sends requests to the server, and the server responds with the requested information.

What is Web Scraping and How Does It Work
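The request-and-response idea above can be sketched with Python's standard library. This is a minimal sketch, not a definitive implementation: the header values, the function names `build_request` and `fetch_text`, and the 30-second timeout are all assumptions made for illustration.

```python
import urllib.request

# Hypothetical header values; real browsers send longer User-Agent strings.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; intro-scraper/0.1)",
    "Accept": "text/html,application/json;q=0.9,*/*;q=0.8",
}


def build_request(url):
    """Build a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_LIKE_HEADERS)


def fetch_text(url, timeout=30):
    """Send the request and return the response body decoded as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_text("https://example.com/")` (a placeholder URL) would return the page source as a string, which is roughly what a browser receives before it renders the page.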

Data files that already exist as a URL

When the data already exists as a file at a URL and we don’t want to download it manually through a browser (for example, because the data updates regularly), we can use code to send the download request to the server and save the response locally.
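As a rough illustration of downloading a data file with code, the sketch below streams a response into a local file using only the standard library. The function name `download_file`, the `User-Agent` value, and the timeout are assumptions, not part of any particular tool.

```python
import shutil
import urllib.request


def download_file(url, dest_path, timeout=30):
    """Request `url` and stream the response body into `dest_path`."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "intro-scraper/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        # Copy in chunks so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

A call such as `download_file("https://example.com/data.csv", "data.csv")` (placeholder URL) could then be scheduled to run whenever the data is expected to update.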

Information that is on a webpage

When we don’t want to manually copy and paste tables of data from a page, we can use an HTML parser to read the page, extract the parts that we need, and convert them into a dataset. If the information only appears after some user interaction on the website, such as clicking a button or filling in a form, we may need to recreate a browser environment in code to reach it.
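The parsing step can be sketched with the standard library's `html.parser` module. This is a minimal sketch under stated assumptions: the sample page is made up, the first table row is assumed to be the header, and the names `TableParser` and `table_to_records` are hypothetical. It does not cover pages that need user interaction.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row on a page."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Return table rows as dicts keyed by the header row (assumed first)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page used only for illustration.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
</body></html>
"""

records = table_to_records(SAMPLE_HTML)
```

Each record is a plain dictionary, so the result can be written to a CSV file or loaded into a data frame. Cell values stay as strings; converting them to numbers is left to the caller.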

Should I scrape the web?