Web scraping is the process of extracting information from a source on the web. The source can be a data endpoint, such as a JSON or CSV file, or part of a webpage, such as a table or a list. Web scraping works by mimicking a web browser and sending requests to the server for information.
What is Web Scraping and How Does It Work
Data files that already exist as a URL
In such cases, when we don’t want to download the data manually through a browser (e.g. data that updates regularly), we can use code to send the download request to the server and save the data locally.
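As a minimal sketch of this idea, the snippet below uses only Python's standard library to request a URL and write the response to disk. The function name `download_file`, the `User-Agent` string, and the example URL in the usage note are illustrative assumptions, not part of any particular site's API.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Fetch `url` and write the response body to `dest`; return the saved path."""
    target = Path(dest)
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Some servers reject requests that carry no browser-like User-Agent header.
    # The header value here is an arbitrary placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (example scraper)"}
    )
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

A call might look like `download_file("https://example.com/data.csv", "data/latest.csv")`, where the URL is a placeholder. Running the same call on a schedule is one way to keep a local copy of data that updates.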
Information that is on a webpage
When we don’t want to manually copy and paste tables of data from a page, we can use an HTML parser to read the page, extract the parts that we need, and convert them to a dataset. If the information is hidden behind some user interaction on the website, then we may need to recreate a browser environment to reach it.
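The parsing step can be sketched with Python's built-in `html.parser` module. The example assumes a simple table whose cells have explicit closing tags and whose first row is the header. The class `TableParser`, the function `table_to_records`, and the sample HTML (made-up values) are all hypothetical names and data introduced for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html: str) -> list[dict]:
    """Return one dict per data row, keyed by the cells of the first row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page fragment, used only to demonstrate the parser.
sample_html = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>apple</td><td>1.20</td></tr>
  <tr><td>pear</td><td>0.95</td></tr>
</table>
"""

records = table_to_records(sample_html)
print(records)
```

All values come back as strings, so converting numeric columns is a separate step. This sketch only covers pages whose HTML already contains the data. When content appears only after clicks or scrolling, browser automation tools such as Selenium or Playwright are commonly used to recreate the browser environment.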