Episode #6 of the course Build your own web scraping tool by Hartley Brody
. . . that is, until I show them the one simple trick for getting the data they’re looking for right away.
Adding Content to the Page That Isn’t in the Initial HTML Response
If the data you see in your browser is missing from the HTML your scraper downloaded, it was most likely sent back from the server in a later HTTP request, not in the initial request to the URL shown at the top of your browser. Remember how we said some sites take hundreds of HTTP requests to load a given page? The data you're seeing on the page might have come in the response to any one of those subsequent requests.
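A quick way to confirm this is to download only the initial HTML and check whether the text you want is actually in it. Here is a minimal sketch of that check using Python's standard library. The function names are made up for illustration, and the User-Agent header is just an assumption that the site expects a browser-like client.

```python
import re
import urllib.request


def fetch_initial_html(url, timeout=10.0):
    """Download only the first HTML response, the way a basic scraper would."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def contains_text(html, needle):
    """Case-insensitive search that ignores differences in whitespace."""
    def normalize(text):
        return re.sub(r"\s+", " ", text).strip().lower()

    return normalize(needle) in normalize(html)
```

To try it, call `contains_text(fetch_initial_html(url), "some text you see on the page")` with your own URL and text. If it returns `False` even though the text is visible in your browser, the data probably arrived in one of the later requests.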
It should only take a few minutes to look through these requests (your browser's developer tools list them under the Network tab) until you find the one that returned the data you were hoping to scrape. You might find that these requests are even easier to scrape than the page itself! Rather than returning an HTML response, many AJAX endpoints return the data in a format like JSON or XML, which is easy to parse using common tools.
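Once you've found the request that carries the data, scraping it can be as simple as the sketch below. It assumes the endpoint returns JSON shaped like `{"products": [{"name": ..., "price": ...}]}`. That shape, the field names, and both function names are hypothetical, so adjust them to match whatever the real response looks like.

```python
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """Request the AJAX endpoint directly and decode its JSON body."""
    headers = {"Accept": "application/json", "User-Agent": "Mozilla/5.0"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_products(payload):
    """Pull (name, price) pairs out of the assumed response shape."""
    # "products", "name", and "price" are assumed keys, not a real API.
    return [
        (item["name"], float(item["price"]))
        for item in payload.get("products", [])
    ]
```

Calling `extract_products(fetch_json(endpoint_url))` with the URL you found would give you a plain list of tuples, with no HTML parsing involved. Some endpoints also expect cookies or extra headers that your browser sends automatically, so you may need to copy those over from the request you inspected.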