In a previous post, we explained the basics of this topic: What is web scraping, and how would you program software that performs web scraping?
Alas, programming is a special skill that takes time and effort to master. In another post, we also introduced the declarative web scraping language OXPath, which can help non-programmers get a web scraper up and running more quickly.
In Smart Harvesting II, we asked ourselves: What kind of tool would a librarian need to extract bibliographic metadata from the Web? At first, we focused on OXPath, but we soon realized that, even though this declarative language is easier to read and write than a script in a full-blown programming language, it still involves some hurdles that make it less than ideal for our user group.
We also realized that, in the meantime, a good number of web scraping tools suitable for laypeople have become available.
In this post, we give an overview of what we consider the most promising web scraping tools.