There are many times when we need to embed a piece of information in a WordPress post so that it always stays up to date. This could be stock quotes, oil prices, currency rates, match schedules, team winning odds and so on. The usual options are to link to an external URL or to embed the source as an iframe.
But both approaches have pitfalls. Linking to an external URL sends visitors away from your site and can increase your bounce rate, and an iframe is not always the right fit for your user experience. Neither approach adds SEO value to your page either, because the content is never really on your page.
That’s when WP Web Scraper enters the stage. WP Web Scraper is a free plugin that lets you grab content from any web page, XML document or RSS feed and display it on your WordPress website. It goes further by letting you target content right down to the CSS selector, and it provides a host of advanced options to parse, modify and format the content.
WP Web Scraper lets you specify a URL source and a query to fetch specific content from it. WP Web Scraper can be used through a shortcode (for posts, pages or sidebar) or template tag (for direct integration in your theme) for scraping and displaying web content. Here’s an actual example:
As a shortcode:
[wpws url="https://www.yahoo.com/" query="ol.trendingnow_trend-list" output="text"]
As a template tag:
<?php echo wpws_get_content('https://www.yahoo.com/', 'ol.trendingnow_trend-list', array('output' => 'text')); ?>
The above shortcode and template tag would output the content of the CSS Selector ‘ol.trendingnow_trend-list’ from URL ‘https://www.yahoo.com/’ in your post, page or sidebar as plain text (HTML stripped).
For the template tag (wpws_get_content), the first argument is the URL, the second argument is the query, and the third argument is an associative array holding all other arguments.
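In a theme file, it is usually worth guarding a template tag call so the page still renders if the plugin is deactivated. The following is a minimal sketch under that assumption. The wrapper name `mytheme_trending_text` is hypothetical; only `wpws_get_content` and its three arguments come from the example above.

```php
<?php
/**
 * Hypothetical theme helper: returns the scraped text, or an empty
 * string when WP Web Scraper is not active.
 */
function mytheme_trending_text() {
    // wpws_get_content() is only defined while the plugin is active.
    if ( ! function_exists( 'wpws_get_content' ) ) {
        return '';
    }

    // Arguments: URL, query (a CSS selector here), then an array of other arguments.
    return wpws_get_content(
        'https://www.yahoo.com/',
        'ol.trendingnow_trend-list',
        array( 'output' => 'text' )
    );
}

echo mytheme_trending_text();
```

With the guard in place, a deactivated plugin results in empty output rather than a fatal "undefined function" error.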
You may browse through more such examples to understand this better.
WP Web Scraper has a host of options to control your URL request, perform advanced parsing and manage output. Apart from CSS Selectors, WP Web Scraper also supports XPath and Regex queries.
- Scraped output can be displayed through a custom template tag, or through a shortcode in a page, post or sidebar (via a text widget).
- Configurable caching of scraped data. A cache timeout in minutes can be defined for every scrape.
- A configurable user agent for your scraper can be set for every scrape.
- Configurable default settings such as enabling, user agent, timeout, caching and error handling.
- Multiple ways to query content – CSS Selector, XPath or Regex.
- A wide range of arguments for parsing content.
- Option to pass POST arguments to a URL to be scraped.
- Dynamic conversion of scraped content to a specified character encoding, for scraping data from a site that uses a different charset.
- Create scrape pages on the fly by dynamically generating the URLs to scrape, or the POST arguments to send, from your page’s GET or POST arguments.
- Callback function for advanced parsing of scraped data.
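To make the difference between the query types concrete, here is a standalone sketch that pulls the same value out of a small HTML string, once with an XPath query and once with a regex. It uses PHP's built-in DOM extension and `preg_match`, not the plugin's own parser, and the markup and variable names are made up for illustration. In this particular markup, the CSS selector `span.price` would target the same element as the XPath expression.

```php
<?php
// Made-up markup standing in for a fetched page.
$html = '<div><span class="price">42.50</span><span class="unit">USD</span></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors( true );

// XPath query: load the markup into a DOM tree and select by class attribute.
$doc = new DOMDocument();
$doc->loadHTML( $html );
$xpath    = new DOMXPath( $doc );
$nodes    = $xpath->query( '//span[@class="price"]' );
$by_xpath = $nodes->length > 0 ? $nodes->item( 0 )->textContent : '';

// Regex query: match against the raw string, with no DOM parsing involved.
$by_regex = preg_match( '/<span class="price">([^<]+)<\/span>/', $html, $m ) ? $m[1] : '';

echo $by_xpath, "\n";
echo $by_regex, "\n";
```

Both queries pick out the text of the `price` span here. The regex version is tied to the exact attribute order and spacing of the source markup, which is one reason selector-based or XPath queries tend to be more robust against small page changes.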
The plugin is well documented with a lot of examples and FAQs. For more advanced parsing requirements, or for help crafting a well-optimized scrape, paid support is available.
The plugin uses native WordPress APIs wherever possible. It uses the HTTP API for making HTTP requests and the Transients API for caching, and it does not rely on any third-party fetching or parsing services.
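For readers unfamiliar with those two APIs, the fetch-then-cache pattern they enable looks roughly like the sketch below. This is not the plugin's source code. The function name `example_fetch_cached`, the `example_` cache key prefix and the default values are hypothetical, while `get_transient`, `set_transient`, `wp_remote_get`, `is_wp_error`, `wp_remote_retrieve_body` and `MINUTE_IN_SECONDS` are WordPress core functions and constants.

```php
<?php
/**
 * Hypothetical helper: fetch a URL through the WordPress HTTP API and
 * cache the response body with the Transients API.
 */
function example_fetch_cached( $url, $cache_minutes = 60 ) {
    // Derive a short, unique cache key from the URL.
    $key = 'example_' . md5( $url );

    // get_transient() returns false when nothing is cached or the entry has expired.
    $cached = get_transient( $key );
    if ( false !== $cached ) {
        return $cached;
    }

    // Cache miss: make the HTTP request.
    $response = wp_remote_get( $url, array( 'timeout' => 10 ) );
    if ( is_wp_error( $response ) ) {
        return ''; // Request failed; nothing is cached.
    }

    // Store the body so later calls within the timeout skip the request.
    $body = wp_remote_retrieve_body( $response );
    set_transient( $key, $body, $cache_minutes * MINUTE_IN_SECONDS );

    return $body;
}
```

Calling `example_fetch_cached( 'https://example.com/' )` twice within the cache window would trigger only one HTTP request, which is the behavior a per-scrape cache timeout is meant to provide.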
There is also a bit of inline help in the plugin Settings page (Settings > WP Web Scraper) to make sure everything is clear. The settings page also comes with a simple testing sandbox tool to test out shortcodes before they are embedded into posts.