This script visits the web pages you specify and collects the content of selected HTML tags, along with a few page-level counts such as images and links. It's really that simple. You can check up to 1,000 documents in one go.
For each page, it collects:
- H1, H2, H3, H4 and H5 heading tags
- Title tags
- Canonical tags
- Meta descriptions
- Index/Follow directives (returned as true or false)
- The number of images on the page
- The number of external and internal links detected on the page
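
To make the collected fields concrete, here is a minimal sketch of how such an extraction could be done with Python's standard-library `html.parser`. It is not the script's actual implementation: the names `PageScanner` and `scan_html` are hypothetical, and several behaviours are assumptions (a page with no robots meta tag counts as index/follow, a link is "internal" when its host matches the page's host, and non-HTTP links such as `mailto:` are not counted).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5"}


class PageScanner(HTMLParser):
    """Hypothetical helper: gathers headings, title, canonical, meta and counts."""

    def __init__(self, page_url):
        super().__init__(convert_charrefs=True)
        self.page_url = page_url
        self.page_host = urlparse(page_url).netloc
        self.headings = {tag: [] for tag in sorted(HEADING_TAGS)}
        self.title = ""
        self.canonical = None
        self.meta_description = None
        self.index = True    # assumption: no robots meta tag means indexable
        self.follow = True   # assumption: no robots meta tag means followable
        self.image_count = 0
        self.internal_links = 0
        self.external_links = 0
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS or tag == "title":
            self._capture, self._buffer = tag, []
        elif tag == "img":
            self.image_count += 1
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.meta_description = content
            elif name == "robots":
                directives = {d.strip() for d in content.lower().split(",")}
                self.index = not ({"noindex", "none"} & directives)
                self.follow = not ({"nofollow", "none"} & directives)
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL before comparing hosts.
            target = urlparse(urljoin(self.page_url, attrs["href"]))
            if target.scheme in ("http", "https"):
                if target.netloc == self.page_host:
                    self.internal_links += 1
                else:
                    self.external_links += 1

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.title = text
            else:
                self.headings[tag].append(text)
            self._capture = None


def scan_html(html, page_url):
    """Parse one HTML document and return the collected fields as a dict."""
    scanner = PageScanner(page_url)
    scanner.feed(html)
    scanner.close()
    return {
        "title": scanner.title,
        "canonical": scanner.canonical,
        "meta_description": scanner.meta_description,
        "index": scanner.index,
        "follow": scanner.follow,
        "headings": scanner.headings,
        "images": scanner.image_count,
        "internal_links": scanner.internal_links,
        "external_links": scanner.external_links,
    }
```

The standard-library parser keeps the sketch free of dependencies. For messy real-world markup, a more forgiving parser may be a better fit.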
I personally use this algorithm when doing SEO to analyze my competitors' meta-content strategy. The trick is to first get a list of URLs for a specific keyword (you can use this SEO algorithm to do just that), and then run those URLs through this script to collect their titles, meta descriptions and heading tags.
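
The batch workflow described above might look roughly like the sketch below. Everything named here is hypothetical (`fetch_html`, `scan_batch`, the `extract` callback, the user-agent string). Rejecting lists longer than 1,000 URLs is also an assumption, since the description only states the limit and not how the script enforces it.

```python
from urllib.request import Request, urlopen

MAX_URLS = 1000  # batch limit stated in the description


def fetch_html(url, timeout=10):
    """Download a page and decode it, falling back to UTF-8 (an assumption)."""
    request = Request(url, headers={"User-Agent": "meta-scan-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scan_batch(urls, extract, fetch=fetch_html, max_urls=MAX_URLS):
    """Run extract(html, url) on each unique URL and return one row per URL.

    `extract` is any callable returning a dict of fields for a page.
    Failures are recorded in the row instead of aborting the batch.
    """
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    if len(unique) > max_urls:
        raise ValueError(f"{len(unique)} URLs given, limit is {max_urls}")
    rows = []
    for url in unique:
        try:
            rows.append({"url": url, "error": None, **extract(fetch(url), url)})
        except Exception as exc:  # one bad page should not stop the run
            rows.append({"url": url, "error": str(exc)})
    return rows
```

Passing the fetcher and extractor in as arguments keeps the loop easy to try offline: a stub `fetch` that returns canned HTML is enough to exercise it without touching the network.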