Checking document presence on Google, Bing, Yahoo! and Yandex.

    Why is it important to ensure that your documents are appropriately indexed by search engines? How do you catch the URLs that aren't searchable? Let's say you have a list of a site's URLs and you need to make sure all of them were indexed by Google (or Bing, Yahoo! or maybe even Yandex). How would you do that? Of course, depending on the circumstances, there are several options.

    Simple use-case (1-10 URLs)

    Say your URL list is relatively small: one link, or maybe up to ten. Not a big deal, you can do your checks manually! Go to Google and enter:

    site:https://codecamp.com.au/camp-spark/

    And voila!

    [Image: Using Google's site: directive to check document indexation]

    You can immediately see whether your document was indexed appropriately and is searchable by your users. In this case we're good.
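    If you have a handful of URLs, you can generate the manual queries instead of typing them. This is only a minimal sketch: it assumes Google's usual https://www.google.com/search?q= URL form, and the function name site_query_url is made up for illustration.

```python
from urllib.parse import quote_plus


def site_query_url(page_url: str) -> str:
    """Build a Google search URL that runs a `site:` query for one page."""
    # quote_plus percent-encodes ':' and '/' so the whole query survives as one parameter.
    return "https://www.google.com/search?q=" + quote_plus(f"site:{page_url}")


print(site_query_url("https://codecamp.com.au/camp-spark/"))
```

    Open the printed link in a browser and read the result the same way as above. This only saves typing; it is still a manual check.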

    Advanced use-case (LOTS of URLs!)

    The moment you need to check 100 or 1000 URLs, this becomes a non-trivial task. Checking hundreds of URLs manually (or even a couple of dozen) is a real pain in the ass, and free scripts can only get you so far, because they get banned really quickly. That's exactly the pain I experienced a while ago while doing an SEO report for one of my clients. It made me look for a better solution, and I ended up writing a small bot myself.

    Algosaur to the rescue!

    The bot is called Indexation Checker and does exactly what you'd expect from it. It checks a list of URLs you feed to it against a list of search engines:
        1. Google
        2. Bing
        3. Yahoo!
        4. Yandex

    You can submit up to 10 000 URLs and repeat that as many times as you wish:

    [Image: Indexation check demo script]
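    If your list is longer than 10 000 URLs, one way to stay within the limit is to split it into batches before submitting. The sketch below only does the splitting; how each batch gets submitted is not shown, and the names MAX_URLS_PER_RUN and batches are illustrative, not part of the bot.

```python
from typing import Iterator, List

MAX_URLS_PER_RUN = 10_000  # the per-submission limit mentioned above


def batches(urls: List[str], size: int = MAX_URLS_PER_RUN) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder URLs, generated just to show the batch sizes.
urls = [f"https://example.com/page-{i}" for i in range(25_000)]
print([len(batch) for batch in batches(urls)])  # [10000, 10000, 5000]
```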

    You need to keep in mind that

    www.site.com/page

    and

    www.site.com/page/

    are two different URLs!
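    In practice this means comparing URLs as exact strings when you prepare your list. Here is a small sketch, assuming plain path URLs without query strings; the helper trailing_slash_variants is a made-up name.

```python
from typing import List, Tuple


def trailing_slash_variants(url: str) -> Tuple[str, str]:
    """Return the URL without and with a trailing slash, in that order."""
    base = url.rstrip("/")
    return base, base + "/"


submitted: List[str] = ["www.site.com/page", "www.site.com/page/", "www.site.com/page"]

# Exact-match de-duplication that keeps the original order.
# The two slash variants are different strings, so both survive.
unique = list(dict.fromkeys(submitted))
print(unique)  # ['www.site.com/page', 'www.site.com/page/']

print(trailing_slash_variants("www.site.com/page/"))
```

    If you are not sure which form the site actually serves, you can submit both variants and see which one shows up in the index.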

    Studying the results

    After you launch the script, give the API a bit of time. Once the job completes successfully, you can view the end result:

    [Image: Indexation check script results]

    Translating this table is dead simple:
    1 means that the exact URL was found in the index.
    0 means that the exact URL was not found in the index.
    -1 means that no check was performed for this URL and search engine.
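    With a long result table you will probably want to filter it rather than read it row by row. The sketch below assumes the results have been loaded into a dictionary keyed by URL; that shape, the sample rows and the function name urls_with_status are all assumptions for illustration, since the actual export format isn't described here.

```python
from typing import Dict, List

# Status codes as explained above.
INDEXED, NOT_INDEXED, NOT_CHECKED = 1, 0, -1

# Made-up sample data: URL -> {search engine -> status code}.
results: Dict[str, Dict[str, int]] = {
    "https://example.com/": {"google": 1, "bing": 1, "yahoo": 1, "yandex": -1},
    "https://example.com/about/": {"google": 0, "bing": 1, "yahoo": 0, "yandex": -1},
}


def urls_with_status(table: Dict[str, Dict[str, int]], engine: str, status: int) -> List[str]:
    """List the URLs whose status for `engine` equals `status`."""
    # An engine missing from a row is treated as "not checked".
    return [url for url, row in table.items() if row.get(engine, NOT_CHECKED) == status]


print(urls_with_status(results, "google", NOT_INDEXED))  # ['https://example.com/about/']
```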

    As you can see from this example, one existing document was not indexed, so I need to investigate why, this time manually. The reason turns out to be quite surprising:

    [Image: Finding Google's indexation errors]

    Apparently the site has a live development subdomain which is getting indexed, stealing all the juice from the main site. No good!

    Pricing

    Because I am using a paid API for these lookups, I had to put a price tag on each request. Fortunately, one URL call costs as little as $0.004. Checking the indexation of 100 URLs manually or paying 40¢ to automate that? 🤔 To me, that's not even a question!
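    To estimate the bill for a bigger list, multiply the number of URL calls by the per-call price. This is a rough sketch that takes the $0.004 figure above as a flat rate, which is an assumption; check_cost is a made-up helper name.

```python
from decimal import Decimal

# Lowest per-call price quoted above, assumed flat for this estimate.
PRICE_PER_URL_CALL = Decimal("0.004")


def check_cost(url_calls: int) -> Decimal:
    """Estimated cost in dollars for the given number of URL calls."""
    return PRICE_PER_URL_CALL * url_calls


print(check_cost(100))     # 0.400, i.e. 40 cents
print(check_cost(10_000))  # 40.000
```

    Decimal is used instead of float so that small prices multiply without rounding noise.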

    Conclusion

    I encourage you to turn indexation checks into a habit and run them regularly, especially when creating SEO audits. Incorrect indexation reveals a lot of problems that you can tackle right away.

    You can try the script here, and if you're a new user you get free credits on sign-up.

    Do you know of any other ways to quickly check document index presence? Please share in the comments!
     

