What Are Crawlability and Indexability: Issues & Checklist

In order to show up in Google’s search engine results pages (SERPs), your website needs to be both crawlable and indexable.

Crawlability refers to how easy it is for Google’s bots to access your site’s content. Think of it as a scan: how easily Google can scan your website and its content. To crawl web pages, Google uses a program commonly called a spider (Google’s is named Googlebot).

Spiders start by fetching a set of known web pages, then follow the links on those pages to discover new ones. Google now runs multiple spiders/programs to fetch the web.

Indexability has to do with whether or not your site’s content can be indexed by Google. This means the crawled content can be saved on Google’s servers, similar to saving a file on your computer.

In plain English, Google crawling and indexing means scanning your website and saving a copy on Google’s servers.

When you search on Google, you are actually searching Google’s index of the web (the saved copies of your web pages). This is how Google delivers search results within milliseconds.

Google Crawlability and Indexability Process

Now, if your website is not crawlable, Google will not be able to access your content; and if Google can’t scan the website, it can’t save (index) it on its servers. This means you will not be visible in search results.

Before you can rank in Google, you first need to get indexed.

How to Check for Crawlability and Indexability Issues

There are a few ways to check if your website is crawlable and indexable.

Use the “site:” operator in Google.

To check indexability, you can use the “site:” operator in Google. For example, if you type “site:example.com” into Google, you will see a list of all the pages from example.com that Google has indexed. If there are no results, your website has not been indexed.

If your website is indexed, you can also see how many pages Google has indexed.

How to check if a website is indexed in Google

Google Search Console ( GSC)

You can also do a more in-depth analysis of crawlability and indexability using Google Search Console. This tool shows you how Google crawls and indexes your website. To use Google Search Console, you must first create a Google account.

Once you have a Google account, you can add your website to Google Search Console. There are different ways to verify your website in a GSC account; the recommended method is to verify the domain via a DNS record.

After adding your website, under Indexing > Coverage you will be able to see how many pages Google has indexed and how many pages it has excluded. By going into each section, you can find further details of any issues.

Google Search Console Indexing Coverage Report

Use a website crawler such as Screaming Frog

To use Screaming Frog, you first need to download and install the software. Then open it, enter your website’s URL, and start the crawl.

Screaming Frog will then crawl your website and give you a report of any errors it found. The main section to look at is “Indexability”. You can also use this software for many other in-depth technical investigations.

How to use Screaming Frog to check for Google index issues

Why can’t you find your website/web page on Google?

One common reason a website might not show up on Google is that it hasn’t been indexed yet. This means Google has not yet crawled the website and saved a copy (index) on its servers. Until it is indexed, your website will not appear in search results.

Let’s look at the issues that can affect crawlability and indexability.

Having “noindex” tags on your website

If you have “noindex” tags on your website, they can prevent it from being indexed by Google and other search engines.

You can find this quickly by looking at the page’s source code. Navigate to the page you want to check and use the keyboard shortcut CTRL+U on Windows or Option+Command+U on Mac. Alternatively, right-click anywhere on the page and select “View Page Source”.

noindex-tag

With the “noindex” tag, we instruct Google and other search engines not to save a copy of the page.

When developing a new website, web developers add this tag so that Google will not crawl and index the unfinished site.

The most common issue we see in the industry is that web developers forget to remove the “noindex” tag when the website goes live.

Remove this tag from your website or web page if you want it indexed on Google.
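If you want to automate this check across many pages, the idea can be sketched with Python’s standard-library HTML parser. This is a minimal illustration, not a production crawler; the sample HTML strings below are hypothetical:

```python
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    """Scan HTML for a <meta name="robots"> tag whose content contains 'noindex'."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

def page_has_noindex(html):
    checker = NoindexChecker()
    checker.feed(html)
    return checker.noindex

blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
indexable = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(page_has_noindex(blocked))    # True
print(page_has_noindex(indexable))  # False
```

In practice you would fetch each page’s HTML first, and also check the `X-Robots-Tag` HTTP header, which can block indexing the same way.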

On the other hand, if you want to know how to block Google with tags, please check here.

Having a robots.txt file that blocks Google’s spiders

As we explained before, Google needs to crawl (scan) your website before it can index it. If Google is not allowed to crawl the website, it will not crawl it, and therefore cannot index it.

What is a robots.txt file?

“A robots.txt file tells search engine crawlers which URLs the crawler can access on your site.” Source: Google

You can view your website’s robots.txt file by replacing the domain below with your own:

https://exampledomain.co/robots.txt
Robots.txt file

Make sure you have not disallowed Googlebot or other search engine crawlers.
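Python’s standard library includes a robots.txt parser, so you can test your rules offline before they go live. A minimal sketch, using a hypothetical robots.txt that blocks every crawler from /admin/ but lets Googlebot crawl everything:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration
robots_txt = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot's empty Disallow overrides the generic block
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))   # True
# Other bots fall back to the "*" group and are blocked from /admin/
print(parser.can_fetch("SomeBot", "https://example.com/admin/"))     # False
print(parser.can_fetch("SomeBot", "https://example.com/products/"))  # True
```

To check a live site instead, you would point `RobotFileParser.set_url()` at `https://yourdomain.com/robots.txt` and call `read()`.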

Google hasn’t crawled your website/web page yet

Another simple reason: if it’s a new website or web page, Google may simply not have crawled it yet.

If the page has been live for a long time and you don’t have any of the issues above, this can be due to crawl budget issues or canonical issues on the site. To investigate further, we recommend using the Google Search Console live URL testing tool.

Google Search Console live URL testing tool

Once you enter your URL, you will be able to see whether it is indexed in Google and, if not, what is preventing indexing.

You can also simply “Request Indexing” for specific URLs from this URL Inspection tool.

Google-Search-Console-URL-Inspection-Result

How to increase the indexability of your website

Creating your Google Search Console (GSC) account and submitting your website is the first step in monitoring your website’s indexing issues and performance in Google.

Apart from the basic issues highlighted above, improving and monitoring the factors below can increase the crawlability and indexability of a website.

Simplify Your Website URL Structure

Google has a limited crawl budget for each website, so it prioritises the pages it wants to crawl and index. If a page is buried too deep in your URL structure, there is less chance that Google will crawl it.

An example of a too-deep URL structure:

example.com/category/subcategory/subcategory/products/productname

As you can see, Google has to crawl all four category levels before reaching the final product page. Often the crawl budget runs out before Google gets to the product page, leaving it uncrawled.

This is especially important if you run a large website with many different products. To be crawled and indexed efficiently, you need a flat URL structure.

An example of a flat URL structure:

example.com/products/productname

or

example.com/productname
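One quick way to audit this is to count path segments per URL. Here is a small sketch in Python (the example URLs mirror the hypothetical ones above):

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count the non-empty path segments in a URL."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])

deep = "https://example.com/category/subcategory/subcategory/products/productname"
flat = "https://example.com/products/productname"

print(url_depth(deep))  # 5
print(url_depth(flat))  # 2
```

Running this over a full URL export (e.g. from a crawler) and flagging anything deeper than three or four segments gives you a quick shortlist of pages to restructure.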

Improve Internal Links

How do you think Google prioritises which pages to crawl after crawling the home page?

Internal links are the best signal Google has for which pages on a website are important. That is why it is important for you to link to your primary pages from your home page or footer, and to link to internal pages wherever relevant.

Minimise Broken Links & Redirect Loops

Broken links can drain your crawl budget. Instead of crawling important pages on your website, Google will spend time crawling broken links and error pages. This is why fixing broken links is a common SEO recommendation.

If there is a redirect loop on your website, Google’s crawlers will keep fetching the same set of URLs again and again, wasting your crawl budget.
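Detecting a redirect loop is simple cycle detection: follow each redirect and stop as soon as a URL repeats. A sketch, using a hypothetical redirect map (source path to target path) such as you might export from your server config:

```python
def find_redirect_loop(redirects, start):
    """Follow a chain of redirects from `start`; return the looping URLs, or None."""
    seen = []
    url = start
    while url in redirects:
        if url in seen:
            # We've been here before: everything from the first visit onward loops
            return seen[seen.index(url):]
        seen.append(url)
        url = redirects[url]
    return None

# Hypothetical redirect map for illustration
redirects = {
    "/old-page": "/new-page",
    "/new-page": "/old-page",  # loops back!
    "/a": "/b",                # a normal, finite redirect
}

print(find_redirect_loop(redirects, "/old-page"))  # ['/old-page', '/new-page']
print(find_redirect_loop(redirects, "/a"))         # None
```

Crawlers like Screaming Frog report these loops for you; the sketch just shows what they are checking.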

Monitor Server Errors

If your web server has issues or downtime, that can also impact crawlability and indexability. A common issue is running out of your monthly bandwidth quota, in which case the server returns a 509 (Bandwidth Limit Exceeded) error. For further reading, you can refer to a list of server response codes here.
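When scanning server logs or crawl reports for problems, it helps to bucket response codes by their standard HTTP class. A simple sketch:

```python
def classify_status(code):
    """Group an HTTP status code into its standard class for crawl monitoring."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"   # e.g. 404 broken link
    if 500 <= code < 600:
        return "server error"   # e.g. 509 bandwidth limit exceeded
    return "unknown"

for code in (200, 301, 404, 509):
    print(code, classify_status(code))
```

Anything in the 5xx bucket deserves immediate attention, since repeated server errors can cause Google to slow down or stop crawling the site.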

Minimise Website coding/scripting issues

Some common crawlability issues include scripts that block crawlers, Flash-based navigation, and content gated behind a form (without also making it available elsewhere on the site).

If a website uses a lot of JavaScript, Ajax or iframes, web crawlers may sometimes find it difficult to access the content.

Additionally, heavy and unnecessary code can slow down the crawling process.

Submit a Sitemap

Sitemaps can be HTML or XML. The easiest approach is to submit your XML sitemap in your GSC account, which makes it easy for Google to find your web pages.
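An XML sitemap is essentially a list of `<loc>` entries in the sitemaps.org format. A minimal sketch that generates one with Python’s standard library (the URLs are hypothetical):

```python
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    """Build a minimal sitemaps.org-format XML sitemap from a list of page URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/products/widget",
])
print(sitemap)
```

A production sitemap would also start with the `<?xml version="1.0" encoding="UTF-8"?>` declaration, and most CMSs and SEO plugins generate the file for you; the point is simply that the format is plain XML listing each URL you want crawled.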

Continue publishing content and avoid duplicate content

It’s important to avoid duplicate content. Duplicate content can reduce crawlability, which means web crawlers will visit your site less often.

In addition to avoiding duplicated content, you should also regularly update and add new content to your site. Web crawlers are more likely to visit sites that are constantly updating their content.

Improve your website loading speed

Improving your website’s loading speed means Google can crawl it more quickly. If a section or script of a website takes a long time to load, it can waste crawl budget.

Google’s search engine results pages (SERPs) may seem like magic, but when you look more closely, you see that sites show up in the search results because of crawling and indexing.

As SEO professionals, our job is to help Google do its job easily. As noted here, ranking comes second; you need to get your website indexed in search engines first.

For more information, visit How Google Search organizes information.
