In order to show up in Google’s search engine results pages (SERPs), your website needs to be both crawlable and indexable.
Crawlability refers to how easy it is for Google’s bots to access your site’s content. Think of it as a scan: crawlability is how easily Google can scan your website and its content. To crawl and index web pages, Google uses a spider program (Googlebot).
Spiders start crawling the web by fetching a few of the top websites in the world, then follow the links on those pages to discover other websites. Google now runs multiple spiders/programs to fetch the web.
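Conceptually, this link-following behaviour is a breadth-first traversal: fetch a page, collect its links, and queue any page not seen before. Here is a minimal sketch in Python over a made-up, in-memory link graph (real spiders fetch pages over HTTP, of course):

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog", "about"],
}

def crawl(seed):
    """Breadth-first crawl starting from a seed page."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)           # "fetch" the page
        for link in LINKS.get(page, []):
            if link not in seen:     # only queue pages we haven't seen yet
                seen.add(link)
                queue.append(link)
    return order

print(crawl("home"))  # ['home', 'about', 'blog', 'post-1', 'post-2']
```

Notice that every page is reached only because some already-crawled page links to it, which is exactly why internal linking matters for discovery.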
Indexability has to do with whether or not your site’s content can be indexed by Google, i.e. whether the crawled content can be saved on Google’s servers. This is similar to saving a file on your computer.
In simple English, Google crawling and indexing means scanning your website and saving a copy on Google’s servers.
When you search on Google, you are actually searching Google’s index of the web (the saved copies of web pages). This is why Google can deliver search results within milliseconds.
If your website is not crawlable, Google will not be able to access your content; and if Google can’t scan the website, it can’t save a copy (index) on its servers. This means you will not be visible in search results.
Before ranking in Google, you first need to get indexed.
How to Check for Crawlability and Indexability Issues
There are a few ways to check whether your website is crawlable and indexable. In summary, they are:
Crawlability & Indexability Checklist
- Use the “site:” operator in Google to check index status
- Check your index status in your Google Search Console account
- Use a web crawler to find indexing and crawling issues
- Check for a robots.txt file that blocks Google’s spiders
- Check for “noindex” tags on your website
- Simplify your website URL structure
- Improve internal links
- Minimise broken links and redirect loops
- Monitor server errors
- Minimise website coding/scripting issues
- Submit a sitemap
- Continue publishing content and avoid duplicate content
- Improve your website loading speed
Use the “site:” operator in Google to check index status
To check indexability, you can use the “site:” operator in Google. For example, if you type “site:example.com” into Google, you will see a list of all the pages from example.com that Google has indexed. If there are no results, your website is not indexed.
If your website is indexed, you can also see how many pages Google has indexed.
Google Search Console (GSC)
You can also go for a more in-depth analysis of crawlability & indexability using Google Search Console. This tool allows you to see how Google crawls and indexes your website. To use the Google Search Console, you must first create a Google account.
Once you have a Google account, you can add your website to Google Search Console. There are different ways to verify your website in a GSC account; the recommended method is to verify the domain via DNS records.
After adding your website, under Indexing > Coverage you will be able to see how many pages Google has indexed and how many it has excluded. By going into each section, you can find further details of any issues.
Use a website crawler such as Screaming Frog
To use Screaming Frog, you first need to download the software. After downloading and installing the software, open it and click “Crawl > Start Crawl.” Enter your website’s URL and click “Start.”
Screaming Frog will then crawl your website and report any errors it finds. The main column to look at is “Indexability”. You can also use the software for many other in-depth technical investigations.
Why can’t you find your website/web page on Google?
One common reason a website cannot be found on Google is that it has not been indexed yet. This means Google has not yet scanned the website and saved a copy (index) on its servers. Without being indexed, your website will not show up in search results.
Let’s look at the issues that can affect crawlability and indexability.
Having “noindex” tags on your website
If you have “noindex” tags on your website, they can prevent it from being indexed by Google and other search engines.
You can check this quickly by looking at the page source code. Navigate to the page you want to check and use the keyboard shortcut CTRL+U on Windows or Option+Command+U on Mac. Alternatively, you can right-click anywhere on the page and select “View Page Source”.
With the “noindex” tag, we instruct Google and other search engines not to save a copy of the page.
When developing a new website, web developers add this tag so that Google does not crawl and index the unfinished site.
The most common issue we see in the industry is that web developers forget to remove the “noindex” tag when the website goes live.
If you want to get indexed on Google, remove this tag wherever it appears on your website.
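For reference, the tag is a single meta element in the page’s head: `<meta name="robots" content="noindex">`. Here is a small sketch, using only Python’s standard library, of scanning a page’s HTML for it (the sample HTML below is hypothetical):

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Flags pages whose robots meta tag contains a noindex directive."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            if "noindex" in content:
                self.noindex = True

# Hypothetical page source, e.g. a staging site a developer forgot to open up
html = """
<html><head>
  <title>Staging site</title>
  <meta name="robots" content="noindex, nofollow">
</head><body>Under construction</body></html>
"""

finder = NoindexFinder()
finder.feed(html)
print(finder.noindex)  # True: this page asks search engines not to index it
```

In practice you would fetch the live page source first; the parsing logic stays the same.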
On the other hand, if you want to know how to block Google with tags, please check here.
Having a robots.txt file that blocks Google’s spiders
As we explained before, Google needs to crawl (scan) your website before it can index it. If Google is not allowed to crawl the site, it will not be indexed either.
What is a robots.txt file?
“A robots.txt file tells search engine crawlers which URLs the crawler can access on your site.” Source: Google
You can view your website’s robots.txt file at yourdomain.com/robots.txt.
Make sure you have not disallowed any Google bots or other search engines.
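For reference, a robots.txt that allows all crawlers everywhere except a private area might look like this (the paths are illustrative):

```txt
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

A line such as `Disallow: /` under `User-agent: *` (or under `User-agent: Googlebot`) would block crawling of the whole site, so check for that first.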
Google hasn’t crawled your website/web page yet
Another simple reason: if it’s a new website or web page, Google may simply not have crawled it yet.
If the page has existed for a long time and none of the issues above apply, this can be due to crawl budget issues or canonical issues on the site. To investigate further, we recommend the live URL testing tool in Google Search Console (the URL Inspection tool).
Once you enter your URL, you will see whether it is indexed in Google and, if not, what is preventing indexing.
You can simply “Request Indexing” for specific URLs from this URL Inspection tool as well.
How to increase the indexability of your website
Creating your Google Search Console (GSC) account and submitting your website is the first step in monitoring your website’s indexing issues and performance in Google.
Apart from the basic issues highlighted above, improving and monitoring the factors below can increase the crawlability and indexability of a website.
Simplify Your Website URL Structure
Google has a limited crawl budget for each website, so it prioritises the pages it wants to crawl and index. If a page is buried too deep in the URL structure, there is less chance Google will crawl it.
An example of a too-deep page structure.
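For instance, a product page nested under four category levels might use a URL like this (hypothetical):

```txt
example.com/mens/shoes/running/trail/product-name
```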
As you can see, Google has to crawl through all four category levels before reaching the final product page. Most of the time, the crawl budget runs out before Google reaches the product page, leaving it undiscovered.
This is especially important if you run a large website with many different products. To be crawled and indexed efficiently, you need a flat URL structure.
An example of a flat URL structure.
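In a flat structure, the same product page sits only one step from the root (hypothetical URL):

```txt
example.com/product-name
```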
Improve Internal Links
How do you think Google prioritises which pages to crawl after crawling the home page?
Internal links are the best signal Google has for knowing which pages on a website are important. That is why it is important for you to link to primary pages from your home page or footer, and to link between internal pages where relevant.
Minimise Broken Links & Redirect Loops
Broken links can drain your crawl budget. Instead of crawling important pages on your website, Google spends time crawling broken links and error pages. This is why fixing broken links is a common SEO recommendation.
If there is a redirect loop on your website, Google’s crawlers will keep fetching the same set of URLs again and again, wasting your crawl budget.
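A redirect loop can be spotted by following a redirect map until a URL repeats. Here is a small sketch with a made-up redirect table (crawlers like Screaming Frog do this for you automatically):

```python
def find_redirect_loop(redirects, start):
    """Follow redirects from `start`; return the looping path, or None."""
    seen = []
    url = start
    while url in redirects:
        if url in seen:
            # We came back to a URL we already visited: a loop
            return seen[seen.index(url):] + [url]
        seen.append(url)
        url = redirects[url]
    return None  # chain ends at a real page

# Hypothetical redirect map: old URL -> destination URL
redirects = {
    "/old-page": "/new-page",
    "/new-page": "/old-page",   # redirects straight back: a loop
    "/legacy": "/current",
}

print(find_redirect_loop(redirects, "/old-page"))  # ['/old-page', '/new-page', '/old-page']
print(find_redirect_loop(redirects, "/legacy"))    # None: healthy redirect
```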
Monitor Server Errors
If there is an issue with your web server, or downtime, that too can impact crawlability and indexability. A common issue is running out of your monthly bandwidth quota, after which the server returns a 509 (Bandwidth Limit Exceeded) error. For further reading, you can refer to a list of server response codes here.
Minimise website coding/scripting issues
Some common examples of crawlability issues include using certain scripts that block crawlers, Flash-based navigation, and gating content behind a form (without also making it available elsewhere on the site).
Additionally, heavy or unnecessary code can slow down the crawling process.
Submit a Sitemap
Sitemaps can be HTML or XML. The easiest option is to submit your XML sitemap in your GSC account, which makes it easy for Google to find your web pages.
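For reference, a minimal XML sitemap looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-01</lastmod>
  </url>
</urlset>
```

Each page you want discovered gets its own `<url>` entry; most CMSs and SEO plugins can generate this file automatically.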
Continue publishing content and avoid duplicate content
It’s important to avoid duplicate content. Duplicate content can reduce crawlability, which means web crawlers will visit your site less often.
In addition to avoiding duplicated content, you should also regularly update and add new content to your site. Web crawlers are more likely to visit sites that are constantly updating their content.
Improve your website loading speed
Improving your website’s loading speed means Google can crawl it quickly. If a section or script of a website takes a long time to load, that can waste crawl budget.
Google’s search engine results pages (SERPs) may seem like magic, but when you look more closely, you see that sites show up in the search results because of crawling and indexing.
As SEO professionals, our job is to make Google’s job easy. As noted here, rankings come second; you need to get your website indexed in search engines first.
For more information visit How Google Search organizes information