How to Block Part of a Web Page From Being Indexed by Search Engines?

Did you know that SEO can drive conversion rates of up to 14.6%, whereas paid search may convert at only around 2%?

In other words, if your website reaches the top of the organic listings, your business stands to profit.

This means that letting search engine bots get through every nook and cranny of your website should be your top priority, right?

Well, that’s not always the case. Any digital marketing company will tell you that sometimes, less is more. To put it differently, you’d be better off blocking certain web pages, or certain parts of a web page, from search engine crawlers.

Let’s see why.

Why block content from search engines? 

1. Duplicate content

Let’s say that you run an eCommerce store, and some of your products share the same product description. Or you might have two versions of the same web page: a printer-friendly one and one that is not printer-friendly.

At first, this may not seem like such a big deal, right? Well, Google treats this as duplicate content. Consequently, your web pages may take a hit in terms of their ranking.

In other words, there’s no good reason to try to index two web pages that share the same content. You should instead block one of these web pages from the search engine.

We’ve talked to a few experts at a New York web design company, and they said that besides the methods we’ll discuss later on, another way to get around this issue is by adding canonical tags.

In short, canonical tags help search engine crawlers determine which web page is the master copy and which one is the duplicate. As a result, search engines will generally index only the master copy.
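To make this more concrete, here is a minimal sketch of what a canonical tag looks like. It is a `<link rel="canonical">` element placed in the `<head>` of the duplicate page and pointing at the master copy. The helper name `canonical_link_tag` and the example URL are our own assumptions for illustration, not part of any particular CMS or plug-in.

```python
from html import escape


def canonical_link_tag(master_url: str) -> str:
    """Build the canonical <link> element that points at the master copy.

    The result is meant to be placed inside the <head> of the duplicate page.
    """
    # Escape the URL so characters such as "&" stay valid inside the attribute.
    return f'<link rel="canonical" href="{escape(master_url, quote=True)}">'


# Hypothetical URL of the master (non-printer-friendly) page.
print(canonical_link_tag("https://www.example.com/product"))
```

Most SEO plug-ins can generate this element for you, so you would rarely need to write it by hand.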

2. Private web pages

If you happen to have a web page that displays personal information or confidential company data, the last thing you’d want is to drive organic traffic to it. That said, blocking this type of content from search engines might be a good idea.

But that’s not all you can do. Even though un-indexing a private web page will help keep unwanted traffic away, you may still experience unauthorized access.

So, you might want to take that extra step to make sure that your web page is secure. You could put an authentication system in place, for example.

3. Pages that offer no value to visitors

Here’s the thing: Google places a high value on user experience. So, if you’re indexing web pages that do little or nothing to enhance the browsing experience, you may notice a drop in rankings.

Things like “Thank you” pages, privacy policy pages, registration pages, or pages that are still under development should be blocked from search engines.

Now that we’ve seen why you should un-index certain web pages or specific parts of them, let’s take a look at how you can do so.

How to block web pages from search engines

1. The “noindex” meta tag

The “noindex” meta tag is one of the more popular methods. That’s because it’s simple and effective. This meta tag works much like the canonical tag we mentioned earlier, as it tells search engine crawlers what not to index.

So, what do you need to do to insert this meta tag?

First off, go to the <head> section of your page’s HTML markup. Then, insert the following code:

<meta name="robots" content="noindex">

The thing is that you’ll need to manually insert this code into every web page you’d like to un-index. To make the job a tad easier, consider using plug-ins, like Yoast SEO, for example.

Another thing you should note is that even if you un-index the web page, search engine crawlers will still be able to follow the links on that page. To prevent this from happening, just add the “nofollow” directive next to “noindex”.

In other words, the code will look like this:

<meta name="robots" content="noindex,nofollow">
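If you’d like to double-check that a page really carries the tag, a short script can help. The sketch below uses Python’s standard `html.parser` module to collect the directives found in a robots meta tag. The class and function names (`RobotsMetaParser`, `robots_directives`) and the sample page are our own, purely for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical page used only to show the check.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print("noindex" in robots_directives(page))
```

Keep in mind that this only inspects the HTML you pass in. It says nothing about whether a search engine has already processed the tag.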

2. Use a robots.txt file 

Webmasters use the robots.txt file to instruct search engine bots on how to crawl pages on their site.

The way it works is that once the webmaster uploads the file to their website, crawlers check it to see what they may crawl and what they should leave alone.

This file is typically used to prevent crawler traffic from overloading your website with requests.

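But you can also use this method to keep crawlers away from certain content, including an entire directory, a specific web page, and even a particular file or image. Keep in mind that robots.txt controls crawling rather than indexing: a blocked URL can still show up in search results if other pages link to it.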

So how can you block crawler traffic by using this method?

Upon creating a .txt file, you’ll need to add the following fields: “User-agent:” and “Disallow:”.

In the first field, you’ll have to specify the crawler types, whereas, in the second one, you’ll need to specify the content or page you want them to ignore.

So the code would look like this:

User-agent: Googlebot
Disallow: /example-subfolder/blocked-page.html

In other words, this syntax tells Google’s crawlers not to crawl the page found on www.example.com/example-subfolder/blocked-page.html.
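Before uploading the file, you may want to test the rule locally. Here is a minimal sketch that uses `urllib.robotparser` from Python’s standard library to check the example above. The helper name `is_allowed` is our own, and Python’s parser only approximates how real crawlers such as Googlebot interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /example-subfolder/blocked-page.html
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = "https://www.example.com/example-subfolder/blocked-page.html"
print(is_allowed(ROBOTS_TXT, "Googlebot", blocked))  # Googlebot is disallowed here
print(is_allowed(ROBOTS_TXT, "Bingbot", blocked))    # no rule targets Bingbot
```

Note that the “User-agent:” and “Disallow:” lines sit directly under each other. Some parsers, including this one, treat a blank line as the end of a group of rules.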

If you’d like to target two types of crawlers, like Googlebot and Bingbot, for example, you can create two “User-agent:” fields, each one being dedicated to a specific kind of crawler.
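As a rough illustration of the two-crawler case, the sketch below builds a robots.txt with one group per crawler and then checks it with Python’s `urllib.robotparser`. The function name `build_robots_txt` and the `/private/` and `/drafts/` paths are hypothetical, chosen only for this example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules):
    """Render a mapping of crawler name -> disallowed paths as robots.txt text.

    Each crawler gets its own "User-agent:" group, separated by a blank line.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


# Hypothetical rules: each crawler is kept out of a different folder.
robots_txt = build_robots_txt({
    "Googlebot": ["/private/"],
    "Bingbot": ["/drafts/"],
})
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "https://www.example.com/private/report.html"))
print(parser.can_fetch("Bingbot", "https://www.example.com/private/report.html"))
```

Each group applies only to the crawler it names, so a path blocked for Googlebot remains open to Bingbot unless you list it in both groups.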

Let’s take a look at another example:

User-agent: *
Disallow: /

This syntax tells all search engine bots, including Googlebot, Bingbot, etc., not to crawl any web pages found on www.example.com.

After creating the file, you’ll need to place it at the root of your website. In our case, this file would be located at https://www.example.com/robots.txt.

Also, note that the file must be named exactly robots.txt. Otherwise, this method won’t work.

3. The X-Robots-Tag HTTP header

This method works much like the “noindex” meta tag we’ve discussed earlier. However, with the X-Robots-Tag HTTP header, you don’t have to manually insert the code within every single web page or make use of a plug-in.

The code would look like this:

X-Robots-Tag: noindex,nofollow

This is the equivalent of the meta tag example we showed earlier. But, this syntax will work for non-HTML content as well.

So, how can you make it work? Well, here’s the tricky part:

You’ll need to add this header to the HTTP response for a specific URL. But how you find and edit the response headers depends on your content management system and the web server you use.

For example, if you’re using Apache, you can add this header by editing the .htaccess file and adding a directive such as Header set X-Robots-Tag "noindex, nofollow", which requires the mod_headers module to be enabled.
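If your site is served by application code rather than Apache, the idea is the same: add the header when you build the response. Here is a minimal sketch using Python’s built-in `http.server` module. The decision to tag only `.pdf` files, along with the names `NOINDEX_SUFFIXES`, `robots_header_for`, `NoIndexHandler`, and `run`, are assumptions we made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical policy: keep PDF files (non-HTML content) out of the index.
NOINDEX_SUFFIXES = (".pdf",)


def robots_header_for(path):
    """Return the X-Robots-Tag value for a request path, or None."""
    path_only = path.split("?", 1)[0]  # ignore any query string
    if path_only.lower().endswith(NOINDEX_SUFFIXES):
        return "noindex, nofollow"
    return None


class NoIndexHandler(BaseHTTPRequestHandler):
    """Serve a placeholder body and add X-Robots-Tag where the policy applies."""

    def do_GET(self):
        body = b"placeholder response\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        tag = robots_header_for(self.path)
        if tag is not None:
            self.send_header("X-Robots-Tag", tag)
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the demo server on localhost; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), NoIndexHandler).serve_forever()
```

After calling `run()`, requesting any path that ends in `.pdf` should return a response that includes the X-Robots-Tag header, while other paths are served without it.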

4. Google Search Console

Finally, if you happen to have a Google Search Console account, there’s no reason to go through all the trouble. With the Removals Tool it provides, you can submit a URL to be removed from the search engine results page.

But note that this method only removes a URL temporarily, for around six months.

Suppose you’d like to have it removed permanently. In that case, besides submitting the URL, you’ll need to do one of the following: update or remove the content on your website and return a 404 or 410 HTTP status code, block access to the content (for example, with a password), or use the “noindex” meta tag to specify that the page shouldn’t be indexed.
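To illustrate the status code option, here is a small sketch of the decision a site might make for each request: 410 (Gone) for content that was removed on purpose, 404 (Not Found) for unknown URLs, and 200 for everything that still exists. The paths and the function name `status_for` are hypothetical.

```python
from http import HTTPStatus

# Hypothetical site map: pages that exist and pages removed on purpose.
LIVE_PATHS = {"/", "/products.html"}
REMOVED_PATHS = {"/old-offer.html"}


def status_for(path):
    """Pick the HTTP status code to return for a requested path."""
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE       # 410: the content was removed deliberately
    if path in LIVE_PATHS:
        return HTTPStatus.OK         # 200: the page still exists
    return HTTPStatus.NOT_FOUND      # 404: nothing is known about this URL


print(int(status_for("/old-offer.html")))
```

In practice, your web server or CMS usually handles this for you once the content has been deleted.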

Final Words

Although it seems counter-intuitive, blocking specific web pages or certain parts of them from search engine crawlers will likely bring you positive results in terms of ranking.

You can use the methods we’ve shown to un-index pages that bring no real value to your visitor, like “Thank You” pages, for example.

Furthermore, un-indexing pages can be convenient when you’re trying to avoid the duplicate content penalty.

Author bio:

Tomas is a digital marketing specialist and a freelance blogger. His work focuses on new web tech trends and digital voice distribution across different channels.

Digital Strategy One