B2B Marketing Consulting Agency Blog

Demystify Yoast’s Advanced Crawl Optimization

- October 17, 2023 12:00 pm

You might be familiar with Yoast, a widely used WordPress plugin for fine-tuning your site's SEO, but have you ever explored its Advanced Crawl Optimization settings?

These settings help prevent URLs and links with no SEO value from being crawled, reducing the electricity usage and carbon footprint of your website. WordPress automatically adds these URLs and links to the head section of your site pages. You may not be actively using this metadata and content on your website, so Yoast gives you the option to remove it and improve your website’s crawl budget.

What is Crawl Budget?

For websites with many URLs and pages, crawl budget can become an issue.

Crawl budget is determined by a multitude of factors such as the:

  • Size of your website
  • “Health” of your website (does the crawler encounter any errors?)
  • Popularity of your website (number of site visits & links pointing to you)

The pages on a site with low crawl budget will take longer to be crawled by Google. However, this is only a problem for large websites with at least 10,000 unique pages that are being updated often. If this is the case, these settings help crawlers ignore the unnecessary pages and focus on crawling more important content.

To help you decide which settings will work best for your site, here is an overview:

Remove unwanted metadata

WordPress adds these links and content to the site’s <head> and HTTP headers by default. The settings below remove them.

Remove shortlinks

Remove links to WordPress’ internal ‘shortlink’ URLs for your posts.

Remove REST API links

Remove links to the location of your site’s REST API endpoints.

  • The REST API is a developer-oriented feature that allows applications to retrieve data from or modify WordPress’ database. These applications could be plugins that rely on the links to find the API, so removing the links can break website functionality.

Remove RSD / WLW links

Remove links used by external systems for publishing content to your blog.

  • We recommend turning this setting on. Really Simple Discovery (RSD) allows external applications to communicate with and modify WordPress (e.g., posting content through the WordPress mobile app). However, this functionality has largely been replaced by the REST API and can be a security risk. Windows Live Writer (WLW) is also a discontinued application.

Remove oEmbed links

Remove links used for embedding your content on other sites.

Remove generator tag

Remove information about the plugins and software used by your site.

  • Keeping this tag visible can be a security risk, because it reveals which software and version your site is running.
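
To see which of the <head> entries described above a page actually outputs, a short script can scan the markup. The following is a minimal sketch in Python, not part of Yoast or WordPress. The `rel`, `type`, and `name` values it looks for are the ones we understand WordPress to emit by default, and the `audit_head` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these are the rel values WordPress emits by default,
# mapped to the Yoast setting that removes each one.
LINK_RELS = {
    "shortlink": "Remove shortlinks",
    "https://api.w.org/": "Remove REST API links",
    "edituri": "Remove RSD / WLW links",
    "wlwmanifest": "Remove RSD / WLW links",
}
# Assumption: oEmbed discovery links use rel="alternate" with one of these types.
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class HeadAudit(HTMLParser):
    """Collects the settings that would affect tags found in the markup."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. bare attributes), so normalize them.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rel = attributes.get("rel", "").lower()
            if rel in LINK_RELS:
                self.found.add(LINK_RELS[rel])
            if rel == "alternate" and attributes.get("type", "").lower() in OEMBED_TYPES:
                self.found.add("Remove oEmbed links")
        elif tag == "meta" and attributes.get("name", "").lower() == "generator":
            self.found.add("Remove generator tag")


def audit_head(html):
    """Return a sorted list of settings relevant to the given HTML."""
    parser = HeadAudit()
    parser.feed(html)
    return sorted(parser.found)


# Hypothetical sample markup for illustration only.
sample = """<head>
<link rel="shortlink" href="https://example.com/?p=123" />
<meta name="generator" content="WordPress 6.3" />
</head>"""
print(audit_head(sample))
```

Running this against the source of one of your own pages before and after changing the settings is a quick way to confirm what was removed.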

Remove pingback HTTP header

Remove links which allow other sites to ‘ping’ yours when they link to you.

  • Trackbacks are used to notify other blogs when you have linked to them. Pingbacks are an automatic version of trackbacks. If a link to an external site is made, the external site can choose to publish the trackback/pingback in its blog comment section. Keeping this header is mainly useful if you want to be notified when others have linked to you.

Remove powered by HTTP header

Remove information about the plugins and software used by your site.
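
The two HTTP header settings can be checked in a similar way. This is a minimal sketch that assumes the headers involved are `X-Pingback` and `X-Powered-By`, which is our understanding of what these settings target. The `audit_headers` helper and the sample values are hypothetical.

```python
# Assumption: these are the response headers the two settings remove.
HEADER_SETTINGS = {
    "x-pingback": "pingback HTTP header",
    "x-powered-by": "powered by HTTP header",
}


def audit_headers(headers):
    """Return the sorted list of headers from HEADER_SETTINGS present in a response.

    `headers` is any mapping of header name to value; names are matched
    case-insensitively, as HTTP header names are not case-sensitive.
    """
    found = set()
    for name in headers:
        label = HEADER_SETTINGS.get(name.lower())
        if label:
            found.add(label)
    return sorted(found)


# Hypothetical response headers for illustration only.
sample_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "X-Pingback": "https://example.com/xmlrpc.php",
    "X-Powered-By": "PHP/8.1",
}
print(audit_headers(sample_headers))
```

For a live page, the mapping could be built from a real response, for example with `dict(response.headers.items())` on the object returned by `urllib.request.urlopen`.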

Remove unused resources

WordPress loads lots of resources, some of which your site might not need. If you’re not using these, removing them can speed up your pages and save resources.

Remove emoji scripts

Remove JavaScript used for converting emoji characters in older browsers.

Remove WP-JSON API

Add a ‘disallow’ rule to your robots.txt file to prevent crawling of WordPress’ JSON API endpoints.

  • The REST API is a developer-oriented feature that allows applications to retrieve data from or modify WordPress’ database. These applications could be plugins, and Google may need to crawl the endpoints to properly render and index pages where those plugins are used, so enable this setting with care.
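
To illustrate what a ‘disallow’ rule does, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content is an assumption about what such a rule looks like, not a copy of Yoast’s actual output, and the `is_crawlable` helper and example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a rule of this shape is what blocks the JSON API endpoints.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-json/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The API endpoint is blocked, while ordinary pages remain crawlable.
print(is_crawlable(ROBOTS_TXT, "https://example.com/wp-json/wp/v2/posts"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/"))
```

Pasting your site’s real robots.txt into `ROBOTS_TXT` lets you check whether a URL that a plugin depends on would be blocked.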

Now you might be wondering, can smaller websites still benefit from these settings? Since unused JavaScript, links, and other content are being removed from the <head> section, we hypothesized that a website may load faster, as minifying and removing unnecessary code is a proven method for improving performance. However, after testing these settings on our site and checking site speed through Google PageSpeed Insights, there was unfortunately no difference in scores.

Besides these settings, Yoast can also prevent RSS feed pages from being crawled. RSS feed pages provide metadata for website updates, such as new blog posts or site comments. RSS, or Really Simple Syndication, allows publishers to distribute content to subscribers who use feed readers to aggregate their news and information. Feeds are XML files (in RSS or Atom format) with URLs that are meant for machines to read, providing information about the content including titles, summaries, and links.
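
To make “meant for machines to read” concrete, here is a minimal sketch that pulls titles, links, and summaries out of an RSS 2.0 feed with Python’s standard library. The sample feed and the `read_items` helper are hypothetical, and a real WordPress feed contains more fields than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed RSS 2.0 feed for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post/</link>
      <description>A short summary of the first post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item> with its title, link, and summary."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

A feed reader does essentially this on a schedule for every feed a subscriber follows.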

There are benefits to using RSS feeds for website owners, but if you aren’t using these, Yoast can remove them as well and help improve crawl budget.

More Strategies for Improving the Performance of Your B2B Website

At Innovaxis, our digital marketing experts can help you with strategies for maximizing the speed and performance of your B2B website as part of a comprehensive marketing program.

Reach out to one of our team members today to get started.