How to Create and Configure a Sitemap File?

Creating a sitemap file can speed up the indexing of your website. Sitemaps can be used in an SEO strategy to better choose which pages you want search engines like Google to index. 'Science Geo - How to!' explains how to create this type of file, in XML or plain text, either by hand or by generating it with a dedicated plug-in depending on the CMS you use.

What is a sitemap file?

A sitemap is an XML or plain-text file intended for search engines that support the sitemaps.org protocol (currently Google, Baidu, Yandex and Bing/Yahoo). It lists the URLs of a site, optionally with their last-modification dates, their update frequency and the priority of one page relative to another. Note that the sitemap file should not be confused with the human-readable "site map" page, which helps visitors find their way around the site's tree structure.

Sitemap file: what is it for?

Following the sitemaps.org protocol, the sitemap file lists the URLs of a site. When search engine robots (bots) arrive on a page, they examine (crawl) each link they find in order to index all the pages. The sitemap file aims to simplify the work of crawl bots by directly providing them with a list (index) of the pages they should visit. It lets the webmaster tell the bots which links should be indexed.

The sitemap file can stand alone or be referenced from the robots.txt file.

What information does the sitemap file contain?

As a text file, it is simply a list of URLs intended for crawl bots such as Google's, for example "https://www.yoursite.fr/example/examplepage1.html", "https://www.yoursite.fr/example/examplepage2.html"…

In XML, the sitemap file is a listing formatted with a series of tags. Here are the 3 main ones:

  • urlset wraps the whole file and attaches it to the sitemaps.org protocol,
  • url wraps each individual URL entry,
  • loc holds the address of each individual page.

To these 3 tags are added 3 optional parameters: lastmod, which gives the date of the last change to the page; changefreq, which indicates the update frequency; and priority, which expresses the priority (from 0.0 to 1.0) of one page of the site relative to the others.

[Image: example given by Google of a simple XML sitemap file.]
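The tag structure described above can also be produced by a script. The following is a minimal sketch in Python (3.8 or later) using the standard xml.etree.ElementTree module. The function name build_sitemap and the sample URL are illustrative assumptions, not something defined by the protocol or by Google.

```python
import xml.etree.ElementTree as ET

# Namespace that attaches the file to the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of (url, lastmod) pairs.

    lastmod may be None; when given, it should be a W3C date such as "2024-01-31".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

For instance, build_sitemap([("https://www.yoursite.fr/example/examplepage1.html", "2024-01-31")]) returns the bytes of a one-entry sitemap that can then be written to a file.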

The “lastmod” parameter

This optional tag indicates when the content of the page was last modified. From some Google SEO responses, it seems that the search engine often ignores this setting. The developer blog clarifies that lastmod is taken into account only if it is consistently and verifiably accurate (e.g. if it matches a date displayed on the page).
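The sitemaps.org protocol expects lastmod values in W3C Datetime format. As a rough sketch, a stored timestamp could be converted as follows. The helper name lastmod_value is hypothetical, and treating naive timestamps as UTC is an assumption about how your data is stored.

```python
from datetime import datetime, timezone


def lastmod_value(modified_at: datetime) -> str:
    """Format a datetime as a W3C Datetime string usable in <lastmod>."""
    if modified_at.tzinfo is None:
        # Assumption: timestamps without a timezone are stored in UTC.
        modified_at = modified_at.replace(tzinfo=timezone.utc)
    return modified_at.astimezone(timezone.utc).isoformat(timespec="seconds")
```

A plain date such as "2024-01-31" is also valid, so this helper is only needed when you want to expose the time of day.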

The “priority” parameter

Even though the tag is part of the sitemaps.org protocol, Google deliberately ignores the “priority” value. You can set any priorities you like in your sitemap; it will have no effect. Search engines like Google use their own methods to decide which URLs to crawl first. Some SEO practitioners believe that since 2020 this has relied on predictive, AI-based methods, but nothing has been confirmed by Google.

The “changefreq” parameter

The “changefreq” tag is also ignored by Google, even though it is part of the standard sitemaps.org protocol. It is therefore useless to define an update frequency for your URLs: Google will decide on its own.

Is a sitemap mandatory?

If your site is still small, a sitemap is not mandatory. For larger sites, such as e-commerce sites that generate several pages per day or week, the sitemap is very useful to speed up the indexing of new URLs. Be careful: if some of your pages need a sitemap file in order to be indexed, your site is probably not well designed. The cause is usually poor internal linking: either there are not enough links, or the tree structure of the pages is too “deep”. Crawl bots may not go beyond two sub-levels.

Creating a sitemap is not complex, but it is long and tedious. This is even more true for sites whose content changes frequently, hence the interest in automating the procedure.

In SEO, the sitemap can therefore be used to:

  • speed up the crawl, and therefore the indexing of new pages or the de-indexing of certain URLs,
  • index unlinked pages (orphan pages, landing pages),
  • facilitate the redesign of a site when URLs change,
  • evaluate SEO performance according to the type of page,
  • better understand why Google accepts or refuses to index certain pages,
  • carry out an advanced analysis in order to find optimizations to put in place…

This helps you work on the SEO of your site or facilitate a website migration.
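Two of the checks mentioned above, page depth and orphan pages, can be sketched with a breadth-first search over the internal link graph. This is only an illustration: the links dictionary (page mapped to the pages it links to) is an assumed input that you would have to build from your own crawl or database, and the function names are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search; returns {page: number of clicks from the home page}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(sitemap_urls, links, home):
    """Pages listed in the sitemap that no chain of internal links reaches."""
    reachable = click_depths(links, home)
    return sorted(set(sitemap_urls) - set(reachable))
```

Pages with a large depth value are candidates for better internal linking, and pages returned by orphan_pages are only discoverable through the sitemap.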

What types of URLs to include in a sitemap file?

A sitemap file is limited to 50,000 URLs. It is possible to create several sitemaps and list them in a sitemap index file, itself limited to 50,000 sitemaps. On the other hand, a sitemap index cannot list other sitemap indexes. Since you can submit up to 500 sitemap index files to Google, the theoretical maximum is 500 × 50,000 × 50,000 = 1,250,000,000,000 URLs, which is plenty for most website owners. Other criteria apply:

  • maximum size: the file must not exceed 50 MB uncompressed,
  • encoding: UTF-8,
  • absolute URLs, including their protocol: HTTP or HTTPS.
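The limits above translate into a simple splitting rule. Here is a minimal sketch, assuming Python 3.8+ and the standard library only, that cuts a URL list into sitemaps of at most 50,000 entries and builds the matching sitemap index. The function names and the file URLs passed to build_sitemap_index are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) listing the given sitemap file URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Upper bound quoted in the text: 500 indexes x 50,000 sitemaps x 50,000 URLs.
MAX_TOTAL_URLS = 500 * 50_000 * 50_000
```

Each chunk would be written to its own sitemap file, and the index then lists the public URL of each of those files. The 50 MB size limit still has to be checked separately.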

A sitemap is not used to declare a URL as canonical (the canonical tag is in the HTML header of each page). On the other hand, listing URLs in a sitemap signals to Google that these URLs are important, and therefore that you consider them “canonical”.

Sitemap and CMS: which method to choose?

You have the option of creating a sitemap by hand or through a sitemap generator. Both methods have their advantages and their constraints, especially when this sitemap is intended for a CMS website such as WordPress, PrestaShop or Joomla. In this case, most CMS have plug-ins that allow you to more or less easily generate a sitemap file.

Sitemap on WordPress

To create a sitemap for WordPress, the easiest way is to install a plugin that will manage it for you, such as Yoast SEO, XML Sitemaps, All in One SEO or Rank Math.

Sitemap on PrestaShop

PrestaShop used to offer by default a free module capable of creating sitemaps. It is now necessary to install an additional module, such as Sitemap XML Pro (some are paid, others free). Once the listings are created, the update task can be automated with a cron job if needed.

Sitemap on Joomla

As with PrestaShop, Joomla requires the installation of a plug-in or extension to generate sitemaps, such as JSiteMap or EKS. Just go to the "extensions" tab of your site's dashboard and upload the zipped extension file. The extensions are available as free or paid downloads in the “Joomla! Extensions Directory”.

Sitemap on Shopify

Shopify has a sitemap generator integrated into its default version (Basic), which is very practical. The file updates automatically as soon as you add a category, product sheet or blog post.

How to create a sitemap file?

Creating a sitemap file takes a little time, but if you have a lot of pages, it can be very useful for getting your new URLs indexed faster. Since the file is limited to 50 MB, avoid listing too many images or videos if they are not crucial for you, or split your file into several smaller ones.

The sitemap file can be placed anywhere on your site, but its location does matter. If you place it in a subdirectory, the sitemap will only impact the “child” subfolders of that subdirectory. This is why Google recommends placing it at the highest level of the website hierarchy, i.e. at the root of the site.
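The scope rule can be expressed as a small check. The sketch below assumes the sitemaps.org rule that a sitemap only covers URLs on the same protocol and host, located in its own directory or below. The function name in_sitemap_scope is hypothetical.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url is on the same host and under the sitemap's directory."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/catalog/sitemap.xml" covers "/catalog/" and everything below it.
    directory = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(directory)
```

With a sitemap at https://www.yoursite.fr/catalog/sitemap.xml, a page under /catalog/ is in scope while a page under /blog/ is not, which is why the root of the site is the safest location.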

Choose a name for your sitemap

You can give it any name you want, as long as it is encoded in UTF-8 and, for a plain-text sitemap, ends with the ".txt" extension.

Be careful with automatically generated sitemaps: they often have the same name by default and are rarely renamed by webmasters. As a result, they are easily found by the competition. Avoid calling it “sitemap”. You don't necessarily want a competitor to know exactly which URLs are most important to you.

Determine the format of your sitemap

A sitemap file can be in TXT, Atom 1.0, RSS, mRSS or XML format. The following cases cannot be handled with a simple TXT file:

  • sitemap indexes,
  • information intended for mobile devices,
  • multiple languages (“hreflang” attribute),
  • news sites,
  • image and video listings (the ones you want to see indexed, but beware: this adds weight to the file).

In these cases, it will be necessary to use XML.

Create your sitemap file(s)

As a general rule, we avoid going through a sitemap generator. Why? Because it has to recrawl the entire content of the site each time a URL is modified, added or deleted. Depending on the size of your site, this can take a very long time.

Besides, why use a sitemap generator that mimics Google's crawling process when the search engine is already crawling your site? It is more efficient to use a script to create and update your own sitemap file. John Mueller, Webmaster Trends Analyst at Google, has himself explained that it is better to automate the process from your own local database.
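Generating the file from your own database could look like the sketch below. It is only an illustration under stated assumptions: the pages table and its columns (url, updated_at, indexable) are hypothetical, SQLite stands in for whatever database your site really uses, and Python 3.8+ is assumed.

```python
import sqlite3
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_database(conn: sqlite3.Connection) -> bytes:
    """Build sitemap XML from a hypothetical `pages` table.

    Assumed schema: pages(url TEXT, updated_at TEXT, indexable INTEGER),
    with updated_at already stored as a W3C date such as "2024-01-31".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    query = "SELECT url, updated_at FROM pages WHERE indexable = 1 ORDER BY url"
    for url, updated_at in conn.execute(query):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if updated_at:
            ET.SubElement(entry, "lastmod").text = updated_at
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Because the script reads the database directly, no recrawl is needed: rerunning it after each content change (or on a schedule) keeps the sitemap current.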

Creating a sitemap manually can be done with Windows Notepad, especially if you only list a few URLs (one absolute URL per line). In this case, be careful to save the file in UTF-8 and to watch out for special characters contained in the URLs. The URLs themselves may only contain ASCII characters, so any other character must be percent-encoded, and in an XML sitemap the characters &, ', ", < and > must be written as entity escape codes.
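The escaping rules can be automated rather than applied by hand. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it deliberately leaves the host name untouched, so non-ASCII domain names would need separate (IDNA) handling.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def escape_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # "%" is kept as safe so already-encoded sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def xml_loc(url: str) -> str:
    """Value suitable for a <loc> tag: percent-encoded, then entity-escaped."""
    return escape(escape_url(url), {"'": "&apos;", '"': "&quot;"})


def write_text_sitemap(urls, path):
    """Write a plain-text sitemap: one escaped absolute URL per line, in UTF-8."""
    lines = [escape_url(u) for u in urls]
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
```

Entity escaping is only needed for XML sitemaps; in a text sitemap the URLs are written as they are after percent-encoding.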

Declare your sitemap file to Google

The sitemap file (or your sitemap index) must be submitted to Google. You can submit it through Google Search Console (Sitemaps report) by entering the URL of the file, or by adding its URL to your robots.txt file if you have one. Note that the robots.txt file must be located at the root of the site and bear exactly this name.
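In robots.txt, the sitemap is declared with a line of the form "Sitemap: <absolute URL>". As a small sketch, the helper below appends that line to existing robots.txt content only if it is not already declared. The function name and the sample URL are assumptions made for illustration.

```python
def add_sitemap_directive(robots_txt: str, sitemap_url: str) -> str:
    """Return robots.txt content with a Sitemap line for sitemap_url."""
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        # The field name is case-insensitive; the URL is compared exactly.
        if field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, nothing to change
    body = robots_txt.rstrip("\n")
    prefix = body + "\n\n" if body else ""
    return f"{prefix}Sitemap: {sitemap_url}\n"
```

The returned text would then be saved as robots.txt at the root of the site. Several Sitemap lines are allowed, so the helper can be called once per sitemap or sitemap index.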

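You can also ping Google by sending a GET request to the address given on its developer blog, with the URL of your sitemap as a parameter. Note that Google announced in 2023 that it was retiring this ping endpoint, so check that it is still supported before relying on it.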

If you use RSS or Atom feeds, you can send your feed URL directly to Google via WebSub. Most publishing tools can generate an RSS feed automatically, but with the default settings the feed may not be updated as reliably as one you maintain manually.

The sitemap is a practical way to get new pages indexed more quickly, especially when you have a very large site whose content changes a lot. This mainly concerns news sites and e-commerce sites. Rather simple to create, it requires a precise encoding and has some limits, particularly in terms of file size. On the other hand, it can become very tedious to manage on a daily basis. 'Science Geo - How to!' therefore recommends that you set up an automated task so that you no longer have to worry about it.

The sitemap file does not give orders to Google's robots. It is a list of recommendations, which Google may or may not follow according to its own criteria. The order of the URLs in the list does not matter: Google will explore them in the order it wants.

A sitemap is also an excellent tool for evaluating the performance of a website, finding orphan or zombie pages and detecting optimizations that will then allow you to improve your SEO. It will therefore not boost your SEO by itself, but you can use it to rank better! 'Science Geo - How to!' has plenty of other tips for you to improve your site's performance.
