Launching a finished Bitrix-based site is only half the battle. As a rule, the most interesting part begins when the Google and Yandex crawlers index the site for the first time, and the search results start showing information users were never meant to see: anything from "technical trash" to that photo from the New Year's office party.
Hold on, nameless SEO specialist; hold on, unlucky programmer: all you had to do was write a correct robots.txt for Bitrix.
For reference: robots.txt is a file located in a website's root that restricts crawler access to certain sections and pages of the site.
The favourite phrase of beginning copywriters, "every project is unique", suits our situation perfectly. The only exception is the standard robots.txt directives: User-agent, Disallow, Host and Sitemap. Like it or not, this is the mandatory minimum.
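Put together, that mandatory minimum looks roughly like this. The domain and sitemap path below are placeholders, substitute your own; note that Host is a directive honoured by Yandex:

```text
User-agent: *
Disallow:

Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
```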
Everything else, what to disallow and what to leave open, is up to you. Although Bitrix is an off-the-shelf product, the robots.txt directives can differ significantly from one Bitrix project to another. It all comes down to the structure and functionality of the particular site.
Imagine you have a Bitrix-based corporate site with a standard set of sections: "About", "Services", "Projects", "Contacts", "News". If the site's content is unique, the work consists of disallowing the technical part of the project.
1. Disallow indexing of the /bitrix and /cgi-bin folders. They hold purely technical information (CSS, templates, captchas) that no one needs, except GoogleBot, which will complain about it in the webmaster panel. You can safely disallow them. The pattern is: Disallow: /example/
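Applied to the two folders above, a minimal sketch would be the following (whether to open individual asset subfolders back up with Allow rules is a judgment call for your project):

```text
User-agent: *
Disallow: /bitrix/
Disallow: /cgi-bin/
```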
2. The /search folder is of interest neither to search engines nor to users. Disallowing it protects you from duplicate pages with repeating tags and titles in the search results.
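Assuming the search page lives at the standard /search/ path, one rule is enough:

```text
Disallow: /search/
```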
3. When writing robots.txt for Bitrix, people sometimes forget to disallow the authorization and PHP-authentication forms on the site.
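The exact URLs depend on the components in use; in a typical Bitrix installation the authorization pages and parameters look like this (treat these as examples to verify against your own site, not a canonical list):

```text
Disallow: /auth/
Disallow: /auth.php
Disallow: /*?login=yes
Disallow: /*?logout=yes
Disallow: /*?auth=yes
```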
4. If your site can produce print versions, whether of a district map or of an invoice for payment, don't forget to disallow the print URLs in robots.txt as well.
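Bitrix usually marks print versions with a print URL parameter, so wildcard rules along these lines cut them off (the parameter name may differ in custom templates):

```text
Disallow: /*?print=
Disallow: /*&print=
```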
5. Bitrix carefully stores the entire history of your site: successful user registrations, records of successful password changes and recoveries. Their value to web crawlers, however, is doubtful.
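These event pages are normally reached through URL parameters, so they can be excluded with wildcard rules such as the following (again, verify the parameter names against your installation):

```text
Disallow: /*register=yes
Disallow: /*forgot_password=yes
Disallow: /*change_password=yes
```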
6. Imagine you are browsing a photo album on the site: you open one, two, three photos, and on the fourth you decide to go back. The address bar will then contain a curse along the lines of ?back_url_=%2Fbitrix%2F%2F. This, too, can be disallowed by editing the robots.txt file in the root of the 1C-Bitrix CMS.
This way we cover both the open part of the site (visible to users) and the closed part (visible to the Bitrix CMS administrators).
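The back_url parameter appears in several spellings, for both the public and the administrative parts of the site; a commonly used set of rules (confirm the exact spellings your site actually generates):

```text
Disallow: /*?back_url=
Disallow: /*?back_url_admin=
Disallow: /*?BACK_URL=
Disallow: /*?BACKURL=
```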
7. The /upload folder. This is where Bitrix stores the site's pictures and videos. If the content is unique, there is no need to disallow the folder: indexed pictures and videos are an extra source of traffic. It is another matter when /upload contains confidential information or non-unique content.
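If it does, a single rule shuts the whole folder off (keep in mind this also hides every legitimate image in it from image search):

```text
Disallow: /upload/
```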
For an online store, the base is the same as for corporate sites, with a few corrections.
1. Unlike a simple corporate site, an online store usually has more than a hundred pages. Pagination pages, which take the user from one product listing page to the next, clog the search engines. The more pages, the more "trash".
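Bitrix builds pagination with the PAGEN_n URL parameter, so wildcard rules like these remove the paginated copies from the index (the parameter name can vary if a custom component is used):

```text
Disallow: /*?PAGEN_
Disallow: /*&PAGEN_
```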
2. Disallow indexing of user and administrator actions. Traces of filtering, product comparison, and adding products to the basket should also be hidden from the crawler's view.
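These actions, too, are driven by URL parameters; a typical rule set for the standard catalog components looks like this (the exact parameter names depend on the components installed):

```text
Disallow: /*?action=
Disallow: /*&action=
Disallow: /*set_filter=
```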
3. Finally, UTM tags: deny crawler access to URLs that carry them.
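A single wildcard rule catches all five standard utm_ parameters at once; for Yandex specifically, the Clean-param directive is an alternative that merges the tagged URLs with their originals instead of hiding them:

```text
Disallow: /*utm_

# Yandex-only alternative:
# Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content
```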