Vote #79033: RailsBaseURI ignored while creating robots.txt - redmineorg-copy202205 - Redmine

RailsBaseURI ignored while creating robots.txt

Added by Admin Redmine over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: SEO_48
Target version: 4.0.0_99
Start date: 2022/05/09
Due date:
% Done:
Estimated time:
Redmineorg_URL: https://www.redmine.org/issues/27865
category_id:
version_id:
issue_org_id: 27865
author_id: 122206
assigned_to_id:
comments:
status_id:
tracker_id:
plus1:
affected_version:
closed_on:
affected_version_id: 133

Status --> [Closed]

Description

In my case I use
RailsBaseURI /redmine
which results in URLs like /redmine/issues, but the URLs in robots.txt lack the /redmine prefix:

User-agent: *
Disallow: /projects/support/repository
Disallow: /projects/support/issues
Disallow: /projects/support/activity
Disallow: /issues/gantt
Disallow: /issues/calendar
Disallow: /activity
Disallow: /search

The expected robots.txt is:

User-agent: *
Disallow: /redmine/projects/support/repository
Disallow: /redmine/projects/support/issues
Disallow: /redmine/projects/support/activity
Disallow: /redmine/issues/gantt
Disallow: /redmine/issues/calendar
Disallow: /redmine/activity
Disallow: /redmine/search
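
The difference between the two listings is only the base URI prefix. A minimal sketch in plain Ruby of what that prefixing amounts to; this is not Redmine's code, and `robots_txt` with its two parameters is a hypothetical helper introduced only for illustration:

```ruby
# Hypothetical helper (not part of Redmine): build robots.txt content with
# every path prefixed by the base URI, e.g. the value given to RailsBaseURI.
def robots_txt(base_uri, project_ids)
  # Normalise the prefix: "" for a root deployment, "/redmine" otherwise.
  prefix = base_uri.to_s.sub(%r{/+\z}, "")
  prefix = "/#{prefix}" unless prefix.empty? || prefix.start_with?("/")

  lines = ["User-agent: *"]
  # Per-project paths, in the same order as the listings above.
  project_ids.each do |id|
    %w[repository issues activity].each do |section|
      lines << "Disallow: #{prefix}/projects/#{id}/#{section}"
    end
  end
  # Application-wide paths.
  %w[/issues/gantt /issues/calendar /activity /search].each do |path|
    lines << "Disallow: #{prefix}#{path}"
  end
  lines.join("\n") + "\n"
end

puts robots_txt("/redmine", ["support"])
```

Called with an empty base URI, the same helper yields the unprefixed listing.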

As a feature request, it would be nice to be able to add some additional URLs through the configuration.

Something like:

User-agent: *
<% @projects.each do |p| -%>
Disallow: /projects/<%= p.to_param %>/repository
Disallow: /projects/<%= p.to_param %>/issues
Disallow: /projects/<%= p.to_param %>/activity
<% @config.robots_project.each do |u| -%>
Disallow: /projects/<%= p.to_param %>/<%= u.to_param %>
<% end -%>
<% end -%>
Disallow: /issues/gantt
Disallow: /issues/calendar
Disallow: /activity
Disallow: /search
<% @config.robots_main.each do |u| -%>
Disallow: /<%= u.to_param %>
<% end -%>
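
The proposal can be tried outside Rails with the standard library's ERB. The sketch below is only an illustration under assumptions: `robots_project` and `robots_main` are the hypothetical settings from the proposal (Redmine does not provide them), and `Project`, `ROBOTS_TEMPLATE` and `render_robots` are stand-in names.

```ruby
require "erb"

# Stand-in for a Redmine project; only the identifier is needed here.
Project = Struct.new(:identifier)

# Same structure as the proposed template: built-in paths first,
# then the extra paths supplied by the (hypothetical) configuration.
ROBOTS_TEMPLATE = <<~'TPL'
  User-agent: *
  <% projects.each do |p| -%>
  <% (project_paths + robots_project).each do |u| -%>
  Disallow: /projects/<%= p.identifier %>/<%= u %>
  <% end -%>
  <% end -%>
  <% (main_paths + robots_main).each do |u| -%>
  Disallow: /<%= u %>
  <% end -%>
TPL

def render_robots(projects, robots_project: [], robots_main: [])
  # trim_mode "-" lets "-%>" swallow the newline after a control tag.
  ERB.new(ROBOTS_TEMPLATE, trim_mode: "-").result_with_hash(
    projects: projects,
    project_paths: %w[repository issues activity],
    main_paths: %w[issues/gantt issues/calendar activity search],
    robots_project: robots_project,
    robots_main: robots_main
  )
end

puts render_robots([Project.new("support")],
                   robots_project: ["wiki"], robots_main: ["admin"])
```

With both extra lists left empty, the output matches the default listing, so the additional configuration would be purely additive.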

journals

Try this patch.
--------------------------------------------------------------------------------
I tested your patch and it works as expected.
It's clever to use the existing functions instead of assembling the strings by hand.
Thanks.

I got:

<pre>
User-agent: *
Disallow: /redmine/projects/support/repository
Disallow: /redmine/projects/support/issues
Disallow: /redmine/projects/support/activity
Disallow: /redmine/issues/gantt
Disallow: /redmine/issues/calendar
Disallow: /redmine/activity
Disallow: /redmine/search
</pre>

--------------------------------------------------------------------------------
https://webmasters.stackexchange.com/questions/89395/robots-txt-should-be-in-the-root-directory-or-can-be-in-sub-directory

> web crawlers will not read or obey a robots.txt file in a subdirectory.
> http://www.robotstxt.org/robotstxt.html
--------------------------------------------------------------------------------
Sure?

> RedirectMatch permanent ^/robots.txt$ /redmine/robots.txt

--------------------------------------------------------------------------------
I would like to post my Apache log, which shows that all bots follow the redirect properly, but my posts keep getting blocked as spam.

--------------------------------------------------------------------------------
<pre>
redmine.:443 88.198.55.175 - - [29/Dec/2017:08:10:23 +0000] "GET /robots.txt HTTP/1.1" 301 3962 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8;)"
redmine.:443 88.198.55.175 - - [29/Dec/2017:08:10:23 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 4126 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8;)"
redmine.:443 66.249.76.118 - - [29/Dec/2017:09:48:48 +0000] "GET /robots.txt HTTP/1.1" 301 3962 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;)"
redmine.:443 66.249.76.118 - - [29/Dec/2017:09:48:48 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 841 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;)"
redmine.:443 5.255.251.125 - - [29/Dec/2017:10:51:21 +0000] "GET /robots.txt HTTP/1.1" 301 3967 "-" "Mozilla/5.0 (compatible; YandexBot/3.0;)"
redmine.:443 87.250.233.120 - - [29/Dec/2017:10:51:21 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 4187 "-" "Mozilla/5.0 (compatible; YandexBot/3.0;)"
</pre>

--------------------------------------------------------------------------------
Some more:

<pre>
redmine.:443 54.36.150.157 - - [28/Dec/2017:05:03:01 +0000] "GET /robots.txt HTTP/1.1" 301 3711 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2;)"
redmine.:443 54.36.150.157 - - [28/Dec/2017:05:03:02 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 919 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2;)"
redmine.:443 157.55.39.25 - - [29/Dec/2017:01:35:11 +0000] "GET /robots.txt HTTP/1.1" 301 3942 "-" "Mozilla/5.0 (compatible; bingbot/2.0;)"
redmine.:443 157.55.39.25 - - [29/Dec/2017:01:35:12 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 4012 "-" "Mozilla/5.0 (compatible; bingbot/2.0;)"
redmine.:443 194.187.170.123 - - [21/Dec/2017:22:26:10 +0000] "GET /robots.txt HTTP/1.0" 301 3825 "-" "Mozilla/5.0 (compatible; Qwantify/2.4w;)/2.4w"
redmine.:443 194.187.170.123 - - [21/Dec/2017:22:26:10 +0000] "GET /redmine/robots.txt HTTP/1.0" 200 1072 "-" "Mozilla/5.0 (compatible; Qwantify/2.4w;)/2.4w"
redmine.:443 18.195.89.56 - - [26/Dec/2017:00:37:01 +0000] "GET /robots.txt HTTP/1.1" 301 3711 "-" "Mozilla/5.0 (compatible; Cliqzbot/2.0;)"
redmine.:443 18.195.89.56 - - [26/Dec/2017:00:37:04 +0000] "GET /redmine/robots.txt HTTP/1.1" 200 782 "-" "Mozilla/5.0 (compatible; Cliqzbot/2.0;)"
</pre>
--------------------------------------------------------------------------------
Grischa Zengel wrote:
> Sure?
>
> > RedirectMatch permanent ^/robots.txt$ /redmine/robots.txt

What will you do if your website has multiple subdirectories?

E.g.:
* example.com/redmine1
* example.com/redmine2
--------------------------------------------------------------------------------
For better compatibility I will change to:

$ cat /etc/cron.hourly/robots

#!/bin/sh
wget https://redmine/redmine/robots.txt -O /var/www/html/robots.txt

--------------------------------------------------------------------------------
> What will you do if your website has multiple subdirectories?

Then you have to use the cron solution and concatenate the results.
But how many servers host more than one Redmine instance? 1%?

--------------------------------------------------------------------------------
Grischa Zengel wrote:
> > What will you do if your website has multiple subdirectories?
>
> Then you have to use the cron solution and concatenate the results.

<pre>
$ wget http://localhost:3100/test1/robots.txt -O - > ~/Desktop/robots.txt
$ wget http://localhost:3100/test2/robots.txt -O - | grep '^Disallow:' >> ~/Desktop/robots.txt
</pre>
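
The same concatenation can be sketched in Ruby for cases where the merge is scripted rather than typed by hand. `merge_robots` is a hypothetical helper, and the sketch assumes, as in the commands above, that every instance serves a single `User-agent: *` group:

```ruby
# Hypothetical helper: merge robots.txt bodies from several instances.
# The first body is kept whole; from the others only the "Disallow:" lines
# are appended, mirroring the grep in the shell commands above.
def merge_robots(bodies)
  first, *rest = bodies
  merged = first.to_s.lines.map(&:chomp)
  rest.each do |body|
    merged.concat(body.lines.map(&:chomp).grep(/\ADisallow:/))
  end
  merged.join("\n") + "\n"
end

# Sample bodies standing in for the two downloaded files.
test1 = "User-agent: *\nDisallow: /test1/search\n"
test2 = "User-agent: *\nDisallow: /test2/search\n"
puts merge_robots([test1, test2])
```

Fetching the bodies (for example from a cron job) is left out so the sketch runs without network access.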

--------------------------------------------------------------------------------
It doesn't matter how you post-process the robots.txt from subdirectories: Redmine has to generate a valid robots.txt, and your patch does that.
So please add it to the next sub release so that I don't have to remember to patch manually. It doesn't break anything.

--------------------------------------------------------------------------------
I have committed it in r17135.

Grischa Zengel wrote:
> So please add it to next sub release

I don't want to change behaviour in a sub version.

--------------------------------------------------------------------------------

Updated by Admin Redmine over 3 years ago

Category set to SEO_48
Target version set to 4.0.0_99

unofficial-redmine.org demo site

This site tallies the plus1 counts of tickets on Redmine.org and publishes them for searching. (2022/5 edition commemorating the Redmine 5.0 release)