Vote #64318
Completed: character encoding for attachment file
100%
Description
As of r814, the default encoding for repositories can be configured.
Diff and patch attachments require a similar configuration. Two options:
- default encoding for diff or patch attachment (Admin -> Settings -> Attachment -> diff/patch encodings?).
- follow encoding of repository. (source:/trunk/app/helpers/repositories_helper.rb@1900#L109)
I think the 2nd option may be enough and useful.
journals
youngseok yi wrote:
> * follow encoding of repository.
The attached patch implements it with minimal changes. attachment:attachment-encoding.patch
A proper solution would be something like:
# move @to_utf8@ to separate module, e.g. @RepoFilesHelper@
# make @AttachmentsHelper@ and @RepositoriesHelper@ @include RepoFilesHelper@
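The two steps above could look roughly like the sketch below. @RepoFilesHelper@ is the name proposed in this comment; the body of @to_utf8@ here is only a placeholder (it replaces invalid bytes with "?") and is not the code from @repositories_helper.rb@.

```ruby
# Hypothetical shared module, named as proposed above.
module RepoFilesHelper
  # Placeholder body, NOT Redmine's real to_utf8: treats the bytes as
  # UTF-8 and replaces every invalid sequence with "?".
  def to_utf8(str)
    str.dup.force_encoding("UTF-8").scrub("?")
  end
end

# Both helpers pick up the same to_utf8 by including the shared module.
module AttachmentsHelper
  include RepoFilesHelper
end

module RepositoriesHelper
  include RepoFilesHelper
end

# Stand-in for a view context that mixes in the attachments helper.
class AttachmentViewStub
  include AttachmentsHelper
end
```

With this layout, a change to the conversion logic is made once and both the repository views and the attachment views see it.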
--------------------------------------------------------------------------------
Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?
--------------------------------------------------------------------------------
Etienne Massip wrote:
> Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?
The goal of this feature is that attachment *file* and *patch* contents are converted according to the repositories encoding setting.
!general-settings.png!
--------------------------------------------------------------------------------
I'm not sure this is a good idea; repositories may return data using a specific encoding, but attachments are usually stored on the FS without transformation, so assuming that they're "very likely to be encoded the same way data in SCM is" is not necessarily valid.
For example, my encoding list starts with UTF-8, while my locale (Fr) suggests that files uploaded by users are probably encoded in ISO-8859-15/CP1252; so assuming that uploaded text files are in UTF-8 means that they will be rendered stripped and that I will often lose some characters, which is the current situation.
I would prefer to be able to specify a distinct default encoding for text attachments, which would be ISO-8859-15/CP1252 (it could default to the server's default encoding), and render with something like @bom_present?(str) ? str : Iconv.conv('UTF-8', Setting.default_encoding, str)@.
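A minimal sketch of that idea follows. @bom_present?@ and the @default_encoding@ argument are hypothetical names taken from the expression above, not existing Redmine settings or helpers, and @String#encode@ stands in for @Iconv.conv@, since Iconv is no longer part of the Ruby standard library as of Ruby 2.0.

```ruby
# The three bytes of the UTF-8 byte order mark.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the raw bytes start with a UTF-8 BOM.
def bom_present?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Hypothetical rendering step: trust a BOM as proof of UTF-8, otherwise
# convert from the configured default attachment encoding.
# default_encoding is an assumed setting, not an existing one.
def render_attachment_text(bytes, default_encoding = "ISO-8859-15")
  if bom_present?(bytes)
    # The BOM is kept, as in the expression above.
    bytes.dup.force_encoding("UTF-8")
  else
    bytes.dup.force_encoding(default_encoding).encode("UTF-8")
  end
end
```

Note that a UTF-8 file without a BOM would be converted from the default encoding here, which is the main weakness of relying on the BOM alone.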
--------------------------------------------------------------------------------
UTF-8 is very strict.
It is very rare for ISO-8859-1 characters to be mistaken for valid UTF-8.
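To illustrate the strictness, here is a small check using Ruby's built-in @String#valid_encoding?@. The helper name @valid_utf8?@ and the sample strings are made up for this example.

```ruby
# Hypothetical helper: do these raw bytes form well-formed UTF-8?
def valid_utf8?(bytes)
  bytes.dup.force_encoding("UTF-8").valid_encoding?
end

# "café au lait" with "é" as the single ISO-8859-1 byte 0xE9.
LATIN1_BYTES = "caf\xE9 au lait".b
# The same text with "é" as the two UTF-8 bytes 0xC3 0xA9.
UTF8_BYTES = "caf\xC3\xA9 au lait".b

LATIN1_LOOKS_LIKE_UTF8 = valid_utf8?(LATIN1_BYTES) # false: 0xE9 must be followed by continuation bytes
UTF8_LOOKS_LIKE_UTF8   = valid_utf8?(UTF8_BYTES)   # true
```

A false positive needs an accented ISO-8859-1 character followed by exactly the right continuation bytes (for example the two characters "Ã©"), which hardly ever occurs in real text.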
http://groups.google.com/group/thg-dev/browse_thread/thread/6c258628e3fce8/09e9dbe4a030e51d
--------------------------------------------------------------------------------
In Redmine 1.2.2, the repository encoding conversion is done at this line:
source:tags/1.2.2/app/helpers/repositories_helper.rb#L140
With the setting "UTF-8,ISO-8859-1",
if converting from "UTF-8" fails, Redmine converts from ISO-8859-1.
Japanese users deal with three encodings: UTF-8, EUC-JP and Shift-JIS (CP932).
This Redmine feature is a big advantage in Japan.
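The fallback described here can be sketched as below. This is not the code at the linked line; @convert_with_fallback@ is a hypothetical name, and the sketch uses @String#encode@ where Redmine 1.2.2 used Iconv.

```ruby
# Hypothetical sketch: try each encoding of a comma-separated setting in
# order and return the first successful conversion to UTF-8.
def convert_with_fallback(bytes, encodings_setting)
  encodings_setting.split(",").map(&:strip).each do |name|
    begin
      candidate = bytes.dup.force_encoding(name)
      # Skip encodings in which the bytes are not well formed.
      next unless candidate.valid_encoding?
      return candidate.encode("UTF-8")
    rescue ArgumentError, EncodingError
      # Unknown encoding name or failed conversion: try the next entry.
      next
    end
  end
  # Nothing matched: fall back to UTF-8 with invalid bytes replaced.
  bytes.dup.force_encoding("UTF-8").scrub("?")
end
```

For example, @convert_with_fallback("caf\xE9".b, "UTF-8,ISO-8859-1")@ rejects the bytes as UTF-8 and then converts them from ISO-8859-1, while input that is already UTF-8 is returned by the first entry.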
--------------------------------------------------------------------------------
So if I understand correctly, following the order of the encoding list, it will try and fail to convert the ISO-8859-1 file from UTF-8 to UTF-8, and then try and succeed in converting it from ISO-8859-1 to UTF-8?
Guess it will work...
--------------------------------------------------------------------------------
What if the administrator does not set UTF-8 at the start of the list?
Can't you @str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)@?
--------------------------------------------------------------------------------
Etienne Massip wrote:
> repositories may return data using a specific encoding,
It is not true.
SCMs do not have encoding information (metadata) about *file contents*.
http://mercurial.selenic.com/wiki/EncodingStrategy?action=recall&rev=21#Unknown_byte_strings
--------------------------------------------------------------------------------
Toshi MARUYAMA wrote:
> It is not true.
> SCMs do not have encoding information (metadata) about *file contents*.
Well, that's why I said _may_ :-)
--------------------------------------------------------------------------------
Etienne Massip wrote:
> What if the administrator does not set UTF-8 at the start of the list?
This is a very rare case in Japan.
"UTF-8,EUC-JP,Shift_JIS" is a popular setting in Japan.
The order is applied strictly.
If a "Single Byte Character Set":http://en.wikipedia.org/wiki/SBCS (e.g. ISO-8859-1) is at the start of the list, all characters are converted to UTF-8 from that encoding.
But I think this is a very rare case anywhere in the world.
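Why the order matters can be seen in a small sketch. @first_valid_as_utf8@ is a hypothetical helper written for this example, not Redmine code.

```ruby
# Hypothetical helper: convert from the first listed encoding in which
# the bytes are well formed; nil if none fits.
def first_valid_as_utf8(bytes, encoding_names)
  encoding_names.each do |name|
    candidate = bytes.dup.force_encoding(name)
    return candidate.encode("UTF-8") if candidate.valid_encoding?
  end
  nil
end

# "café" stored as UTF-8: "é" is the two bytes 0xC3 0xA9.
UTF8_INPUT = "caf\xC3\xA9".b

# UTF-8 first: the bytes are recognized as UTF-8 and kept as "café".
UTF8_FIRST = first_valid_as_utf8(UTF8_INPUT, ["UTF-8", "ISO-8859-1"])

# ISO-8859-1 first: every byte is accepted, so 0xC3 and 0xA9 each become
# their own character and the result is "cafÃ©".
SBCS_FIRST = first_valid_as_utf8(UTF8_INPUT, ["ISO-8859-1", "UTF-8"])
```

With the single-byte encoding first, the UTF-8 entry is never reached, so UTF-8 files are silently mangled instead of being passed through.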
> Can't you @str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)@?
The default repository encodings setting is *empty*.
This is equivalent to a default of UTF-8.
And I think it is better that the administrator explicitly puts UTF-8 at the start of the list.
--------------------------------------------------------------------------------
Does this feature fix #4608?
--------------------------------------------------------------------------------
Anton Statutov wrote:
> Does this feature fix #4608?
I don't think so.
--------------------------------------------------------------------------------
Committed in r7885.
--------------------------------------------------------------------------------
related_issues
relates,Closed,9143,Partial diff comparison should be done on actual code, not on html
relates,Closed,4608,Mail attachment name encoding is incorectly handled
duplicates,Closed,4577,convert text file attached an issue to utf-8.
duplicates,Closed,3652,Unicode Support for TXT-Files