Bug #956
Encoding: nl_langinfo(CODESET) on cygwin 1.5 always returns US-ASCII
Status: Closed
Description
=begin
It seems you cannot rely on nl_langinfo(CODESET) to return the proper charset on Cygwin, as it appears to always return
US-ASCII no matter what the locale settings are.
IMHO the configure script should check not only for the availability of nl_langinfo but also for its functionality, as it
currently seems to be a dummy function under Cygwin.
Please see also http://groups.google.com/group/comp.lang.ruby/msg/42d92ae740d12a5f?hl=en
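A functional check of the kind suggested here could compare what nl_langinfo(CODESET) reports (in Ruby, Encoding.locale_charmap) with the codeset named in the locale variables. The Ruby sketch below only illustrates that heuristic. It is not an actual configure test, and the helper name dummy_langinfo? and the list of ASCII aliases are assumptions.

```ruby
# Hypothetical list of names a C library may report for plain ASCII.
ASCII_NAMES = %w[US-ASCII ANSI_X3.4-1968 ASCII 646].freeze

# Hypothetical helper: true when the C library reports an ASCII charmap even
# though the locale name explicitly asks for a different codeset, which is
# the symptom described in this report.
def dummy_langinfo?(reported_charmap, locale_name)
  # Codeset is the part after "." and before an optional "@modifier",
  # e.g. "en_US.UTF-8@euro" -> "UTF-8". No codeset means nothing to compare.
  codeset = locale_name.to_s[/\.([^@]+)/, 1]
  return false if codeset.nil?

  # Encoding.locale_charmap may return nil, hence to_s.
  reported_ascii  = ASCII_NAMES.include?(reported_charmap.to_s.upcase)
  requested_ascii = ASCII_NAMES.include?(codeset.upcase)
  reported_ascii && !requested_ascii
end

if __FILE__ == $PROGRAM_NAME
  # First non-empty locale variable, in POSIX precedence order.
  locale = %w[LC_ALL LC_CTYPE LANG].map { |k| ENV[k] }.find { |v| v && !v.empty? }
  if dummy_langinfo?(Encoding.locale_charmap, locale)
    puts "nl_langinfo(CODESET) looks like a dummy implementation"
  end
end
```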
=end
Updated by duerst (Martin Dürst) almost 16 years ago
=begin
I can confirm that this problem happens. Adding an
#elif defined(__CYGWIN__)
branch as the second choice in rb_locale_charmap in encoding.c should be a good start.
For the actual functionality, I think the best choice is
http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
There is also http://www.haible.de/bruno/packages-libcharset.html,
but that is GNU-licensed, so it would create a licensing problem.
I guess the next steps would be to add the above langinfo.c to
the missing/ directory, probably renaming the function to
avoid conflicts with the existing (but useless) nl_langinfo.
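A replacement along the lines of the langinfo.c linked above has to turn the codeset part of a locale name into a charset name, and locale names spell it in several ways ("utf8", "UTF-8", "iso88591"). A minimal Ruby sketch of that normalization step follows. It is not a port of langinfo.c, and the helper name locale_codeset and the small alias table are assumptions for illustration.

```ruby
# Hypothetical sample of common codeset spellings, keyed by the spelling
# with case and punctuation removed. Not a complete list.
CANONICAL_CODESETS = {
  "UTF8"      => "UTF-8",
  "ISO88591"  => "ISO-8859-1",
  "ISO885915" => "ISO-8859-15",
  "EUCJP"     => "EUC-JP",
  "SJIS"      => "Shift_JIS",
  "KOI8R"     => "KOI8-R",
}.freeze

# Returns the canonical charset for the codeset part of a locale name,
# e.g. "de_DE.utf8@euro" -> "UTF-8", or nil when the name has no codeset.
def locale_codeset(locale_name)
  raw = locale_name.to_s[/\.([^@]+)/, 1]
  return nil if raw.nil?

  key = raw.upcase.delete("^A-Z0-9")   # "iso-8859-1" -> "ISO88591"
  CANONICAL_CODESETS.fetch(key, raw)   # unknown spellings pass through unchanged
end
```

Usage: locale_codeset(ENV["LANG"]) gives the charset that such a replacement would report instead of the hard-wired US-ASCII.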
I could easily do that, but I'd need some advice or help regarding
the makefiles. Nobu, Yui, anybody?
Regards, Martin.
=end
Updated by nobu (Nobuyoshi Nakada) almost 16 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
=begin
Applied in changeset r21311.
=end
Updated by duerst (Martin Dürst) almost 16 years ago
- Status changed from Closed to Open
=begin
The patch committed by Nobu uses the Windows 'locale' for Cygwin,
which is a good idea as a fallback. However, I personally often
use Cygwin with LANG=en_US.UTF-8 or similar. Using PuTTY (or another
UTF-8-capable terminal emulator such as Tera Term) with Cygwin
is often the only way to do UTF-8 work on Windows.
I'm not sure what Tom Link meant by "proper charset", but
for me, it would be UTF-8 if I have set LANG=en_US.UTF-8.
Regards, Martin.
=end
Updated by tomel (Tom Link) almost 16 years ago
=begin
Re "proper charset":
I'm fine with any solution that makes an 8-bit-clean encoding the default charset.
However, people using Cygwin's X server can run Cygwin's UTF-8-capable version of rxvt. In that case, it could cause problems if Ruby relied on the Windows locale.
A proper solution should IMHO check for LANG first and use the windows locale only if LANG isn't defined -- as proposed by Martin.
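The lookup order proposed here (locale variables first, Windows locale only as a fallback) can be sketched as follows. This is a hypothetical Ruby illustration, not the code in encoding.c: cygwin_locale_charmap is a made-up name, windows_codepage stands in for whatever code page the Windows side reports, LC_ALL and LC_CTYPE are consulted before LANG following POSIX precedence, and treating a locale name without a codeset (such as "C") as US-ASCII is an assumption.

```ruby
# Hypothetical helper: pick the charmap name for Cygwin, preferring the
# POSIX locale variables and using the Windows code page only when none is set.
def cygwin_locale_charmap(env, windows_codepage)
  # POSIX precedence: LC_ALL, then LC_CTYPE, then LANG; empty counts as unset.
  locale = %w[LC_ALL LC_CTYPE LANG].map { |k| env[k] }.find { |v| v && !v.empty? }

  # Nothing configured on the Cygwin side: fall back to the Windows code page.
  return "CP#{windows_codepage}" if locale.nil?

  # Codeset from the locale name ("en_US.UTF-8" -> "UTF-8"); names without
  # one, such as "C", are treated as plain ASCII (assumption).
  locale[/\.([^@]+)/, 1] || "US-ASCII"
end
```

For example, cygwin_locale_charmap(ENV, 850) would give "CP850" when no locale variable is set and "UTF-8" with LANG=en_US.UTF-8.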
Anyway, I haven't tried it yet, but I guess the current solution is OK for me, since I personally use the non-UTF-8 Windows rxvt terminal. Thanks.
=end
Updated by nobu (Nobuyoshi Nakada) almost 16 years ago
=begin
Hi,
At Sat, 10 Jan 2009 02:35:50 +0900,
Tom Link wrote in [ruby-core:21239]:
>A proper solution should IMHO check for LANG first and use
>the windows locale only if LANG isn't defined -- as proposed
>by Martin.
It's working so.
--
Nobu Nakada
=end
Updated by duerst (Martin Dürst) almost 16 years ago
=begin
At 03:11 09/01/13, you wrote:
>Hi,
>At Sat, 10 Jan 2009 02:35:50 +0900,
>Tom Link wrote in [ruby-core:21239]:
>>A proper solution should IMHO check for LANG first and use
>>the windows locale only if LANG isn't defined -- as proposed
>>by Martin.
>It's working so.
That's not true. Currently, Encoding.default_external defaults
to US-ASCII if LANG is not set on cygwin, not to the windows
locale encoding.
We can leave it at that, or we can fix it.
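To see which case applies on a given installation, the values involved can be printed side by side. In this sketch Encoding.locale_charmap, Encoding.find("locale") and Encoding.default_external are standard methods in Ruby 1.9 and later; the helper locale_report and its output layout are made up for illustration.

```ruby
# Hypothetical helper collecting the values relevant to this bug: the locale
# variables, what nl_langinfo(CODESET) reports, and what Ruby finally chose
# as the default external encoding. The env argument only affects the
# variable columns; the Encoding values always reflect the running process.
def locale_report(env = ENV)
  {
    "LC_ALL"           => env["LC_ALL"],
    "LC_CTYPE"         => env["LC_CTYPE"],
    "LANG"             => env["LANG"],
    "locale_charmap"   => Encoding.locale_charmap,   # may be nil on some platforms
    "locale encoding"  => Encoding.find("locale"),
    "default_external" => Encoding.default_external,
  }
end

locale_report.each { |key, value| puts format("%-16s %p", key, value) }
```

Running it once with LANG unset and once with LANG set shows whether default_external follows the Windows locale or stays at US-ASCII.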
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
=end
Updated by nobu (Nobuyoshi Nakada) almost 16 years ago
=begin
Hi,
At Wed, 14 Jan 2009 18:11:36 +0900,
Martin Duerst wrote in [ruby-core:21341]:
>>At Sat, 10 Jan 2009 02:35:50 +0900,
>>Tom Link wrote in [ruby-core:21239]:
>>>A proper solution should IMHO check for LANG first and use
>>>the windows locale only if LANG isn't defined -- as proposed
>>>by Martin.
>>It's working so.
>That's not true. Currently, Encoding.default_external defaults
>to US-ASCII if LANG is not set on cygwin, not to the windows
>locale encoding.
Sorry, I'd missed to commit it.
--
Nobu Nakada
=end
Updated by duerst (Martin Dürst) almost 16 years ago
=begin
Hello Nobu,
Many thanks for fixing it. I'm going to add some text from
missing/langinfo.c to LICENSE (anybody, please tell me if
that would be wrong), inform the author about the changes we
made, and close the bug.
Regards, Martin.
At 11:03 09/01/15, Nobuyoshi Nakada wrote:
>Hi,
>At Wed, 14 Jan 2009 18:11:36 +0900,
>Martin Duerst wrote in [ruby-core:21341]:
>>>At Sat, 10 Jan 2009 02:35:50 +0900,
>>>Tom Link wrote in [ruby-core:21239]:
>>>>A proper solution should IMHO check for LANG first and use
>>>>the windows locale only if LANG isn't defined -- as proposed
>>>>by Martin.
>>>It's working so.
>>That's not true. Currently, Encoding.default_external defaults
>>to US-ASCII if LANG is not set on cygwin, not to the windows
>>locale encoding.
>Sorry, I'd missed to commit it.
>--
>Nobu Nakada
=end
Updated by tomel (Tom Link) almost 16 years ago
=begin
It seems that the locale recognition doesn't work completely, or maybe I'm just doing it wrong.
On Cygwin, my default external encoding is CP850. If I set LANG=de_DE.UTF-8, then
ruby -e "puts Encoding.default_external"
=> UTF-8
gives the correct value. But if I set LANG=de_DE.ISO-8859-1, then
ruby -e "puts Encoding.default_external"
=> CP850
returns the Windows default locale encoding instead. Since CP850 and ISO-8859-1 are incompatible encodings from Ruby's point of view, this is an unpleasant discovery.
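One plausible way to get the expected result here is to pass the codeset from LANG through Ruby's own encoding lookup and fall back to the Windows encoding only when Ruby does not recognize the name at all. The sketch below is an assumption about the desired behaviour, not a description of what r21311 or any later fix does; the helper name encoding_for_locale and the windows_fallback parameter are hypothetical.

```ruby
# Hypothetical helper: map the codeset part of a locale name to a Ruby
# Encoding, using the Windows encoding only as a last resort.
def encoding_for_locale(locale_name, windows_fallback)
  codeset = locale_name.to_s[/\.([^@]+)/, 1]
  return windows_fallback if codeset.nil?   # no codeset in the name (or no LANG)

  Encoding.find(codeset)                    # e.g. "ISO-8859-1", "UTF-8", "EUC-JP"
rescue ArgumentError                        # Encoding.find raises this for unknown names
  windows_fallback
end
```

With this approach, encoding_for_locale(ENV["LANG"], Encoding.find("CP850")) would yield ISO-8859-1 for LANG=de_DE.ISO-8859-1 and CP850 only when LANG is unset or names an unknown charset.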
=end