Bug #956

Encoding: nl_langinfo(CODESET) on cygwin 1.5 always returns US-ASCII

Added by tomel (Tom Link) over 11 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
ruby -v:
Backport:
[ruby-core:20994]

Description

=begin
It seems you cannot rely on nl_langinfo(CODESET) to return the proper charset on cygwin as it appears to always return
US-ASCII no matter what.

IMHO the configure script should not only check for the availability of langinfo but also for its functionality as it
seems to currently be a dummy function under cygwin.

Please see also http://groups.google.com/group/comp.lang.ruby/msg/42d92ae740d12a5f?hl=en
=end
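
A quick way to see what Ruby derived from the C library on a given system is to print the locale charmap next to the relevant environment variables. This is only a diagnostic sketch: `locale_report` is a hypothetical helper name, and the assumption is that `Encoding.locale_charmap` reflects what `nl_langinfo(CODESET)` reports on Unix-like platforms.

```ruby
# Hypothetical diagnostic helper: collect the locale-related settings
# that influence Ruby's default external encoding.
def locale_report(env = ENV)
  {
    lc_all: env["LC_ALL"],
    lc_ctype: env["LC_CTYPE"],
    lang: env["LANG"],
    # Name of the charmap Ruby obtained from the platform (may be nil).
    locale_charmap: Encoding.locale_charmap,
    default_external: Encoding.default_external.name
  }
end

locale_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

If `locale_charmap` stays at US-ASCII regardless of how `LANG` is set, that matches the behaviour described in this report.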

#1

Updated by duerst (Martin Dürst) over 11 years ago

=begin
I can confirm that this problem happens. Adding a

#elif defined(CYGWIN)

option as the second choice in rb_locale_charmap in encoding.c should be a good start.
For the actual functionality, I think the best choice is
http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
There is also http://www.haible.de/bruno/packages-libcharset.html,
but that's GNU, so it would create a copyright problem.

I guess the next steps would be to add the above langinfo.c to
the missing directory, probably changing the function name to
avoid conflicts with the existing (but useless) nl_langinfo.

I could easily do that, but I'd need some advice or help re.
makefiles. Nobu, Yui, anybody?

Regards, Martin.
=end

#2

Updated by yugui (Yuki Sonoda) over 11 years ago

  • Target version set to 1.9.1 RC2


#3

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
Applied in changeset r21311.
=end

#4

Updated by duerst (Martin Dürst) over 11 years ago

  • Status changed from Closed to Open

=begin
The patch committed by Nobu uses the Windows 'locale' for cygwin,
which is a good idea as a fallback. However, I personally often
use cygwin with LANG=en_US.UTF-8 or so. Using putty (or another
UTF-8 capable terminal emulator such as TeraTerm,...) and cygwin
is often the only way to do UTF-8 work on Windows.

I'm not sure what Tom Link meant with "proper charset", but
for me, it would be UTF-8 if I have set LANG=en_US.UTF-8.

Regards, Martin.
=end

#5

Updated by tomel (Tom Link) over 11 years ago

=begin

> proper charset

I'm fine with any solution that makes something 8-bit clean the default charset.

People using cygwin's x server though can run cygwin's utf-8-capable version of rxvt. In such a case, it could cause problems if ruby relied on the windows locale.

A proper solution should IMHO check for LANG first and use the windows locale only if LANG isn't defined -- as proposed by Martin.

Anyway, I haven't tried it yet but I guess the current solution is ok for me since I personally use the non-utf-8 windows rxvt terminal. Thanks.

=end
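
The "LANG first, Windows locale as fallback" order proposed above can be sketched in a few lines of Ruby. This is an illustration only, not the code in encoding.c: `codeset_of` and `charset_from_env` are hypothetical names, the Windows code page is passed in as a plain string, and the `LC_ALL`, `LC_CTYPE`, `LANG` precedence is the usual POSIX one rather than something stated in this thread.

```ruby
# Hypothetical helper: extract the codeset part of a POSIX locale name,
# e.g. "de_DE.ISO-8859-1@euro" yields "ISO-8859-1". Returns nil if absent.
def codeset_of(locale_name)
  return nil if locale_name.nil? || locale_name.empty?
  locale_name[/\.([^@]+)/, 1]
end

# Hypothetical helper: take the first non-empty locale variable and use its
# codeset; otherwise fall back to the Windows code page name.
# Simplification: a locale name without a codeset part (such as "C") also
# falls back here; a real implementation may decide differently.
def charset_from_env(env, windows_fallback)
  name = %w[LC_ALL LC_CTYPE LANG].map { |key| env[key] }
                                 .find { |value| value && !value.empty? }
  codeset_of(name) || windows_fallback
end

puts charset_from_env({ "LANG" => "de_DE.UTF-8" }, "CP850") # prints UTF-8
puts charset_from_env({}, "CP850")                          # prints CP850
```

Passing the environment in as a hash keeps the sketch testable without touching the real `ENV`.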

#6

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

=begin
Hi,

At Sat, 10 Jan 2009 02:35:50 +0900,
Tom Link wrote in [ruby-core:21239]:

> A proper solution should IMHO check for LANG first and use
> the windows locale only if LANG isn't defined -- as proposed
> by Martin.

It's working so.

--
Nobu Nakada

=end

#7

Updated by duerst (Martin Dürst) over 11 years ago

=begin
At 03:11 09/01/13, you wrote:

> Hi,
>
> At Sat, 10 Jan 2009 02:35:50 +0900,
> Tom Link wrote in [ruby-core:21239]:
>
> > A proper solution should IMHO check for LANG first and use
> > the windows locale only if LANG isn't defined -- as proposed
> > by Martin.
>
> It's working so.

That's not true. Currently, Encoding.default_external defaults
to US-ASCII if LANG is not set on cygwin, not to the windows
locale encoding.

We can leave it at that, or we can fix it.

Regards, Martin.

#-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

=end
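
One way to check claims like the one above is to start a child Ruby with a controlled locale environment and print its default external encoding. This is a sketch under assumptions: `default_external_with` is a hypothetical helper, it relies on `RbConfig.ruby` pointing at the running interpreter, and the result depends on the platform, the installed locales, and any `-E` option set through `RUBYOPT`.

```ruby
require "open3"
require "rbconfig"

LOCALE_VARS = %w[LC_ALL LC_CTYPE LANG].freeze

# Hypothetical helper: run a child Ruby with all locale variables unset,
# except for the overrides given, and return the name of its
# Encoding.default_external.
def default_external_with(overrides)
  # A nil value tells the spawned process to unset that variable.
  env = LOCALE_VARS.to_h { |name| [name, nil] }.merge(overrides)
  out, status = Open3.capture2(env, RbConfig.ruby, "-e",
                               "print Encoding.default_external.name")
  raise "child ruby exited with #{status.exitstatus}" unless status.success?
  out
end

puts "LANG unset:       #{default_external_with({})}"
puts "LANG=en_US.UTF-8: #{default_external_with({ 'LANG' => 'en_US.UTF-8' })}"
```

No expected output is shown because it varies by system; on the cygwin setup discussed here, the interesting case is the first line.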

#8

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

=begin
Hi,

At Wed, 14 Jan 2009 18:11:36 +0900,
Martin Duerst wrote in [ruby-core:21341]:

> > At Sat, 10 Jan 2009 02:35:50 +0900,
> > Tom Link wrote in [ruby-core:21239]:
> >
> > > A proper solution should IMHO check for LANG first and use
> > > the windows locale only if LANG isn't defined -- as proposed
> > > by Martin.
> >
> > It's working so.
>
> That's not true. Currently, Encoding.default_external defaults
> to US-ASCII if LANG is not set on cygwin, not to the windows
> locale encoding.

Sorry, I forgot to commit it.

--
Nobu Nakada

=end

#9

Updated by duerst (Martin Dürst) over 11 years ago

=begin
Hello Nobu,

Many thanks for fixing it. I'm going to add some text from
missing/langinfo.c to LICENSE (anybody, please tell me if
that was wrong), and inform the author about the changes we
made, and close the bug.

Regards, Martin.


=end

#10

Updated by duerst (Martin Dürst) over 11 years ago

  • Status changed from Open to Closed


#11

Updated by tomel (Tom Link) over 11 years ago

=begin
It seems that the locale recognition doesn't work 100%, or maybe I'm just doing it wrong.

On cygwin, the default external encoding is CP850. If I set LANG=de_DE.UTF-8, then

ruby -e "Encoding.default_external"
=> UTF-8

gives the correct value. But if I set it to LANG=de_DE.ISO-8859-1, then

ruby -e "Encoding.default_external"
=> CP850

returns the encoding of the Windows default locale instead. Since CP850 and ISO-8859-1 are incompatible encodings in the ruby mind-set, this is an unpleasant discovery.

=end
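
Why the CP850 / ISO-8859-1 mix-up matters can be shown with a single non-ASCII character. The snippet below is an illustrative sketch, not taken from the report: it encodes "ä" in both encodings, compares the bytes, and then mislabels the ISO-8859-1 bytes as CP850, which is roughly what happens when the wrong default external encoding is picked.

```ruby
a_umlaut = "\u00E4" # "ä", written as an escape so the source stays ASCII

latin1 = a_umlaut.encode("ISO-8859-1")
cp850  = a_umlaut.encode("CP850")

p latin1.bytes # [228]
p cp850.bytes  # [132]

# Two non-ASCII strings in different encodings cannot be combined.
p Encoding.compatible?(latin1, cp850) # nil

# Treating ISO-8859-1 bytes as CP850 silently yields a different character.
mislabeled = latin1.dup.force_encoding("CP850")
p mislabeled.encode("UTF-8") == a_umlaut # false
```

Pure ASCII text is unaffected, which is why the problem tends to show up only once umlauts or other non-ASCII characters are involved.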
