Bug #21161
closed
Crash when locale is set to Turkish tr_TR.UTF-8
Description
TL;DR this bug was reported in our tracker, and I'm pushing it upstream: https://bugzilla.opensuse.org/show_bug.cgi?id=1237861
When the locale is set to tr_TR.UTF-8, there is an encoding error. It has been narrowed down specifically to setting LC_CTYPE.
To reproduce, simply run: LC_CTYPE=tr_TR.UTF-8 ruby -e "puts 42"
Example from a fresh 3.4.2 install:
srbaker@geekopad:~> LC_CTYPE=tr_TR.UTF-8 ruby -e "puts 42"
/home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/rubygems.rb:9:in 'Kernel#require': /home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/x86_64-linux/rbconfig.rb:1: unknown or invalid encoding in the magic comment (ArgumentError)
> 1 | # encoding: ascii-8bit
| ^~~~~~~~~~
2 | # frozen-string-literal: false
3 | #
from /home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/rubygems.rb:9:in '<top (required)>'
from <internal:gem_prelude>:2:in 'Kernel#require'
from <internal:gem_prelude>:2:in '<internal:gem_prelude>'
This reproduces across multiple installs of Ruby: from our packages, and from local builds on both GNU/Linux and macOS.
It looks like it's related to case normalisation of the letter i when the magic comment string is processed: in Turkish, lowercasing appears to produce a lowercase i without a dot. Details are in our bug linked above.
Updated by srbaker (Steven Baker) about 1 month ago
I have confirmed this does not affect any 3.x versions before 3.4.
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
Does it happen with LC_CTYPE=tr_TR.UTF-8 ruby --parser=parse.y -e "puts 42" too?
I think it is because pm_strncasecmp is using the system tolower function.
We have to use our own locale-insensitive version, such as st_locale_insensitive_strcasecmp, as stated in include/ruby/internal/ctype.h.
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
- Assignee set to prism
Updated by srbaker (Steven Baker) about 1 month ago
Thanks for the quick response!
It does not happen with that command:
srbaker@geekopad:~> ruby -v
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
srbaker@geekopad:~> LC_CTYPE=tr_TR.UTF-8 ruby --parser=parse.y -e "puts 42"
42
Updated by byroot (Jean Boussier) 30 days ago
- Status changed from Open to Closed
Applied in changeset git|025832c3859c4369ed12ace13e35523bd04116fe.
[ruby/prism] Use a locale-insensitive version of tolower
[Bug #21161]
The tolower function provided by the libc is locale dependent
and can behave in ways you wouldn't expect for some value of LC_CTYPE.
https://github.com/ruby/prism/commit/e3488256b4
Co-Authored-By: Nobuyoshi Nakada <nobu@ruby-lang.org>
Updated by byroot (Jean Boussier) 29 days ago
- Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED
Updated by ufuk (Ufuk Kayserilioglu) 26 days ago
- Backport changed from 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE
ruby_3_4 3d744a0a9436fbf7901c345055dd3d775b518361.