Bug #21161

closed

Crash when locale is set to Turkish tr_TR.UTF-8

Added by srbaker (Steven Baker) about 1 month ago. Updated 28 days ago.

Status: Closed
Assignee: prism
Target version: -
ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
[ruby-core:121193]

Description

TL;DR this bug was reported in our tracker, and I'm pushing it upstream: https://bugzilla.opensuse.org/show_bug.cgi?id=1237861

When the locale is set to tr_TR.UTF-8, there is an encoding error. It has been narrowed down specifically to setting LC_CTYPE.

To reproduce, simply run LC_CTYPE=tr_TR.UTF-8 ruby -e "puts 42"

Example from a fresh 3.4.2 install:

srbaker@geekopad:~> LC_CTYPE=tr_TR.UTF-8 ruby -e "puts 42"
/home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/rubygems.rb:9:in 'Kernel#require': /home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/x86_64-linux/rbconfig.rb:1: unknown or invalid encoding in the magic comment (ArgumentError)
> 1 | # encoding: ascii-8bit
    |             ^~~~~~~~~~
  2 | # frozen-string-literal: false
  3 | #

	from /home/srbaker/.local/share/mise/installs/ruby/3.4.2/lib64/ruby/3.4.0/rubygems.rb:9:in '<top (required)>'
	from <internal:gem_prelude>:2:in 'Kernel#require'
	from <internal:gem_prelude>:2:in '<internal:gem_prelude>'

This reproduces across multiple installs of Ruby: from our packages, and from local builds on both GNU/Linux and macOS.

It looks like it's related to case normalisation of the letter i: in Turkish, lowercasing appears to produce a lowercase ı without a dot, so the encoding name string in the magic comment is no longer recognised. Details are in our bug linked above.
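For anyone who wants to check what their own C library does here, the following is a minimal sketch, not code from Ruby or Prism. The helper names tolower_under_locale and report_tolower are made up for illustration. It assumes the tr_TR.UTF-8 locale may or may not be installed, and it prints whatever the libc returns rather than assuming a particular result, since that can differ between platforms.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: returns tolower(ch) as computed under the named
 * LC_CTYPE locale, or -1 if that locale cannot be selected.
 * The previous LC_CTYPE setting is restored before returning. */
static int tolower_under_locale(const char *locale_name, int ch)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;
    /* Copy the name: later setlocale calls may overwrite the returned buffer. */
    snprintf(saved, sizeof saved, "%s", current);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1; /* locale not installed or name not recognised */

    int lowered = tolower((unsigned char)ch);
    setlocale(LC_CTYPE, saved);
    return lowered;
}

/* Hypothetical helper: prints what the C library reports for 'I'
 * under the Turkish locale, if it is available. */
static void report_tolower(void)
{
    /* Baseline: in the "C" locale, 'I' maps to 'i'. */
    assert(tolower_under_locale("C", 'I') == 'i');

    int lowered = tolower_under_locale("tr_TR.UTF-8", 'I');
    if (lowered < 0)
        printf("tr_TR.UTF-8: locale not available\n");
    else
        printf("tr_TR.UTF-8: tolower('I') = 0x%02x\n", (unsigned)lowered);
}
```

Calling report_tolower() from a small main and comparing the printed value with 0x69 (ASCII 'i') shows whether the system tolower diverges from the ASCII mapping on a given machine.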

Updated by srbaker (Steven Baker) about 1 month ago

I have confirmed this does not affect any 3.x versions before 3.4.

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

Does it happen with LC_CTYPE=tr_TR.UTF-8 ruby --parser=parse.y -e "puts 42" too?

I think it is because pm_strncasecmp uses the system tolower function.
We have to use our own locale-insensitive version such as st_locale_insensitive_strcasecmp as stated in include/ruby/internal/ctype.h.
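As a rough illustration of what a locale-insensitive comparison can look like, here is a minimal sketch. It is not the Prism fix and not the implementation of st_locale_insensitive_strcasecmp. The names ascii_tolower and ascii_strncasecmp are hypothetical, and the sketch assumes that only the ASCII letters A-Z need case folding, which is sufficient for encoding names such as ascii-8bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ASCII-only lowercase mapping.
 * It never consults LC_CTYPE, so 'I' always maps to 'i'. */
static int ascii_tolower(int ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch + ('a' - 'A') : ch;
}

/* Hypothetical helper: compares at most n bytes of a and b, ignoring
 * ASCII case. Returns 0 if equal, otherwise the difference between the
 * first pair of folded bytes that differ. */
static int ascii_strncasecmp(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ca = ascii_tolower((unsigned char)a[i]);
        int cb = ascii_tolower((unsigned char)b[i]);
        if (ca != cb)
            return ca - cb;
        if (ca == 0)
            return 0; /* both strings ended before n bytes */
    }
    return 0;
}

/* Small self-check using the encoding name from the error in this report. */
static void self_check(void)
{
    assert(ascii_strncasecmp("ASCII-8BIT", "ascii-8bit", 10) == 0);
    assert(ascii_strncasecmp("ascii-8bit", "utf-8", 5) != 0);
}
```

Because the mapping is a fixed arithmetic range check, the result is the same regardless of the LC_CTYPE value the process was started with.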

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

  • Assignee set to prism

Updated by srbaker (Steven Baker) about 1 month ago

Thanks for the quick response!

It does not happen with that command:

srbaker@geekopad:~> ruby -v
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
srbaker@geekopad:~> LC_CTYPE=tr_TR.UTF-8 ruby --parser=parse.y -e "puts 42"
42

Updated by byroot (Jean Boussier) about 1 month ago

  • Status changed from Open to Closed

Applied in changeset git|025832c3859c4369ed12ace13e35523bd04116fe.


[ruby/prism] Use a locale-insensitive version of tolower

[Bug #21161]

The tolower function provided by the libc is locale dependent
and can behave in ways you wouldn't expect for some value
of LC_CTYPE.

https://github.com/ruby/prism/commit/e3488256b4

Co-Authored-By: Nobuyoshi Nakada

Updated by byroot (Jean Boussier) about 1 month ago

  • Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED

Updated by ufuk (Ufuk Kayserilioglu) 28 days ago

  • Backport changed from 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE