Bug #18955: Kernel#sprintf - %c ignores a non-ASCII character's encoding - Ruby - Ruby Issue Tracking System

Added by andrykonchin (Andrew Konchin) over 3 years ago. Updated over 3 years ago.

Status: Closed
Assignee:
Target version:
ruby -v: 3.0.3
Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

[ruby-core:109428]

Description

I haven't found a similar existing issue, so I decided to create a new one.

I noticed that sprintf("%c", string) does not behave as expected when the format string and the string argument have different encodings and the argument contains a non-ASCII character.

In this case sprintf seems to take the character's binary representation as is and interpret it in the format string's encoding.

I would expect sprintf to negotiate an encoding and convert everything (the character and the format string) to it, raising an error when negotiation fails.

The following examples illustrate the current behavior:
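
As a rough sketch of that expectation, assuming the format string's encoding is the one negotiation settles on (the report does not say which side should win), the character could be transcoded before formatting. `sprintf_char` is a hypothetical helper, not part of Ruby:

```ruby
# Hypothetical helper (not part of Ruby): transcode the character to the
# format string's encoding before handing it to sprintf.
# Assumption: the format's encoding is the negotiated one.
def sprintf_char(format, char)
  # String#encode raises Encoding::UndefinedConversionError when the
  # character has no representation in the target encoding.
  sprintf(format, char.encode(format.encoding))
end

format = "%c".encode("Windows-1251")
string = "\u0419".encode(Encoding::KOI8_U) # "Й"

result = sprintf_char(format, string)
p result.encoding.name # "Windows-1251"
p result.codepoints    # [201]
```

With this approach a character that cannot be represented in the format's encoding fails loudly instead of producing a different letter.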

format = "%c".encode("Windows-1251")
string = "Й".encode(Encoding::KOI8_U)
r = sprintf(format, string)
r.encoding
# => #<Encoding:Windows-1251>

r == "Й".encode("Windows-1251")
# => false

r.codepoints
# => [234]
string.codepoints
# => [234]

In this example the result takes the format string's encoding, but the codepoint is unchanged: it still equals the character's codepoint in the original string's encoding. It should be different:

"Й".encode("Windows-1251").codepoints
# => [201]
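
The mismatch can be reproduced without sprintf. This is a minimal sketch, assuming the reported result amounts to reinterpreting the KOI8-U byte as Windows-1251; the variable names are illustrative:

```ruby
koi8 = "\u0419".encode(Encoding::KOI8_U) # "Й", stored as the single byte 0xEA

# Reinterpreting keeps the byte and only swaps the encoding label.
reinterpreted = koi8.dup.force_encoding("Windows-1251")
# Transcoding maps the character to its byte in the target encoding.
transcoded = koi8.encode("Windows-1251")

p reinterpreted.codepoints # [234]
p transcoded.codepoints    # [201]

# Byte 0xEA in Windows-1251 is the letter U+043A, not U+0419.
p reinterpreted.encode("UTF-8") == "\u043A" # true
```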

Another example:

string = "À".encode(Encoding::CP1252)
sprintf("%c", string)
# => in `sprintf': invalid byte sequence in UTF-8 (ArgumentError)

Here the error indicates that sprintf does not transcode the character from the string's encoding to UTF-8; it just uses the raw bytes.
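
The error is consistent with what the raw byte looks like when read as UTF-8. A small sketch using only String methods (the variable name is illustrative):

```ruby
cp1252 = "\u00C0".encode(Encoding::CP1252) # "À", stored as the single byte 0xC0

p cp1252.bytes # [192]

# 0xC0 on its own is not a valid UTF-8 sequence.
p cp1252.dup.force_encoding("UTF-8").valid_encoding? # false

# Proper transcoding produces the two-byte UTF-8 form instead.
p cp1252.encode("UTF-8").bytes # [195, 128]
```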

#1 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#2 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Subject changed from "Kernel#sprintf - %c doesn't convert non-ASCII characters" to "Kernel#sprintf - %c ignores a non-ASCII character's encoding"

#3 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#4 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#5 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#6 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#7 Updated by andrykonchin (Andrew Konchin) over 3 years ago

Description updated

#8 Updated by nobu (Nobuyoshi Nakada) over 3 years ago [ruby-core:109489]

%c expects a codepoint, so I think the behavior in the former example is currently expected.

The latter example is a bug.

#9 Updated by mame (Yusuke Endoh) over 3 years ago [ruby-core:109544]

At the dev-meeting, @akr (Akira Tanaka) proposed that %c behave like %s (with the one-codepoint restriction), and @matz (Yukihiro Matsumoto) agreed.
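
To picture the proposal, note that %s with a precision of 1 already keeps at most one character of its argument. The sketch below only shows that existing %s behavior; that the final %c implementation matches it in every detail is an assumption:

```ruby
# %s with precision 1 keeps only the first character of the argument.
ascii_result = sprintf("%.1s", "abc")
utf8_result  = sprintf("%.1s", "\u0419\u0410") # "ЙА"

p ascii_result            # "a"
p utf8_result.codepoints  # [1049], i.e. only U+0419
```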

#10 Updated by nobu (Nobuyoshi Nakada) over 3 years ago

Status changed from Open to Closed

Applied in changeset git|ce384ef5a95b809f248e089c1608e60753dabe45.

[Bug #18955] Check length of argument for %c in proper encoding
