Feature #16006: String count and alignment that consider multibyte characters - Ruby - Ruby Issue Tracking System

Feature #16006

closed

String count and alignment that consider multibyte characters

Added by sawa (Tsuyoshi Sawada) about 6 years ago. Updated about 6 years ago.

Status:

Rejected

Assignee:

Target version:

[ruby-core:<unknown>]

Description

In a non-proportional font, multibyte characters have twice the width of ASCII characters. Since String#length, String#ljust, String#rjust, and String#center do not take this into consideration, applying these methods does not give the desired output.

array = ["aaあああ", "bいいいいいいいい", "cc"]

col_width = array.map(&:length).max
array.each{|w| puts w.ljust(col_width, "*")}

# >> aaあああ****
# >> bいいいいいいいい
# >> cc*******

To justify strings that contain multibyte characters, we have to do something much more complicated, such as the following:

col_widths =
  array.to_h{|w| [
    w,
    w
    .chars
    .partition(&:ascii_only?)
    .then{|ascii, non| ascii.length + (non.length * 2)}
  ]}
col_width = col_widths.values.max
array.each{|w| puts w + "*" * (col_width - col_widths[w])}

# Note that the following gives the desired alignment in a non-proportional font, but may not appear aligned in this issue tracker.
# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************
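
For readers who want to try the workaround, it can be packaged as two small helpers. This is only a sketch: the names `naive_width` and `naive_ljust` are hypothetical, the padding is assumed to be a single ASCII character, and the rule "ASCII is 1 column, everything else is 2" is the same heuristic as above, not a general display-width algorithm.

```ruby
# Naive width: ASCII characters count as 1 column, all others as 2.
# (Assumption carried over from the workaround above.)
def naive_width(str)
  str.each_char.sum { |ch| ch.ascii_only? ? 1 : 2 }
end

# Left-justify using the naive width instead of String#length.
# `pad` is assumed to be a single ASCII character.
def naive_ljust(str, width, pad = " ")
  str + pad * [width - naive_width(str), 0].max
end

array = ["aaあああ", "bいいいいいいいい", "cc"]
col_width = array.map { |w| naive_width(w) }.max
lines = array.map { |w| naive_ljust(w, col_width, "*") }
puts lines
```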

This issue seems to be common, as several webpages can be found that attempt to do something similar.

I propose to give the relevant methods an option to take multibyte characters into consideration. Perhaps something like the proportional keyword in the following may work:

"aaあああ".length(proportional: true) # => 8
"aaあああ".ljust(17, "*", proportional: true) # => "aaあああ*********"

Then, the desired output would be given by this code:

col_width = array.map{|w| w.length(proportional: true)}.max
array.each{|w| puts w.ljust(col_width, "*", proportional: true)}

# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************

Related issues: 1 (1 open, 0 closed)

Updated by sawa (Tsuyoshi Sawada) about 6 years ago

Description updated (diff)

Updated by sawa (Tsuyoshi Sawada) about 6 years ago

Description updated (diff)

#3 [ruby-core:93804]

Updated by shyouhei (Shyouhei Urabe) about 6 years ago

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.
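
The objection can be illustrated with a short sketch. The helper name `ascii_heuristic_width` is hypothetical; it applies the ASCII vs. non-ASCII rule to a Latin-script word. The accented letter is non-ASCII, so the rule counts it as two columns, although such letters usually take a single column in a monospaced font.

```ruby
# Hypothetical helper: 1 column per ASCII character, 2 per non-ASCII character.
def ascii_heuristic_width(str)
  str.each_char.sum { |ch| ch.ascii_only? ? 1 : 2 }
end

word = "caf\u00E9"  # "café" with a precomposed U+00E9

puts word.length                  # 4 characters
puts ascii_heuristic_width(word)  # 5, one more than the character count
```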

Updated by shyouhei (Shyouhei Urabe) about 6 years ago

Added relation: is duplicate of Feature #14618 (Add display width method to String for CLI)

#5 [ruby-core:93805]

Updated by sawa (Tsuyoshi Sawada) about 6 years ago

shyouhei (Shyouhei Urabe) wrote:

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.

Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as @shyouhei (Shyouhei Urabe) has pointed out.

I updated my proposal to use the keyword proportional. I expect all widths to be handled automatically, including those of non-wide non-ASCII letters.

#6 [ruby-core:93806]

Updated by shyouhei (Shyouhei Urabe) about 6 years ago

sawa (Tsuyoshi Sawada) wrote:

shyouhei (Shyouhei Urabe) wrote:

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.

Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as @shyouhei (Shyouhei Urabe) has pointed out.

I updated my proposal to use the keyword proportional. I expect all widths to be handled automatically, including those of non-wide non-ASCII letters.

Still not appropriate. There are characters whose "wide"-ness is not fixed until they are actually rendered. See also: https://unicode.org/reports/tr11/ especially the section named "Modern Rendering Practice".
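
A rough sketch of why this matters for any width method: characters classified as East Asian Ambiguous in UAX #11 (for example "α" and "±") can be rendered narrow or wide depending on context, so a width function has to be told which to assume. Everything below is illustrative. `sketch_width`, its `ambiguous_width:` parameter, and the four-character `AMBIGUOUS_SAMPLE` list are hypothetical, and the script-based test for wide characters is a crude stand-in for the real East Asian Width data.

```ruby
# Deliberately tiny, hypothetical sample of East Asian Ambiguous characters.
AMBIGUOUS_SAMPLE = ["±", "§", "α", "β"].freeze

# Crude stand-in for "wide": Han, Hiragana, and Katakana only.
WIDE_SCRIPTS = /\p{Han}|\p{Hiragana}|\p{Katakana}/

def sketch_width(str, ambiguous_width: 1)
  str.each_char.sum do |ch|
    if AMBIGUOUS_SAMPLE.include?(ch)
      ambiguous_width   # depends on the rendering context
    elsif ch.match?(WIDE_SCRIPTS)
      2
    else
      1
    end
  end
end

puts sketch_width("α±あ")                      # 4 when ambiguous characters are narrow
puts sketch_width("α±あ", ambiguous_width: 2)  # 6 when they are rendered wide
```

The same string yields two different widths, and nothing in the string itself says which one is right.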

#7 [ruby-core:93809]

Updated by matz (Yukihiro Matsumoto) about 6 years ago

Status changed from Open to Rejected

The display width of a string cannot be calculated without rendering information, which Ruby usually does not have.
Considering emojis or grapheme clusters, it is nearly impossible. It's the responsibility of the rendering engine.

Matz.
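
As a small illustration of the grapheme-cluster point, the sketch below uses only built-in String methods (String#grapheme_clusters requires Ruby 2.5 or later). One user-perceived character can consist of several code points, so neither the character count nor the byte count corresponds to what is drawn on screen.

```ruby
s = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT, displayed as one letter

puts s.length                    # 2 code points
puts s.bytesize                  # 3 bytes in UTF-8
puts s.grapheme_clusters.length  # 1 grapheme cluster
```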
