Feature #16006
closed
String count and alignment that consider multibyte characters
Description
In a non-proportional font, multibyte characters have twice the width of ASCII characters. Since String#length, String#ljust, String#rjust, and String#center do not take this into consideration, applying these methods does not give the desired output.
array = ["aaあああ", "bいいいいいいいい", "cc"]
col_width = array.map(&:length).max
array.each{|w| puts w.ljust(col_width, "*")}
# >> aaあああ****
# >> bいいいいいいいい
# >> cc*******
In order to do justification of strings that have multi-byte characters, we have to do something much more complicated such as the following:
col_widths =
  array.to_h{|w| [
    w,
    w
    .chars
    .partition(&:ascii_only?)
    .then{|ascii, non| ascii.length + (non.length * 2)}
  ]}
col_width = col_widths.values.max
array.each{|w| puts w + "*" * (col_width - col_widths[w])}
# Note that the following gives the desired alignment in non-proportional font, but may not appear so in this issue tracker.
# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************
This issue seems to be common, as several webpages can be found that attempt to do something similar.
I propose to give the relevant methods an option to take multibyte characters into consideration. Perhaps something like the proportional keyword in the following may work:
"aaあああ".length(proportional: true) # => 8
"aaあああ".ljust(17, "*", proportional: true) # => "aaあああ*********"
Then, the desired output would be given by this code:
col_width = array.map{|w| w.length(proportional: true)}.max
array.each{|w| puts w.ljust(col_width, "*", proportional: true)}
# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************
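For comparison, the proposed behavior can be approximated today with a small helper. The following is only a sketch: the module name NaiveWidth and its methods are hypothetical, not an existing Ruby API, and it keeps the same assumption as above (ASCII characters take one column, every other character takes two). It also assumes the padding string is a single one-column character.

```ruby
# Hypothetical helpers; the names are illustrative, not part of Ruby.
module NaiveWidth
  module_function

  # Assumption: ASCII characters occupy 1 column, all others occupy 2.
  def width(str)
    str.each_char.sum { |ch| ch.ascii_only? ? 1 : 2 }
  end

  # Left-justify by assumed display columns instead of character count.
  # Assumes pad is a single character that is one column wide.
  def ljust(str, columns, pad = " ")
    missing = columns - width(str)
    missing > 0 ? str + pad * missing : str
  end
end

array = ["aaあああ", "bいいいいいいいい", "cc"]
col_width = array.map { |w| NaiveWidth.width(w) }.max
array.each { |w| puts NaiveWidth.ljust(w, col_width, "*") }
```

Keeping the width rule in one method makes it easy to swap in a different rule later without touching the justification code.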
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
- Is duplicate of Feature #14618: Add display width method to String for CLI added
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
shyouhei (Shyouhei Urabe) wrote:
This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.
Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as @shyouhei (Shyouhei Urabe) has addressed.
I updated my proposal to have the keyword proportional. I expect all the widths to be handled automatically, including non-wide non-ASCII letters.
Updated by shyouhei (Shyouhei Urabe) over 5 years ago
sawa (Tsuyoshi Sawada) wrote:
shyouhei (Shyouhei Urabe) wrote:
This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.
Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as @shyouhei (Shyouhei Urabe) has addressed.
I updated my proposal to have the keyword proportional. I expect all the widths to be handled automatically, including non-wide non-ASCII letters.
Still not appropriate. There are characters whose "wide"-ness is not fixed until they are actually rendered. See also: https://unicode.org/reports/tr11/ especially the section named "Modern Rendering Practice".
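To make the objection concrete, here is a rough sketch of a width estimate based on code point ranges rather than on ASCII vs. non-ASCII. The constant WIDE_RANGES and the method approx_display_width are hypothetical names, and the range list is a hand-picked, incomplete subset of code points that are typically classified as Wide or Fullwidth; it is not derived from the actual Unicode data files. The sketch gets "café" right, but it has to pick an arbitrary answer for characters such as Greek alpha, which the report classifies as Ambiguous.

```ruby
# Hypothetical, deliberately incomplete list of "wide" code point ranges.
# A real implementation would need the full East Asian Width data.
WIDE_RANGES = [
  0x1100..0x115F,   # Hangul Jamo (leading consonants)
  0x2E80..0x303E,   # CJK radicals, CJK symbols and punctuation
  0x3041..0x33FF,   # Hiragana, Katakana, CJK compatibility
  0x3400..0x4DBF,   # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF,   # CJK Unified Ideographs
  0xAC00..0xD7A3,   # Hangul Syllables
  0xF900..0xFAFF,   # CJK Compatibility Ideographs
  0xFF01..0xFF60,   # Fullwidth forms
  0xFFE0..0xFFE6    # Fullwidth signs
].freeze

# Assumption: characters in WIDE_RANGES take 2 columns, all others take 1.
# Ambiguous-width characters are arbitrarily treated as narrow here.
def approx_display_width(str)
  str.each_char.sum do |ch|
    cp = ch.ord
    WIDE_RANGES.any? { |range| range.cover?(cp) } ? 2 : 1
  end
end

puts approx_display_width("café")     # European non-ASCII letter counted as narrow
puts approx_display_width("aaあああ")  # Hiragana counted as wide
puts approx_display_width("α")        # Ambiguous width: 1 is a guess, not a fact
```

Whether the last value should be 1 or 2 depends on the font and terminal in use, which is exactly the information the string itself does not carry.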
Updated by matz (Yukihiro Matsumoto) over 5 years ago
- Status changed from Open to Rejected
The display width of a string cannot be calculated without rendering information, which Ruby usually does not have.
Considering emojis or grapheme clusters, it is nearly impossible. It's the responsibility of the rendering engine.
Matz.
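As a small illustration of the grapheme cluster point, the sketch below uses only existing String methods (String#length, String#grapheme_clusters, String#ljust). It shows that even the number of user-perceived characters differs from what String#length reports, before display width is considered at all.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points that
# are normally rendered as the single character "é".
s = "e\u0301"

puts s.length                    # => 2 (counts code points)
puts s.grapheme_clusters.length  # => 1 (counts user-perceived characters)

# String#ljust pads by code point count, so only one "*" is added here
# even though the string is usually rendered one column wide.
padded = s.ljust(3, "*")
puts padded.length               # => 3
puts padded.count("*")           # => 1
```

Counting grapheme clusters fixes this particular case, but it still says nothing about how many columns each cluster occupies once rendered.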