Bug #19867: Unicode line and paragraph separator are not stripped - Ruby - Ruby Issue Tracking System

closed

Added by iainbeeston (Iain Beeston) over 2 years ago. Updated over 2 years ago.

Status: Rejected
Assignee:
Target version:
ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN

[ruby-core:114662]

Description

The Unicode line separator (U+2028) and paragraph separator (U+2029) are not removed by any of the strip methods:

"\u2028\u2029\u0000\t\n\v\f\r ".strip # => "\u2028\u2029"

I would have expected strip (and lstrip, rstrip) to remove Unicode whitespace as well. It looks like #7154 reported something similar, but for regular expressions and way back in Ruby 1.9.

I think fixing this should be simple (just checking for U+2028 and U+2029 in ctype.h), but I'm not sure whether it's supposed to behave this way or whether changing it could have unexpected consequences.

Updated by iainbeeston (Iain Beeston) over 2 years ago
#1 [ruby-core:114663]

I can see that the [[:space:]] regex class does match Unicode whitespace characters ("\u2028" =~ /[[:space:]]/ # => 0), but \s does not ("\u2028" =~ /\s/ # => nil).
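
For anyone who wants to reproduce the comparison, here is a minimal sketch that runs both regex classes against the two separators from the report. The constant SEPARATORS and the helper space_class_matches are names made up for this illustration; they are not part of Ruby.

```ruby
# Hypothetical names for illustration only; neither is part of Ruby.
SEPARATORS = { "U+2028" => "\u2028", "U+2029" => "\u2029" }.freeze

# Report whether each regex class matches the given one-character string.
def space_class_matches(ch)
  {
    posix_space: ch.match?(/[[:space:]]/), # POSIX bracket class
    backslash_s: ch.match?(/\s/)           # ASCII-only shorthand
  }
end

SEPARATORS.each do |name, ch|
  puts "#{name}: #{space_class_matches(ch)}"
end
```

On a UTF-8 string, posix_space should come back true and backslash_s false for both characters, which matches the =~ results quoted above.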

Updated by nobu (Nobuyoshi Nakada) over 2 years ago
#2 [ruby-core:114664]

Yes, \s, \w, etc. match only single-byte ASCII characters.
I don't think changing the behavior by default is a good idea.
An optional (keyword) argument may be better.
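
As a rough user-level illustration of the opt-in idea, the sketch below wraps strip in a helper that takes a keyword. Everything here is an assumption: String#strip accepts no such keyword, and the module UnicodeStrip, its strip method, and the unicode: parameter are hypothetical names. The Unicode path relies on [[:space:]] plus an explicit null byte, so that it removes everything the default strip removes in addition to U+2028 and U+2029.

```ruby
# Hypothetical helper, not part of Ruby: String#strip has no unicode: keyword.
module UnicodeStrip
  # Leading or trailing run of Unicode whitespace or null bytes.
  UNICODE_EDGE = /\A[[:space:]\x00]+|[[:space:]\x00]+\z/

  # With unicode: false (the default) this is plain String#strip.
  def self.strip(str, unicode: false)
    unicode ? str.gsub(UNICODE_EDGE, "") : str.strip
  end
end

sample = "\u2028\u2029\u0000\t\n\v\f\r "
UnicodeStrip.strip(sample)                # => "\u2028\u2029" (same as sample.strip)
UnicodeStrip.strip(sample, unicode: true) # => ""
```

Keeping the default path identical to String#strip means existing callers see no change, which is the concern raised above about altering the default behavior.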

Updated by nobu (Nobuyoshi Nakada) over 2 years ago
#3 [ruby-core:114665]

As for the implementation, changing ctype.h is not desirable.
There is already an rb_enc_isspace function for that purpose.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago
#4

Status changed from Open to Rejected
