Feature #20594: A new String method to append bytes while preserving encoding - Ruby - Ruby Issue Tracking System

Added by byroot (Jean Boussier) over 1 year ago. Updated over 1 year ago.

Status:

Closed

Assignee:

byroot (Jean Boussier)

Target version:

[ruby-core:118388]

Description

Context

When working with binary protocols such as protobuf or MessagePack, you often need to assemble multiple
strings with different encodings:

Post = Struct.new(:title, :body) do
  def serialize(buf)
    buf <<
      255 << title.bytesize << title <<
      255 << body.bytesize << body
  end
end

Post.new("Hello", "World").serialize("somedata".b) # => "somedata\xFF\x05Hello\xFF\x05World" #<Encoding:ASCII-8BIT>

The problem in the above case is that, because Encoding::ASCII_8BIT is declared as ASCII compatible,
if one of the appended strings contains bytes outside the ASCII range, the receiver is automatically promoted
to another encoding, which then leads to encoding issues:

Post.new("H€llo", "Wôrld").serialize("somedata".b) # => incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
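The two outcomes described above (silent promotion versus an outright error) can be observed in isolation. The following is a minimal sketch, not part of the proposal; `append_outcome` is a hypothetical helper introduced only for this illustration, and it assumes the source file is UTF-8 so that string literals are UTF-8:

```ruby
# Hypothetical helper: append +str+ to a copy of +buf+ and report what
# happened, either the resulting encoding or the exception class raised.
def append_outcome(buf, str)
  (buf.dup << str).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

ascii_only     = "somedata".b          # binary, but every byte is ASCII
with_high_byte = "somedata".b << 255   # binary, now holds a non-ASCII byte

p append_outcome(ascii_only, "Hello")      # ASCII-only argument: stays binary
p append_outcome(ascii_only, "H€llo")      # receiver is promoted to UTF-8
p append_outcome(with_high_byte, "H€llo")  # Encoding::CompatibilityError
```

Which outcome you get depends on whether the receiver already contains a non-ASCII byte, which is why the failure in the serializer only shows up once `255` has been appended.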

In many cases, you want to append to a String without changing the receiver's encoding.

The issue isn't exclusive to binary protocols and formats; it also happens with ASCII protocols that accept arbitrary bytes inline,
like Redis's RESP protocol or even HTTP/1.1.

Previous discussion

There was a similar feature request a while ago, but it was abandoned: https://bugs.ruby-lang.org/issues/14975

Existing solutions

You can of course always cast the strings you append to avoid this problem:

Post = Struct.new(:title, :body) do
  def serialize(buf)
    buf <<
      255 << title.bytesize << title.b <<
      255 << body.bytesize << body.b
  end
end

But this causes a lot of needless allocations.
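To get a rough feel for that cost, one can count allocations around the cast. This is only a sketch: `allocated_objects` is a hypothetical helper, and it relies on the CRuby-specific `GC.stat(:total_allocated_objects)` counter, so the exact numbers will vary between Ruby versions and implementations:

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
# Relies on a CRuby-specific GC counter (an assumption of this sketch).
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

title = "H€llo"
buf = "".b

# Every call to String#b returns a fresh String, so each append pays for
# one temporary object that is discarded right away.
with_cast = allocated_objects { 1_000.times { buf << title.b } }
puts "1000 appends through String#b allocated #{with_cast} objects"
```

The count should be at least one object per append, since `String#b` always returns a copy rather than the receiver.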

You'd think you could also use bytesplice, but it actually has the same issue:

Post = Struct.new(:title, :body) do
  def serialize(buf)
    buf << 255 << title.bytesize
    buf.bytesplice(buf.bytesize, title.bytesize, title)
    buf << 255 << body.bytesize
    buf.bytesplice(buf.bytesize, body.bytesize, body)
  end
end
Post.new("H€llo", "Wôrld").serialize("somedata".b) # => 'String#bytesplice': incompatible character encodings: BINARY (ASCII-8BIT) and UTF-8 (Encoding::CompatibilityError)

And even if it worked, it would be very unergonomic.

Proposal: a `byteconcat` method

A solution to this would be to add a new byteconcat method, which could be shimmed as:

class String
  def byteconcat(*strings)
    strings.map! do |s|
      if s.is_a?(String) && s.encoding != encoding
        s.dup.force_encoding(encoding)
      else
        s
      end
    end
    concat(*strings)
  end
end

Post = Struct.new(:title, :body) do
  def serialize(buf)
    buf.byteconcat(
      255, title.bytesize, title,
      255, body.bytesize, body,
    )
  end
end

Post.new("H€llo", "Wôrld").serialize("somedata".b) # => "somedata\xFF\aH\xE2\x82\xACllo\xFF\x06W\xC3\xB4rld" #<Encoding:ASCII-8BIT>

But of course a builtin implementation wouldn't need to dup the arguments.

Like other byte* methods, it's the responsibility of the caller to ensure the resulting string has a valid encoding, or
to deal with it if not.
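The caller's responsibility can be illustrated with a UTF-8 receiver. The sketch below uses `byteconcat_sketch`, a hypothetical plain-method stand-in for the shim above (written that way to avoid monkey-patching String); it is not the proposed builtin:

```ruby
# Hypothetical stand-in for the shim: retag each String argument with the
# receiver's encoding, then hand everything to String#concat.
def byteconcat_sketch(buf, *parts)
  parts = parts.map do |part|
    if part.is_a?(String) && part.encoding != buf.encoding
      part.dup.force_encoding(buf.encoding)
    else
      part
    end
  end
  buf.concat(*parts)
end

buf = +"caf"                       # UTF-8 receiver
byteconcat_sketch(buf, "\xC3".b)   # only the first byte of "é"
p buf.encoding                     # the receiver's encoding is unchanged
p buf.valid_encoding?              # false: the last sequence is incomplete

byteconcat_sketch(buf, "\xA9".b)   # the second byte completes the character
p buf.valid_encoding?              # true
p buf == "café"                    # true
```

The receiver keeps its encoding throughout, but nothing stops it from being temporarily (or permanently) invalid; checking `valid_encoding?` is left to the caller.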

Method name and signature

Name

This proposal suggests String#byteconcat, to mirror String#concat, but other names are possible:

byteappend (like Array#append)
bytepush (like Array#push)

Signature

This proposal makes byteconcat accept either String or Integer (in char range) arguments like concat. I believe it makes sense for consistency and also because it's not uncommon for protocols to have some byte-based segments, and Integers are more convenient there.

The proposed method also accepts variable arguments for consistency with String#concat, Array#push, Array#append.

The proposed method returns self, like concat and others.
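For reference, here is how the existing String#concat already behaves with respect to these three points (Integer arguments, variable arguments, return value). This only exercises the current method as a baseline and makes no claim about the final semantics of the proposed one:

```ruby
binary = "".b
utf8   = +""

# An Integer is treated as a character in the receiver's encoding.
binary.concat(255)   # a single 0xFF byte
utf8.concat(255)     # code point U+00FF, two bytes in UTF-8

p binary.bytes       # [255]
p utf8.bytes         # [195, 191]

# In a binary string, the "char range" stops at 255.
begin
  "".b.concat(256)
rescue RangeError => e
  puts "RangeError: #{e.message}"
end

# concat takes several arguments at once and returns the receiver itself.
p binary.concat("a", 0, "b").equal?(binary)   # true
p binary.bytes                                # [255, 97, 0, 98]
```

Note that on a UTF-8 receiver an Integer becomes a multi-byte character rather than a raw byte, which is one reason the Integer case deserves an explicit decision in a byte-oriented method.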

YJIT consideration

I consulted @maximecb (Maxime Chevalier-Boisvert) about this proposal, and according to her, accepting variable arguments makes it harder for YJIT to optimize.
I suspect consistency with other APIs trumps the performance consideration, but I think it's worth mentioning.

Related issues 2 (0 open — 2 closed)


Related to Ruby - Feature #14975: String#append without changing receiver's encoding (Rejected; assignee: ioquatix (Samuel Williams))
Related to Ruby - Bug #15460: Behaviour of String#setbyte changed (Closed; assignee: shyouhei (Shyouhei Urabe))

Updated by byroot (Jean Boussier) over 1 year ago #1
Updated by Eregon (Benoit Daloze) over 1 year ago #2 [ruby-core:118389]
Updated by maximecb (Maxime Chevalier-Boisvert) over 1 year ago #3 [ruby-core:118390]
Updated by matz (Yukihiro Matsumoto) over 1 year ago #4 [ruby-core:118543]
Updated by byroot (Jean Boussier) over 1 year ago #5 [ruby-core:118545]
Updated by byroot (Jean Boussier) over 1 year ago #6 [ruby-core:118547]
Updated by mame (Yusuke Endoh) over 1 year ago #7 [ruby-core:118554]
Updated by byroot (Jean Boussier) over 1 year ago #8 [ruby-core:118555]
Updated by Eregon (Benoit Daloze) over 1 year ago #9 [ruby-core:118560]
Updated by byroot (Jean Boussier) over 1 year ago #10 [ruby-core:118562]
Updated by Eregon (Benoit Daloze) over 1 year ago #11 [ruby-core:118564]
Updated by shugo (Shugo Maeda) over 1 year ago #12 [ruby-core:118576]
Updated by Dan0042 (Daniel DeLorme) over 1 year ago #13 [ruby-core:118632]
Updated by alanwu (Alan Wu) over 1 year ago #14 [ruby-core:118633]
Updated by byroot (Jean Boussier) over 1 year ago #15 [ruby-core:118636]
Updated by byroot (Jean Boussier) over 1 year ago #16 [ruby-core:118690]
Updated by duerst (Martin Dürst) over 1 year ago #17 [ruby-core:118735]
Updated by Dan0042 (Daniel DeLorme) over 1 year ago #18 [ruby-core:118738]
Updated by matz (Yukihiro Matsumoto) over 1 year ago #19 [ruby-core:118763]
Updated by byroot (Jean Boussier) over 1 year ago #20 [ruby-core:118766]
Updated by alanwu (Alan Wu) over 1 year ago #21 [ruby-core:118780]
Updated by byroot (Jean Boussier) over 1 year ago #22 [ruby-core:118799]
Updated by tenderlovemaking (Aaron Patterson) over 1 year ago #23 [ruby-core:118804]
Updated by mame (Yusuke Endoh) over 1 year ago #24 [ruby-core:118825]
Updated by matz (Yukihiro Matsumoto) over 1 year ago #25 [ruby-core:119053]
Updated by byroot (Jean Boussier) over 1 year ago #26
Updated by Dan0042 (Daniel DeLorme) over 1 year ago #27 [ruby-core:119115]
Updated by tenderlovemaking (Aaron Patterson) over 1 year ago #28 [ruby-core:119116]
Updated by Dan0042 (Daniel DeLorme) over 1 year ago #29 [ruby-core:119118]
Updated by shyouhei (Shyouhei Urabe) over 1 year ago #30 [ruby-core:119119]
Updated by Dan0042 (Daniel DeLorme) over 1 year ago #31 [ruby-core:119120]
Updated by gettalong (Thomas Leitner) over 1 year ago #32 [ruby-core:119122]
Updated by Eregon (Benoit Daloze) over 1 year ago #33 [ruby-core:119123]
Updated by Eregon (Benoit Daloze) over 1 year ago #34
