Feature #20069

Buffer class in stdlib

Added by pynix (Pynix wang) 5 months ago. Updated 5 months ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:115764]

Description

Ruby uses String to handle bytes, which causes the error "invalid byte sequence in UTF-8" in irb.

Can we get a built-in class like Buffer or Bytes that is represented as a hex string?

Updated by nobu (Nobuyoshi Nakada) 5 months ago

What's the use case?
Does it differ from IO::Buffer?

Updated by pynix (Pynix wang) 5 months ago

The main use case is dealing with binary data, as a replacement for String.

E.g. the gRPC bytes type, crypto keys, and more.

It may not be the same as IO::Buffer, so Bytes or Binary would be a good class name.

Updated by pynix (Pynix wang) 5 months ago

irb(main):048> SecureRandom.bytes(10)
=> "\xB6e\x1C\xF3T\x9C\xA1\xDF\xBD\xEA"
irb(main):049> SecureRandom.bytes(10)
=> "\"\xC4;0\xB3\xA6!\x80jn"
irb(main):050> SecureRandom.bytes(10)
=> "\x9B\x9CP\t~\"\xB9\x8EAn"
irb(main):051> SecureRandom.bytes(10)
=> "\xAA\xDEf\x92\x8E\xEE5]\xD0\xB2"
irb(main):052> SecureRandom.bytes(10)
=> "\xFD\xE9\xF5@n\x1D\x9D\xB4\xB7\x8A"
irb(main):053> SecureRandom.bytes(10)
=> "\xCB\x90\xB0\xCB\xDF\xD2\xED\xAA\a\\"
irb(main):054> SecureRandom.bytes(10)
=> "\x16p\x15\x12\xC0\xD6\x02*D\xDB"
irb(main):055> SecureRandom.bytes(10)
=> "F\x97\xC2d\x84\a\x87\xA3P\b"
irb(main):056> SecureRandom.bytes(10)
=> "\x98\xAB\xA1\x96\x15\x91\x92\xF8e5"
irb(main):057> SecureRandom.bytes(10)
=> "\xE3\xD0\xB5P\x95ys\x0E\xCF'"
irb(main):058> SecureRandom.bytes(10)
=> "\xC5\xFB\x04\x97\xFC\xC0\xF5\xEF{\xA2"


Using String as bytes gives a non-uniform representation: some bytes are rendered as characters, some are not.

Updated by duerst (Martin Dürst) 5 months ago

pynix (Pynix wang) wrote:

Ruby uses String to handle bytes

Ruby uses big classes. That avoids duplicating a lot of functionality in many classes, and also avoids a lot of conversion operations.

which causes the error "invalid byte sequence in UTF-8" in irb

Can you show an example?

pynix (Pynix wang) wrote in #note-3:

irb(main):058> SecureRandom.bytes(10)
=> "\xC5\xFB\x04\x97\xFC\xC0\xF5\xEF{\xA2"

Using String as bytes gives a non-uniform representation: some bytes are rendered as characters, some are not.

You can get a uniform representation with SecureRandom.hex(10).
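A minimal sketch of that suggestion, plus one way to get the same uniform rendering for bytes you already hold. The variable names are illustrative only; `String#unpack1` and `Array#pack` with the `H*` directive are standard Ruby.

```ruby
require "securerandom"

# 10 random bytes, already rendered as 20 lowercase hex characters.
hex = SecureRandom.hex(10)

# For bytes you already have, "H*" renders every byte as two hex digits,
# regardless of whether the byte happens to be printable ASCII.
raw = SecureRandom.bytes(10)
raw_hex = raw.unpack1("H*")

# The hex form converts back to the original binary String.
round_trip = [raw_hex].pack("H*")

puts hex
puts raw_hex
puts round_trip == raw
```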

Updated by austin (Austin Ziegler) 5 months ago

pynix (Pynix wang) wrote:

Ruby uses String to handle bytes, which causes the error "invalid byte sequence in UTF-8" in irb.

Can we get a built-in class like Buffer or Bytes that is represented as a hex string?

class Bytes < String
  def inspect
    bytes.pack("c*").unpack1("H*")
  end
end

s = Bytes.new(SecureRandom.bytes(10))

What might be more interesting than suggesting an unnecessary class is suggesting a different #inspect if the encoding is ASCII-8BIT or BINARY (because SecureRandom.bytes(10).encoding # => #<Encoding:ASCII-8BIT>, which will eventually be called Encoding:BINARY). I’m not sure what such an inspect should be, because the inspect that I wrote above for Bytes is both inefficient and incorrect (because the representation is not what is shown on #inspect, which differs from other strings).

Maybe:

class Bytes < String
  def inspect
    "#<#{encoding}:#{bytes.pack("c*").unpack1("H*")}>"
  end
end

Bytes.new(SecureRandom.bytes(10))
=> #<ASCII-8BIT:1c2dc1463d30c6ed0b9a>
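As a hedged variation on the same idea, the intermediate byte array can be skipped by calling `unpack1("H*")` on the string itself. `HexBytes` is a hypothetical name used only for this sketch; it is not a proposed or existing class.

```ruby
# Hypothetical subclass, for illustration only.
class HexBytes < String
  # Render the content as "#<ENCODING:hex digits>" without building
  # an intermediate array of integers.
  def inspect
    "#<#{encoding}:#{unpack1("H*")}>"
  end
end

# String#b returns a copy of the literal tagged with the binary encoding.
sample = HexBytes.new("\x1C\x2D\xC1".b)
puts sample.inspect
```

The underlying bytes and every other String method are unchanged; only the inspect output differs.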

Updated by shan (Shannon Skipper) 5 months ago

pynix (Pynix wang) wrote:

Ruby uses String to handle bytes, which causes the error "invalid byte sequence in UTF-8" in irb.

I'm curious, did you actually run into an "invalid byte sequence" error? If so, could you show the code that produced the error?

I know IO::Buffer has already been mentioned, but just wanted to point out it inspects with pretty hex.

>> IO::Buffer.for SecureRandom.random_bytes
=> 
#<IO::Buffer 0x00007f7dfa885998+16 EXTERNAL READONLY SLICE>
#0x00000000  17 bc 59 2d 8b 66 4b 6a 56 96 97 98 5e 07 45 d6 ..Y-.fKjV...^.E.

Updated by ioquatix (Samuel Williams) 5 months ago

Ruby uses String to handle bytes, which causes the error "invalid byte sequence in UTF-8" in irb.

This is desirable behaviour. The String with UTF-8 encoding cannot contain invalid byte sequences. If you want to store binary data, use Encoding::BINARY encoding.
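A small sketch of the difference, assuming the bytes arrive tagged as UTF-8 (the original report did not include the failing code, so this is only one plausible way to hit the error). The variable names are illustrative.

```ruby
# Three arbitrary bytes that do not form valid UTF-8, tagged as UTF-8.
utf8 = "\xC5\xFB\x04".dup.force_encoding(Encoding::UTF_8)

# The same bytes tagged as binary; String#b returns a copy and changes no bytes.
binary = utf8.b

puts utf8.valid_encoding?    # false: the bytes are not valid UTF-8
puts binary.valid_encoding?  # true: any byte sequence is valid as binary

# Encoding-aware operations such as regexp matching reject the broken string.
error = begin
  utf8.match?(/a/)
  nil
rescue ArgumentError => e
  e
end
puts error.class

# The binary-tagged copy can be matched and inspected without raising.
puts binary.match?(/a/)
p binary.bytes
```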

Can we get a built-in class like Buffer or Bytes that is represented as a hex string?

A binary String's inspect output already shows non-printable bytes as hex escapes (\xNN).

As others have pointed out, if you want actual memory mapped binary buffers, use IO::Buffer.
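A minimal sketch of wrapping existing bytes in an IO::Buffer, assuming Ruby 3.1 or later (IO::Buffer was still marked experimental at the time of this thread and may print a warning). The variable names are illustrative.

```ruby
require "securerandom"

data = SecureRandom.bytes(16)

# Wrap the string's bytes in a buffer; without a block this is a read-only view.
buffer = IO::Buffer.for(data)

# Read the first byte as an unsigned 8-bit integer at offset 0.
first = buffer.get_value(:U8, 0)

puts buffer.size
puts first == data.getbyte(0)
```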

Updated by matz (Yukihiro Matsumoto) 5 months ago

  • Status changed from Open to Closed

Use either string with BINARY encoding or IO::Buffer. If these two lack what you want, open a new issue please.

Matz.
