Backport #5233 (Closed)
OpenSSL::SSL::SSLSocket has problems with encodings other than "ascii"
Description
The attached script shows the issue. It expects a combined cert/private key in the file server.pem under the current directory (also attached).
Under Ruby 1.9.2p290, the script prints "str.size is 50848 (expecting 100000)".
As a workaround the string encoding can be forced to "ascii" before the write.
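A sketch of that workaround (variable names are illustrative and the SSL connection setup is omitted): forcing the payload to BINARY makes the character count equal the byte count, so the buggy bookkeeping in the write loop comes out right.

```ruby
# Workaround sketch: force the string to BINARY before writing, so that
# String#size (characters) and String#bytesize (bytes) agree.
str = "\u00e4" * 100_000             # 100_000 chars, 200_000 bytes in UTF-8
payload = str.dup.force_encoding(Encoding::BINARY)
puts payload.size                    # => 200000
puts payload.bytesize                # => 200000
# ssl_socket.write(payload)          # ssl_socket: an established SSLSocket
```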
Updated by MartinBosslet (Martin Bosslet) about 13 years ago
- Category set to ext
- Status changed from Open to Assigned
- Assignee set to MartinBosslet (Martin Bosslet)
- Target version set to 1.9.4
Updated by MartinBosslet (Martin Bosslet) about 13 years ago
The problem is in lib/openssl/buffering.rb:
  def do_write(s)
    @wbuffer = "" unless defined? @wbuffer
    @wbuffer << s
    @sync ||= false
    if @sync or @wbuffer.size > BLOCK_SIZE or idx = @wbuffer.rindex($/)
      remain = idx ? idx + $/.size : @wbuffer.length
      nwritten = 0
      while remain > 0
        str = @wbuffer[nwritten,remain]
        begin
          nwrote = syswrite(str)
        rescue Errno::EAGAIN
          retry
        end
        remain -= nwrote
        nwritten += nwrote
      end
      @wbuffer[0,nwritten] = ""
    end
  end
remain gets initialized with @wbuffer.length, the string length in characters,
but nwrote receives the actual number of bytes written, so fewer bytes than
are actually available get written.
A fix for this would be treating @wbuffer strictly as binary data by forcing
its encoding to BINARY. I'm not sure, does anyone see a more elegant way or
would this solution suffice?
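One way that fix could look (a sketch, not the committed patch; the $/ line-flush logic is dropped and syswrite is faked so the demo is self-contained):

```ruby
# Sketch of the binary-buffer fix: keep @wbuffer strictly BINARY so that
# size/index operations count bytes, matching what syswrite returns.
class BufferedWriter
  BLOCK_SIZE = 1024

  def initialize(sink)
    @sink = sink   # collects the raw bytes "written"
    @sync = true   # flush on every write, for the demo
  end

  # syswrite stand-in: writes at most 7 bytes per call, returns bytes written
  def syswrite(str)
    chunk = str.byteslice(0, 7)
    @sink << chunk
    chunk.bytesize
  end

  def do_write(s)
    # String.new is ASCII-8BIT (BINARY) by default
    @wbuffer = String.new unless defined?(@wbuffer)
    # Force the incoming string to BINARY *before* appending, so the append
    # cannot promote @wbuffer to a multi-byte encoding such as UTF-8.
    @wbuffer << s.dup.force_encoding(Encoding::BINARY)
    if @sync || @wbuffer.bytesize > BLOCK_SIZE
      remain = @wbuffer.bytesize     # bytes, matching syswrite's return value
      nwritten = 0
      while remain > 0
        nwrote = syswrite(@wbuffer[nwritten, remain])
        remain -= nwrote
        nwritten += nwrote
      end
      @wbuffer[0, nwritten] = ""
    end
  end
end

sink = String.new
w = BufferedWriter.new(sink)
w.do_write("\u00e4" * 100)   # 100 chars, 200 bytes in UTF-8
puts sink.bytesize           # => 200: every byte arrives, none dropped
```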
Updated by normalperson (Eric Wong) about 13 years ago
Martin Bosslet Martin.Bosslet@googlemail.com wrote:
> The problem is in lib/openssl/buffering.rb:
>
>   def do_write(s)
>     @wbuffer = "" unless defined? @wbuffer
>     @wbuffer << s
>     @sync ||= false
>     if @sync or @wbuffer.size > BLOCK_SIZE or idx = @wbuffer.rindex($/)
>       remain = idx ? idx + $/.size : @wbuffer.length
>       nwritten = 0
>       while remain > 0
>         str = @wbuffer[nwritten,remain]
>         begin
>           nwrote = syswrite(str)
>         rescue Errno::EAGAIN
>           retry
>         end
>         remain -= nwrote
>         nwritten += nwrote
>       end
>       @wbuffer[0,nwritten] = ""
>     end
>   end
>
> remain gets initialized with @wbuffer.length, the string length in characters,
> but nwrote receives the actual number of bytes written, so fewer bytes than
> actually available are written.
>
> A fix for this would be treating @wbuffer strictly as binary data by forcing
> its encoding to BINARY. I'm not sure, does anyone see a more elegant way or
> would this solution suffice?
I use a "# -*- encoding: binary -*-" magic comment at the top of all Ruby
source files where I initialize string literals for storing binary data.
It's cleaner than setting Encoding::BINARY on every string I create
(and nearly all my code works exclusively on binary data).
Also, all of the Ruby (non-SSL) *Socket objects have Encoding::BINARY by
default anyways, so I think SSLSocket should be the same.
Updated by MartinBosslet (Martin Bosslet) about 13 years ago
> I use a "# -*- encoding: binary -*-" magic comment at the top of all Ruby
> source files where I initialize string literals for storing binary data.
> It's cleaner than setting Encoding::BINARY on every string I create
> (and nearly all my code works exclusively on binary data).
I'm afraid this had no effect; either I did it wrong, or I might have
misunderstood you. The incoming string s already has UTF-8 encoding, so

  @wbuffer << s

ends up as UTF-8 regardless of the encoding I set for the .rb file. I
figured this was because "<<" calls rb_str_append, which in turn calls
rb_enc_check, which determines a compatible encoding, in this case
UTF-8, for @wbuffer. But again, I might have misunderstood you.
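The promotion described here is easy to reproduce standalone, independent of the buffering code:

```ruby
# Appending a UTF-8 string to an empty BINARY string makes rb_str_append
# adopt the argument's encoding, so the buffer silently becomes UTF-8.
buf = String.new          # empty, ASCII-8BIT (BINARY) by default
buf << "\u00e4"
puts buf.encoding         # => UTF-8
puts buf.length           # => 1 (characters)
puts buf.bytesize         # => 2 (bytes)
```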
> Also, all of the Ruby (non-SSL) *Socket objects have Encoding::BINARY by
> default anyways, so I think SSLSocket should be the same.
I'm sorry, I don't understand what you mean by the *Socket objects having
binary encoding by default. Do you mean that it's binary data they expect
to deal with for input and output? So a user would have to make sure to
pass only already BINARY-encoded strings to any *Socket?
I quickly checked with a TCPServer and a Net::HTTP client; there the
aforementioned situation works: when sending 100000 a-umlauts you receive
the same amount back, after forcing the response to UTF-8 again, of course.
That's why I thought that an SSLSocket should behave the same way.
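That check can be reduced to a plain socket pair (a sketch; UNIXSocket.pair stands in for the TCPServer/Net::HTTP setup):

```ruby
require 'socket'

# Send multi-byte characters over a non-SSL socket: every byte arrives,
# and the received BINARY string compares equal after forcing UTF-8 back.
a, b = UNIXSocket.pair
data = "\u00e4" * 1_000                # 1_000 chars, 2_000 bytes
writer = Thread.new { a.write(data); a.close }
received = b.read                      # reads to EOF, returns ASCII-8BIT
writer.join
b.close
received.force_encoding(Encoding::UTF_8)
puts received == data                  # => true
```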
Updated by normalperson (Eric Wong) about 13 years ago
Martin Bosslet Martin.Bosslet@googlemail.com wrote:
>> I use a "# -*- encoding: binary -*-" magic comment at the top of all Ruby
>> source files where I initialize string literals for storing binary data.
>> It's cleaner than setting Encoding::BINARY on every string I create
>> (and nearly all my code works exclusively on binary data).
>
> I'm afraid this had no effect, or I did it wrong, or I might also have
> misunderstood you. The incoming string s already has UTF-8 encoding, so
>
>   @wbuffer << s
>
> ends up as UTF-8 regardless of the encoding I set for the .rb file, I
> figured this was because "<<" calls rb_str_append which again calls
> rb_enc_check which will determine a compatible encoding, in this case
> UTF-8, for @wbuffer. But again, I might have misunderstood you.
You're right. rb_str_append() modifies the empty @wbuffer to the
encoding of "s" above :(
I suppose calling @wbuffer.force_encoding(Encoding::BINARY) after appending
to @wbuffer is necessary (unless you write the buffering code in C like
io.c does).
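That suggestion, applied after each append, looks like this (a minimal illustration):

```ruby
# Re-force BINARY after the append to undo rb_str_append's promotion.
wbuffer = String.new                      # empty, ASCII-8BIT by default
wbuffer << "\u00e4"                       # buffer would become UTF-8 here
wbuffer.force_encoding(Encoding::BINARY)  # back to byte semantics
puts wbuffer.encoding     # => ASCII-8BIT
puts wbuffer.size         # => 2 (bytes again, matching syswrite)
```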
>> Also, all of the Ruby (non-SSL) *Socket objects have Encoding::BINARY by
>> default anyways, so I think SSLSocket should be the same.
>
> I'm sorry, I don't understand what you mean by the *Socket objects have
> binary encoding by default - do you mean it's binary data they are expecting
> to deal with for input and output? So a user would have to make sure to only
> pass already BINARY-encoded strings to any *Socket?
For all newly-created *Socket objects, external_encoding is already
ASCII-8BIT (binary) and the sockets should just pass the byte buffer of
any underlying String objects given to it.
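This is observable directly (a quick check with a local socket pair):

```ruby
require 'socket'

# Newly created sockets are in binary mode; their external encoding is
# ASCII-8BIT, so reads hand back raw bytes without transcoding.
a, b = UNIXSocket.pair
enc = a.external_encoding
puts enc                    # => ASCII-8BIT
a.close
b.close
```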
> I quickly checked with a TCPServer and Net::HTTP client, there the aforementioned
> situation would work, when sending 100000 a-Umlauts you again receive the same
> amount, after enforcing the response to UTF-8 again, of course. That's why I
> thought that an SSLSocket should behave the same way.
Yes, the underlying IO#read/read_nonblock/sysread for the TCPSocket objects
should return new ASCII-8BIT Strings. You need to force them to UTF-8
yourself upon receipt.
Updated by Anonymous about 13 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r33485.
Niklas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- lib/openssl/buffering.rb: Force multi-byte strings to be treated as binary data.
- test/openssl/test_ssl.rb: Add test for it.
Thanks to Niklas Baumstark for reporting the issue!
[Ruby 1.9 - Bug #5233] [ruby-core:39120]
Updated by MartinBosslet (Martin Bosslet) about 13 years ago
Also thanks to Eric, for providing his thoughts on the topic!
Updated by naruse (Yui NARUSE) over 12 years ago
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport193
- Category deleted (ext)
- Target version deleted (1.9.4)
Updated by naruse (Yui NARUSE) over 12 years ago
- Status changed from Closed to Assigned
This breaks tests on CentOS 5.6 (but not Ubuntu 10.04, FreeBSD 8, 9).
http://c5664.rubyci.org/~chkbuild/ruby-1.9.3/log/20120210T173209Z.diff.html.gz
Maybe some more commits are needed.
Updated by naruse (Yui NARUSE) over 12 years ago
- Status changed from Assigned to Closed
Backported r33508 and fixed this.