Bug #5684
Closed: [Ruby 1.9] Socket doesn't respect default external encoding
Description
When receiving data from a TCPSocket (as in the attached client.rb), the default external encoding specified by the -E option to ruby is not respected.
Steps:
(1) In terminal window A, run: ruby server.rb
(2) In terminal window B, run: ruby -E ISO-8859-1 client.rb
Expected result for terminal window B:
bytes: "hell\xF6"
encoding: ISO-8859-1
Actual result for terminal window B:
bytes: "hell\xF6"
encoding: ASCII-8BIT
Workaround:
Use String#force_encoding('ISO-8859-1')
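The workaround relabels the received bytes without changing them. Below is a minimal sketch, assuming the data is known to be ISO-8859-1 (the literal stands in for what client.rb receives). `encode` is only needed if the program then wants real UTF-8 text.

```ruby
# Stand-in for the bytes returned by Socket#recv: an ASCII-8BIT (binary) string.
data = "hell\xF6".b
puts data.encoding                 # ASCII-8BIT

# Relabel the same bytes as ISO-8859-1; no bytes are changed.
latin1 = data.dup.force_encoding(Encoding::ISO_8859_1)
puts latin1.encoding               # ISO-8859-1
puts latin1.bytes == data.bytes    # true

# Transcode to UTF-8 if the rest of the program expects UTF-8 text.
utf8 = latin1.encode(Encoding::UTF_8)
puts utf8                          # hellö
```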
Updated by naruse (Yui NARUSE) over 12 years ago
You can set encodings to a Socket object with Socket#set_encoding.
But Socket#recv is a binary API, like IO#read(n).
You can use the textual API IO#read to get an ISO-8859-1 string.
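To make the recv/read distinction concrete, here is a minimal sketch over a loopback TCP connection. The `loopback_pair` helper, the address 127.0.0.1 and the payload are illustrative assumptions, not part of the attached scripts. With `set_encoding` applied to the receiving socket, `recv` should still return ASCII-8BIT, while `read` with no length should return a string tagged with the socket's encoding.

```ruby
require 'socket'

# Hypothetical helper: a connected TCP pair on loopback, standing in for
# the server.rb/client.rb setup from the report.
def loopback_pair
  server = TCPServer.new('127.0.0.1', 0)               # port 0: the OS picks a free port
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  peer = server.accept
  server.close
  [client, peer]
end

payload = "hell\xF6".b

# 1) recv is a binary API: the result is ASCII-8BIT even after set_encoding.
client, peer = loopback_pair
client.set_encoding(Encoding::ISO_8859_1)
peer.write(payload)
peer.close
via_recv = client.recv(16)
client.close
puts via_recv.encoding   # ASCII-8BIT

# 2) read with no length is a textual API: it uses the encoding set on the socket.
client, peer = loopback_pair
client.set_encoding(Encoding::ISO_8859_1)
peer.write(payload)
peer.close               # read without a length waits for EOF
via_read = client.read
client.close
puts via_read.encoding   # ISO-8859-1
```

Without the `set_encoding` call, `read` also returns ASCII-8BIT, because Ruby opens sockets in binary mode; that matches what the reporter observes in the next comment.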
Updated by vovik (Vladimir Chernis) over 12 years ago
Yui NARUSE wrote:
> You can set encodings to a Socket object with Socket#set_encoding.
I understand, but if I don't call Socket#set_encoding, shouldn't the encoding fall back to the default encoding specified by the -E option to ruby?
> But Socket#recv is a binary API, like IO#read(n).
> You can use the textual API IO#read to get an ISO-8859-1 string.
Is IO#read the same as Socket#read? Changing recv to read in client.rb doesn't change anything about the encoding.
I know File#read respects the default encoding. It would be nice if Socket#read did the same thing, especially since Net::HTTP uses Socket.
Updated by vovik (Vladimir Chernis) over 12 years ago
- File socket_vs_file.rb added
To summarize:
File IO respects the default external encoding specified by the -E option to ruby, but Socket does not.
I've attached a simple test case to illustrate the problem. When I run it with ruby -E ISO-8859-1 socket_vs_file.rb, I expect the following output:
file encoding: ISO-8859-1
socket encoding: ISO-8859-1
But instead, I get this output:
file encoding: ISO-8859-1
socket encoding: ASCII-8BIT
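The attachment itself is not reproduced in this thread, so the following is only a hypothetical reconstruction of such a comparison: the same bytes are read once from a temporary file and once from a loopback TCP connection. Run under `ruby -E ISO-8859-1`, it should print the "instead" output above, since File.read tags text with Encoding.default_external while sockets are opened in binary mode.

```ruby
require 'socket'
require 'tempfile'

# Hypothetical reconstruction: compare how File and Socket label the same bytes.
payload = "hell\xF6".b

# File side: File.read with no length reads in text mode and tags the result
# with Encoding.default_external (which -E sets).
tmp = Tempfile.new('socket_vs_file')
tmp.binmode
tmp.write(payload)
tmp.close
file_str = File.read(tmp.path)
tmp.unlink

# Socket side: sockets start in binary mode, so read returns ASCII-8BIT
# unless set_encoding is called on the socket.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept
server.close
peer.write(payload)
peer.close
socket_str = client.read
client.close

puts "file encoding: #{file_str.encoding}"
puts "socket encoding: #{socket_str.encoding}"
```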
Am I mistaken to expect this behavior?
Updated by naruse (Yui NARUSE) over 12 years ago
Vladimir Chernis wrote:
> Yui NARUSE wrote:
>> You can set encodings to a Socket object with Socket#set_encoding.
> I understand, but if I don't call Socket#set_encoding, shouldn't the encoding fall back to the default encoding specified by the -E option to ruby?
Socket doesn't respect default_external because default_external is set from the locale of the client system, whereas the encoding of strings received from a socket depends on the server software.
Moreover, data from a socket is usually binary.
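One way to act on this point is to let the protocol, rather than the client's locale, decide how received bytes are labeled. The sketch below uses a hypothetical `label_payload` helper and assumes the server announces a charset name somehow (for example in a header); a missing or unknown label leaves the data binary instead of guessing.

```ruby
# Hypothetical helper: label raw socket bytes using a charset announced by the
# protocol/server (e.g. a header value), not the client's locale.
def label_payload(bytes, charset)
  raw = bytes.b                      # fresh ASCII-8BIT copy; the caller's string is untouched
  return raw if charset.nil?         # no label: keep the data binary
  raw.force_encoding(Encoding.find(charset))
rescue ArgumentError
  raw                                # unknown charset name: stay binary rather than guess
end

raw = "hell\xF6".b
puts label_payload(raw, 'ISO-8859-1').encoding       # ISO-8859-1
puts label_payload(raw, nil).encoding                # ASCII-8BIT
puts label_payload(raw, 'no-such-charset').encoding  # ASCII-8BIT
```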
>> But Socket#recv is a binary API, like IO#read(n).
>> You can use the textual API IO#read to get an ISO-8859-1 string.
> Is IO#read the same as Socket#read? Changing recv to read in client.rb doesn't change anything about the encoding.
> I know File#read respects the default encoding. It would be nice if Socket#read did the same thing, especially since Net::HTTP uses Socket.
File and Socket are different.
Note that Net::HTTP's encoding policy is independent of Socket.
> Am I mistaken to expect this behavior?
The conclusion is: yes.
Updated by ko1 (Koichi Sasada) about 12 years ago
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) about 12 years ago
- Status changed from Open to Rejected