String#unpack with 'M' directive can create strings with wrong code range
I've noticed that String#unpack with the 'M' directive can create strings that are marked CR_VALID when they should be CR_7BIT. The issue appears to have been introduced in r30542, which assumes that all ASCII-8BIT strings must be CR_VALID. It's possible this was correct back during Ruby 1.9.3 development and just wasn't updated. I'm not familiar enough with the history to tell.
A simple reproduction showing the issue is:
    res = '0123456789=\n'.unpack('M').first
    p res
    p res.encoding
    p res.bytes
    p res.ascii_only?

    puts

    packed = res.bytes.pack('c*')
    p packed
    p packed.encoding
    p packed.bytes
    p packed.ascii_only?
This yields the following output:

    "0123456789=\\n"
    #<Encoding:ASCII-8BIT>
    [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 61, 92, 110]
    false

    "0123456789=\\n"
    #<Encoding:ASCII-8BIT>
    [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 61, 92, 110]
    true
Both strings have exactly the same contents and the same encoding. But depending on how you construct them, one is considered to be CR_7BIT (the string built with pack, as indicated by its String#ascii_only? output of true) and the other is considered to be CR_VALID (the string returned by unpack('M')). I believe CR_7BIT is the correct code range value in this situation.
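
To make the discrepancy easier to spot on a given Ruby build, here is a minimal sketch that compares the cached answer from String#ascii_only? against a check recomputed from the raw bytes. The helper name seven_bit_bytes? is hypothetical and not part of Ruby. The script only reports whether the two answers agree, since the result for the unpack('M') string depends on whether the build has this bug.

```ruby
# Hypothetical helper (not a Ruby API): recompute from the raw bytes
# whether every byte in the string is 7-bit, ignoring any cached code range.
def seven_bit_bytes?(str)
  str.bytes.all? { |b| b < 0x80 }
end

# Same two construction paths as in the reproduction above.
unpacked = '0123456789=\n'.unpack('M').first
packed   = unpacked.bytes.pack('c*')

[['unpack', unpacked], ['pack', packed]].each do |label, str|
  cached = str.ascii_only?       # answer derived from the string's code range
  actual = seven_bit_bytes?(str) # answer derived from the bytes themselves
  status = cached == actual ? 'consistent' : 'MISMATCH'
  puts "#{label}: ascii_only?=#{cached} bytes_7bit=#{actual} (#{status})"
end
```

On a build with the problem described here, the unpack line would be expected to report a mismatch while the pack line stays consistent.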