Bug #6799
closedDigest::*.hexdigest returns an ASCII-8BIT String
Description
$ ruby -rdigest/sha1 -e 'p Digest::SHA1.hexdigest("a").encoding'
#Encoding:ASCII-8BIT
I'm happy to provide a patch.
Files
Updated by duerst (Martin Dürst) over 12 years ago
So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]
Updated by Eregon (Benoit Daloze) over 12 years ago
duerst (Martin Dürst) wrote:
So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]
I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).
This for example, makes YAML dump it as a binary String, and may cause other problems as the String is believed to be "binary", while it's not (#digest on the other hand has the ASCII-8BIT encoding, which is right).
Updated by Eregon (Benoit Daloze) over 12 years ago
- File 0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch 0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch added
Patch attached.
Updated by Anonymous over 12 years ago
On Fri, Jul 27, 2012 at 07:34:30PM +0900, Eregon (Benoit Daloze) wrote:
Issue #6799 has been updated by Eregon (Benoit Daloze).
duerst (Martin Dürst) wrote:
So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]
I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).
FWIW, I would like this feature too. In Rails, we have to work around
this issue in some database drivers because the ASCII-8BIT string will
be considered to be binary.
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by duerst (Martin Dürst) over 12 years ago
- Status changed from Feedback to Assigned
- Assignee set to Eregon (Benoit Daloze)
Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.
Updated by Eregon (Benoit Daloze) over 12 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r36588.
Benoit, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
ext/digest/digest.c (hexencode_str_new): return an ASCII string
- test/digest: tests for all kind of digests encodings
[ruby-core:46792][Bug #6799]
Updated by Eregon (Benoit Daloze) over 12 years ago
duerst (Martin Dürst) wrote:
Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.
Thank you, I just did.
Updated by tenderlovemaking (Aaron Patterson) over 12 years ago
- File secure_random.patch secure_random.patch added
Could we do this for SecureRandom.hex as well? I've attached a patch. /cc akr
Updated by Eregon (Benoit Daloze) over 12 years ago
@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?
P.S.: Does anyone know the rationale behind which method to use for 'u','m','M' and 'H','h','B','b' ?
To have a base64 representation you need [s].pack('m') (and so #pack is used for 'u','m','M' to obtain the specified format), but for 'H','h','B','b' you need s.unpack(directive). I think the latter makes more sense when compared to other types such as 'l', in which #unpack is used to obtain the desired format.
Updated by Anonymous over 12 years ago
On Fri, Sep 14, 2012 at 07:35:46PM +0900, Eregon (Benoit Daloze) wrote:
Issue #6799 has been updated by Eregon (Benoit Daloze).
@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?
I agree (and would prefer that). It just seems like a larger change
than making this method work for me. ;-)
--
Aaron Patterson
http://tenderlovemaking.com/