Project

General

Profile

ActionsLike0

Bug #6799

closed

Digest::*.hexdigest returns an ASCII-8BIT String

Added by Eregon (Benoit Daloze) over 12 years ago. Updated over 12 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 2.0.0dev (2012-07-15 trunk 36395) [x86_64-darwin10.8.0]
Backport:
[ruby-core:46792]

Description

$ ruby -rdigest/sha1 -e 'p Digest::SHA1.hexdigest("a").encoding'
#Encoding:ASCII-8BIT

I'm happy to provide a patch.


Files

0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch (2.59 KB) 0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch Eregon (Benoit Daloze), 07/27/2012 09:01 PM
noname (500 Bytes) noname Anonymous, 07/29/2012 10:53 AM
secure_random.patch (863 Bytes) secure_random.patch tenderlovemaking (Aaron Patterson), 09/06/2012 08:58 AM
noname (500 Bytes) noname Anonymous, 09/15/2012 03:53 PM

Updated by duerst (Martin Dürst) over 12 years ago

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

Updated by Eregon (Benoit Daloze) over 12 years ago

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

This for example, makes YAML dump it as a binary String, and may cause other problems as the String is believed to be "binary", while it's not (#digest on the other hand has the ASCII-8BIT encoding, which is right).

Updated by Anonymous over 12 years ago

On Fri, Jul 27, 2012 at 07:34:30PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

FWIW, I would like this feature too. In Rails, we have to work around
this issue in some database drivers because the ASCII-8BIT string will
be considered to be binary.

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by duerst (Martin Dürst) over 12 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to Eregon (Benoit Daloze)

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

Like0Actions #7

Updated by Eregon (Benoit Daloze) over 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r36588.
Benoit, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


ext/digest/digest.c (hexencode_str_new): return an ASCII string

Updated by Eregon (Benoit Daloze) over 12 years ago

duerst (Martin Dürst) wrote:

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

Thank you, I just did.

Updated by tenderlovemaking (Aaron Patterson) over 12 years ago

Could we do this for SecureRandom.hex as well? I've attached a patch. /cc akr

Updated by Eregon (Benoit Daloze) over 12 years ago

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

P.S.: Does anyone know the rationale behind which method to use for 'u','m','M' and 'H','h','B','b' ?
To have a base64 representation you need [s].pack('m') (and so #pack is used for 'u','m','M' to obtain the specified format), but for 'H','h','B','b' you need s.unpack(directive). I think the latter makes more sense when compared to other types such as 'l', in which #unpack is used to obtain the desired format.

Updated by Anonymous over 12 years ago

On Fri, Sep 14, 2012 at 07:35:46PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

I agree (and would prefer that). It just seems like a larger change
than making this method work for me. ;-)

--
Aaron Patterson
http://tenderlovemaking.com/

ActionsLike0

Also available in: Atom PDF