Bug #2251


URI.parse accepts strings with invalid characters

Added by emerose (Sam Quigley) almost 13 years ago. Updated over 11 years ago.

Target version:
ruby -v: ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.0.0]


The regexes used in URI::Parser's initialize_regexp use ^ and $ rather than \A and \Z:

      # for URI::split
      ret[:ABS_URI] = Regexp.new('^' + pattern[:X_ABS_URI] + '$', Regexp::EXTENDED)
      ret[:REL_URI] = Regexp.new('^' + pattern[:X_REL_URI] + '$', Regexp::EXTENDED)

As a result, URI.parse succeeds whenever any single newline-delimited line of its argument is a valid URI, instead of requiring the argument as a whole to be one:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> URI.parse("blah\n\nblahblah")
=> #<URI::HTTP:0x000001010aac78 URL:>
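
The anchor behavior can be reproduced without URI at all. The following is a minimal sketch, assuming a deliberately simplified pattern: simple_http is a hypothetical stand-in, not the pattern URI::Parser actually builds, and the input string is made up for illustration.

```ruby
# Hypothetical, heavily simplified stand-in for the real URI pattern.
simple_http = 'http://[a-z.]+'

# In Ruby, ^ and $ match at the start and end of every line.
line_anchored = Regexp.new('^' + simple_http + '$')
# \A and \z match only at the start and end of the whole string.
string_anchored = Regexp.new('\A' + simple_http + '\z')

# Illustrative input: a URI on its own line, surrounded by other text.
input = "blah\nhttp://example.com\nblahblah"

line_match   = line_anchored.match(input)    # matches the middle line
string_match = string_anchored.match(input)  # nil, the string as a whole is not a URI

puts line_match[0]
puts string_match.inspect
```

With line anchors the middle line alone is enough for a match; with string anchors the same input is rejected.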

I think programmers would expect URI.parse to only successfully parse strings that are URIs, rather than any string that contains a URI surrounded by a particular kind of whitespace. This issue has apparently caused at least one security vulnerability in the real world:

Replacing the ^ and $ with \A and \Z should fix the issue, and is unlikely to break any existing code. The Rubyspec project does not seem to have any tests for this behavior. This behavior is present in at least versions 1.8.6, 1.8.7, and 1.9.1.
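
One detail that may matter when choosing the replacement anchors: in Ruby, \Z still matches just before a single trailing newline, while \z matches only at the very end of the string. A small sketch, again assuming a hypothetical simplified pattern (word) rather than the real URI regexes:

```ruby
# Hypothetical stand-in pattern, not taken from uri/common.rb.
word = '[a-z]+'

upper_z = Regexp.new('\A' + word + '\Z')
lower_z = Regexp.new('\A' + word + '\z')

embedded = "abc\ndef"  # newline in the middle of the string
trailing = "abc\n"     # single newline at the end

# Both variants reject a newline in the middle of the string.
embedded_upper = upper_z.match?(embedded)
embedded_lower = lower_z.match?(embedded)

# Only \Z tolerates one trailing newline.
trailing_upper = upper_z.match?(trailing)
trailing_lower = lower_z.match?(trailing)

puts [embedded_upper, embedded_lower, trailing_upper, trailing_lower].inspect
```

Either choice closes the embedded-newline case described above; whether a trailing newline should also be rejected is a separate decision. Note that Regexp#match? needs Ruby 2.4 or later; on older versions, !regexp.match(string).nil? is equivalent.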


#1

Updated by naruse (Yui NARUSE) almost 13 years ago

  • Status changed from Open to Assigned
  • Assignee set to akira (akira yamada)



#2

Updated by naruse (Yui NARUSE) over 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r26229.
Sam, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


