Bug #21294
open
URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address
Description
The following is not a valid URI: http://[127.0.0.1]. So URI.extract should not extract it, yet it does.
So if you have code which extracts all URIs and then parses them, like the following, an error will be raised:
require 'uri'
URI.extract("Fake URL: http://[127.0.0.1]", :http).each do |uri| # => ['http://[127.0.0.1]']
  URI.parse(uri) # => raises URI::InvalidURIError
end
/home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:130:in 'URI::RFC3986_Parser#split': bad URI (is not URI?): "http://[127.0.0.1]" (URI::InvalidURIError)
from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:135:in 'URI::RFC3986_Parser#parse'
from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/common.rb:212:in 'URI.parse'
from test.rb:4:in 'block in <main>'
from test.rb:3:in 'Array#each'
from test.rb:3:in '<main>'
Instead, I believe URI.extract should return an empty array.
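For context, here is a minimal sketch of why the bracketed form is rejected: as I understand the newer URI grammar, square brackets in the host are reserved for IPv6 (or IPvFuture) literals, so an IPv4 address inside them does not parse. The helper name `parseable?` is hypothetical and not part of the uri library.

```ruby
require 'uri'

# Hypothetical helper: true if URI.parse accepts the string, false if it
# raises URI::InvalidURIError.
def parseable?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

ipv6_in_brackets = parseable?("http://[::1]")        # IPv6 literal in brackets
bare_ipv4        = parseable?("http://127.0.0.1")    # IPv4 without brackets
ipv4_in_brackets = parseable?("http://[127.0.0.1]")  # the form from this report

puts "IPv6 in brackets parses: #{ipv6_in_brackets}"
puts "bare IPv4 parses:        #{bare_ipv4}"
puts "IPv4 in brackets parses: #{ipv4_in_brackets}"
```

Only the last form should fail, which matches the error shown above.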
Updated by mame (Yusuke Endoh) 7 days ago
URI.extract is obsolete. You can confirm this by running the code in $VERBOSE mode:
$ ruby -w -ruri -e 'URI.extract("Fake URL: http://[127.0.0.1]" , :http)'
-e:1: warning: URI.extract is obsolete
/home/mame/.rbenv/versions/ruby-dev/lib/ruby/3.5.0+0/uri/common.rb:268: warning: URI::RFC3986_PARSER.extract is obsolete. Use URI::RFC2396_PARSER.extract explicitly.
If you still need this functionality, you should use URI::RFC2396_PARSER.extract along with URI::RFC2396_PARSER.parse. URI::RFC2396_PARSER.parse can successfully parse http://[127.0.0.1]. However, please note that this behavior is based on an older RFC.
require 'uri'
URI::RFC2396_PARSER.extract("Fake URL: http://[127.0.0.1]", :http).each do |uri| # => ['http://[127.0.0.1]']
  p URI::RFC2396_PARSER.parse(uri) # => #<URI::HTTP http://[127.0.0.1]>
end
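If the goal is instead to keep only the candidates that the default (stricter) parser accepts, one possible approach is to extract with the RFC 2396 parser and then drop anything URI.parse rejects. This is only a sketch: the helper name `extract_parseable_uris` and the `LEGACY_PARSER` constant are hypothetical, and the fallback to `URI::RFC2396_Parser.new` is an assumption for uri versions that do not define `URI::RFC2396_PARSER`.

```ruby
require 'uri'

# Use the shared RFC 2396 parser when the installed uri library defines it;
# otherwise build one (assumption: older versions only ship the class).
LEGACY_PARSER =
  defined?(URI::RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::RFC2396_Parser.new

# Hypothetical helper: extract candidates with the lenient RFC 2396 parser,
# then keep only those that URI.parse accepts.
def extract_parseable_uris(text, schemes = %w[http https])
  LEGACY_PARSER.extract(text, schemes).select do |candidate|
    URI.parse(candidate)
    true
  rescue URI::InvalidURIError
    false
  end
end

p extract_parseable_uris("Fake URL: http://[127.0.0.1]")
```

With this filter, the input from the report yields an empty array, at the cost of parsing every candidate twice if the caller parses the results again.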
Updated by Keeyan (Keeyan Nejad) 6 days ago
Ah, thank you @mame (Yusuke Endoh)! I wasn't aware it was obsolete. We can use URI::RFC2396_PARSER for our cases. Do you happen to know why extract is not being included in the newer parsers? I had a look at the relevant PRs in the URI repo, but couldn't find anything explaining the reasoning.