Bug #21294

URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address

Added by Keeyan (Keeyan Nejad) 7 days ago. Updated 6 days ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]
[ruby-core:121772]

Description

The following is not a valid URI: http://[127.0.0.1] (square brackets in the host are reserved for IPv6 literals, not IPv4 addresses). URI.extract should therefore not extract it, but it does.

As a result, code that extracts all URIs and then parses them, like the following, raises an error:

require 'uri'

URI.extract("Fake URL: http://[127.0.0.1]" , :http).each do |uri| # => ['http://[127.0.0.1]']
  URI.parse(uri) # raises URI::InvalidURIError
end

/home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:130:in 'URI::RFC3986_Parser#split': bad URI (is not URI?): "http://[127.0.0.1]" (URI::InvalidURIError)
	from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:135:in 'URI::RFC3986_Parser#parse'
	from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/common.rb:212:in 'URI.parse'
	from test.rb:4:in 'block in <main>'
	from test.rb:3:in 'Array#each'
	from test.rb:3:in '<main>'

Instead, I believe URI.extract should return an empty array.
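Until the behaviour changes, one possible workaround is to extract candidates with the RFC 2396 parser and keep only the ones the default URI.parse accepts. This is a minimal sketch, assuming the uri library bundled with Ruby 3.4; `extract_parseable_uris` and `LEGACY_URI_PARSER` are hypothetical names, not part of the library:

```ruby
require 'uri'

# Use the RFC 2396 parser for extraction; fall back to a fresh instance on
# uri versions that lack the URI::RFC2396_PARSER constant.
LEGACY_URI_PARSER = defined?(URI::RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::RFC2396_Parser.new

# Hypothetical helper (not part of the uri library): returns URI objects for
# the extracted strings that the default, RFC 3986-based URI.parse accepts,
# and drops the rest instead of raising.
def extract_parseable_uris(text, schemes = nil)
  LEGACY_URI_PARSER.extract(text, schemes).filter_map do |candidate|
    URI.parse(candidate)
  rescue URI::InvalidURIError
    nil # e.g. "http://[127.0.0.1]" is rejected under RFC 3986
  end
end

uris = extract_parseable_uris("Fake URL: http://[127.0.0.1] and http://example.com/a", %w[http])
p uris.map(&:to_s)
```

With the input from this report alone, the helper returns an empty array, which is the result proposed above for URI.extract itself.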

Updated by mame (Yusuke Endoh) 7 days ago

URI.extract is obsolete. You can confirm this by running the code in $VERBOSE mode:

$ ruby -w -ruri -e 'URI.extract("Fake URL: http://[127.0.0.1]" , :http)'
-e:1: warning: URI.extract is obsolete
/home/mame/.rbenv/versions/ruby-dev/lib/ruby/3.5.0+0/uri/common.rb:268: warning: URI::RFC3986_PARSER.extract is obsolete. Use URI::RFC2396_PARSER.extract explicitly.

If you still need this functionality, use URI::RFC2396_PARSER.extract together with URI::RFC2396_PARSER.parse, which can successfully parse http://[127.0.0.1]. Note, however, that this behavior is based on an older RFC (RFC 2396).

require 'uri'

URI::RFC2396_PARSER.extract("Fake URL: http://[127.0.0.1]" , :http).each do |uri| # => ['http://[127.0.0.1]']
  p URI::RFC2396_PARSER.parse(uri) # => #<URI::HTTP http://[127.0.0.1]>
end
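To make the difference between the two grammars concrete, the following sketch runs a few host spellings through both parsers. Under RFC 3986 a bracketed host must be an IPv6 address (or an IPvFuture literal), so an IPv4 address is accepted only bare or embedded in IPv6 notation such as ::ffff:127.0.0.1; the RFC 2396-era pattern is looser. The sample strings, the constant names and the `accepts?` helper are illustrative assumptions:

```ruby
require 'uri'

# Illustrative only: compare how the two parsers treat a few host spellings.
# Fall back to a fresh RFC2396_Parser on uri versions without the constant.
LEGACY_PARSER = defined?(URI::RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::RFC2396_Parser.new
STRICT_PARSER = URI::RFC3986_PARSER

# Hypothetical helper: true if +parser+ parses +str+ without raising.
def accepts?(parser, str)
  parser.parse(str)
  true
rescue URI::InvalidURIError
  false
end

[
  'http://127.0.0.1',          # bare IPv4 address
  'http://[::1]',              # IPv6 literal
  'http://[::ffff:127.0.0.1]', # IPv4 embedded in IPv6 notation
  'http://[127.0.0.1]'         # IPv4 inside brackets: the case from this report
].each do |s|
  puts format('%-27s RFC3986=%-5s RFC2396=%s', s, accepts?(STRICT_PARSER, s), accepts?(LEGACY_PARSER, s))
end
```

The last line should show http://[127.0.0.1] rejected by the RFC 3986 parser and accepted by the RFC 2396 parser, consistent with the behaviour described in this thread.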

Updated by Keeyan (Keeyan Nejad) 6 days ago

Ah, thank you @mame (Yusuke Endoh)! I wasn't aware it was obsolete. We can use URI::RFC2396_PARSER for our cases. Do you happen to know why extract is not included in the newer parsers? I looked through the relevant PRs in the URI repo but couldn't find anything explaining the reasoning.
