Bug #12852

closed

URI.parse can't handle non-ascii URIs

Added by olivierlacan (Olivier Lacan) over 7 years ago. Updated over 1 year ago.

Status:
Closed
Target version:
-
[ruby-core:77666]
Tags:

Description

Given a return URL path like: /search?utf8=\u{2713}&q=foo, URI.parse raises the following exception:

URI.parse "/search?utf8=\u{2713}&q=foo"
URI::InvalidURIError: URI must be ascii only "/search?utf8=\u{2713}&q=foo"
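
Until the input is escaped, calling code may want to guard against this exception. A minimal sketch, assuming the parser rejects non-ASCII input as shown above; `safe_parse` is a hypothetical helper name, not part of the URI library:

```ruby
require "uri"

# Hypothetical helper: return nil instead of raising when the string
# cannot be parsed as a URI (for example, when it contains non-ASCII).
def safe_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

result = safe_parse("/search?utf8=\u{2713}&q=foo") # non-ASCII input
ascii_result = safe_parse("/search?q=foo")         # plain ASCII input
```

This only hides the failure; it does not make the non-ASCII URI usable.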

This \u{2713} character is commonly used by web frameworks like Rails to enforce UTF-8 in forms: https://github.com/rails/rails/blob/92703a9ea5d8b96f30e0b706b801c9185ef14f0e/actionview/lib/action_view/helpers/form_tag_helper.rb#L823-L830

"\u{2713}"
=> "✓"

Is it unreasonable to expect URI.parse to handle the non-ASCII portions of a URI? The current workaround is to call URI.encode on the URI string before passing it to URI.parse:

URI.parse URI.encode("/search?utf8=\u{2713}&q=foo")
=> #<URI::Generic /search?utf8=%E2%9C%93&q=foo>
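
URI.encode was later removed (in Ruby 3.0), so the workaround above does not run on current Rubies. The following is a minimal sketch of one possible replacement, assuming it is enough to percent-encode only the non-ASCII characters as UTF-8 bytes and leave everything else untouched; `escape_non_ascii` is a hypothetical helper name, not a URI library method:

```ruby
require "uri"

# Hypothetical helper: percent-encode each non-ASCII character as its
# UTF-8 bytes, leaving ASCII characters (including "?", "&", "=") as-is.
def escape_non_ascii(str)
  str.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

raw = "/search?utf8=\u{2713}&q=foo"
escaped = escape_non_ascii(raw) # "/search?utf8=%E2%9C%93&q=foo"
uri = URI.parse(escaped)        # parses, since the string is now ASCII-only
```

Unlike a full escaping routine, this sketch does not touch reserved ASCII characters such as spaces, so it only addresses the "ascii only" error described here.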

By comparison, a library like Addressable parses this URI without issue.

require "addressable/uri"
Addressable::URI.parse("/search?utf8=\u{2713}&q=foo")
=> #<Addressable::URI:0x3feffa84158c URI:/search?utf8=✓&q=foo>

This is how Addressable implements parsing:
https://github.com/sporkmonger/addressable/blob/a15b7045a09911bcc47b106200554809c879a5f6/lib/addressable/uri.rb#L75-L145

PS: Reproduced under MRI 2.3.1 and 2.4.0-preview1.
