Bug #9806

closed

URI#encode doesn't encode characters '[' and ']'. They should be encoded as %5B and %5D respectively.

Added by charlez (Charles Leu) over 10 years ago. Updated over 2 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: 2.2.0 and prior versions as well
[ruby-core:62405]

Description

The subject says it all.

IRB session demonstrating the problem:
charlez$ irb
head :001 > RUBY_VERSION
=> "2.2.0"
head :002 > require 'uri'
=> true
head :003 > my_str = '[ futsal club ]'
=> "[ futsal club ]"
head :004 > URI.encode(my_str)
=> "[%20futsal%20club%20]"
head :005 >

Note: The JavaScript call encodeURI('[ futsal club ]') returns "%5B%20futsal%20club%20%5D", which is the correct result.


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #12235: URI.encode issue with square brackets (Closed)

Updated by charlez (Charles Leu) over 10 years ago

Notes:

  • Per RFC 2396 section 2.4.3 "Data corresponding to excluded characters must be escaped in order to be properly represented within a URI."
  • Per RFC 2396 section 2.2 reserved characters are ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
  • Per URI::REGEXP::PATTERN reserved characters are ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," | "[" | "]"
  • Thus there appears to be an inconsistency between RFC 2396 2.2 and URI::REGEXP::PATTERN
  • After changing URI::REGEXP::PATTERN[:RESERVED] to omit characters '[' and ']', URI.encode( '[ futsal club ]') produces "%5B%20futsal%20club%20%5D", which I believe is correct.
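
The effect described in these notes can be reproduced without touching the library. The following is only a sketch: the helper percent_encode and both constants are hypothetical names, not part of the uri library, and the character sets are transcribed from the RFC 2396 sections cited above. It shows that the result depends solely on whether '[' and ']' are treated as characters that may stay unescaped.

```ruby
# Unreserved characters (RFC 2396 section 2.3) plus the reserved set
# listed in RFC 2396 section 2.2. Brackets are NOT in this set.
SAFE_RFC2396 = %r{[A-Za-z0-9\-_.!~*'();/?:@&=+$,]}

# The same set with "[" and "]" added, mirroring the RESERVED pattern
# quoted in the notes.
SAFE_WITH_BRACKETS = %r{[A-Za-z0-9\-_.!~*'();/?:@&=+$,\[\]]}

# Hypothetical helper: percent-encode every byte that does not match `safe`.
def percent_encode(str, safe)
  str.each_byte.map do |byte|
    char = byte.chr
    char.match?(safe) ? char : format('%%%02X', byte)
  end.join
end

puts percent_encode('[ futsal club ]', SAFE_WITH_BRACKETS)  # [%20futsal%20club%20]
puts percent_encode('[ futsal club ]', SAFE_RFC2396)        # %5B%20futsal%20club%20%5D
```

With brackets in the safe set the output matches the reported URI.encode behaviour; without them it matches the result the report asks for.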

Updated by mame (Yusuke Endoh) over 10 years ago

I'm unfamiliar with URI spec, but I guess RFC 2732 is related.

http://www.ietf.org/rfc/rfc2732.txt

This document incudes an update to the generic syntax for Uniform
Resource Identifiers defined in RFC 2396 [URL]. It defines a syntax
for IPv6 addresses and allows the use of "[" and "]" within a URI
explicitly for this reserved purpose.

--
Yusuke Endoh
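
For illustration, this is the bracket usage that RFC 2732 reserves: an IPv6 literal in the host position. The URL below is a made-up example. Ruby's uri library accepts it and keeps the brackets in host, while hostname strips them.

```ruby
require 'uri'

# Example URL with an IPv6 loopback literal (an assumption for illustration).
uri = URI.parse('http://[::1]:8080/index.html')

puts uri.host      # [::1]
puts uri.hostname  # ::1
puts uri.port      # 8080
```

An encoder that escaped every '[' and ']' in a complete URI would break such addresses, which is presumably why the pattern treats them as reserved.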

Updated by johnnymugs (Jonathan Mukai) over 10 years ago

It looks like URI.encode/escape was deprecated in favor of either CGI.escape or URI.encode_www_form_component per https://github.com/ruby/ruby/commit/238b979f1789f95262a267d8df6239806f2859cc and some discussion here: https://www.ruby-forum.com/topic/207489

Both options give you the output you want.

However, I'm sure there's plenty of code hanging around that uses URI.escape. I wonder what the policy is for updating deprecated methods like this?

Johnny
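
A quick comparison of the suggested replacements, offered as a sketch rather than a recommendation. ERB::Util.url_encode is not mentioned in this thread; it is added here as an assumption about what the reporter may want, because the two suggested methods use form encoding and turn spaces into "+" rather than "%20".

```ruby
require 'cgi'
require 'erb'
require 'uri'

input = '[ futsal club ]'

# Form encoding: brackets are escaped, spaces become "+".
form_encoded = URI.encode_www_form_component(input)
cgi_encoded  = CGI.escape(input)

# Percent encoding throughout: spaces become "%20".
erb_encoded  = ERB::Util.url_encode(input)

puts form_encoded  # %5B+futsal+club+%5D
puts cgi_encoded   # %5B+futsal+club+%5D
puts erb_encoded   # %5B%20futsal%20club%20%5D
```

All three escape the brackets; only the last one reproduces the exact string given in the description.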

Updated by charlez (Charles Leu) over 10 years ago

Yusuke Endoh wrote:

I'm unfamiliar with URI spec, but I guess RFC 2732 is related.

http://www.ietf.org/rfc/rfc2732.txt

This document incudes an update to the generic syntax for Uniform
Resource Identifiers defined in RFC 2396 [URL]. It defines a syntax
for IPv6 addresses and allows the use of "[" and "]" within a URI
explicitly for this reserved purpose.

--
Yusuke Endoh

FYI: Refer to the current W3.org BNF for URI syntax http://www.w3.org/Addressing/URL/5_URI_BNF.html

Note the statement: 'The "national" and "punctuation" characters do not appear in any productions and therefore may not appear in URIs.' That statement is at odds with RFC 2732.

It appears that authors of the standards docs aren't always aware of, and/or consistent with, other standards docs. Thus it is not surprising there is confusion regarding what is or isn't a valid URI encoding.

Updated by znz (Kazuhiro NISHIYAMA) over 7 years ago

  • Related to Bug #12235: URI.encode issue with square brackets added

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago

  • Status changed from Open to Closed

URI.encode was removed in Ruby 3.0.
