Bug #8352: URI squeezes a sequence of slashes in merging paths when it shouldn't - Ruby - Ruby Issue Tracking System

Bug #8352

closed

URI squeezes a sequence of slashes in merging paths when it shouldn't

Added by knu (Akinori MUSHA) about 13 years ago. Updated over 8 years ago.

Status: Closed
Assignee: naruse (Yui NARUSE)
Target version: 2.5
ruby -v: ruby 2.1.0dev (2013-05-01 trunk 40540) [x86_64-freebsd9]
Backport:
[ruby-core:54729]

Description

Neither RFC 2396 (on which the library is currently based) nor RFC 3986 says anything about a sequence of slashes in the path part, except for the parsing rules that apply when a URI (path) starts with two slashes.

It should be perfectly valid to have a slash right after another, and there is no reason to "normalize" a sequence of slashes into a single slash, which uri actually does in merging paths:

URI.parse('http://example.com/foo//bar/')+'.'
=> #<URI::HTTP:0x0000080303d2b0 URL:http://example.com/foo/bar/>

Fixing this may be as easy as changing the regexp in URI::Generic#split_path from %r{/+} to %r{/}, but I wonder how much incompatibility that change would introduce.
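A minimal sketch of what the two patterns do to a path, for illustration only. The method names `split_squeezing` and `split_preserving` are hypothetical stand-ins and are not the library's actual definitions; only `String#split` with a negative limit is assumed.

```ruby
# Hypothetical stand-ins for the two candidate splitting rules;
# these are not the uri library's own method definitions.
def split_squeezing(path)
  path.split(%r{/+}, -1)   # a run of slashes counts as one separator
end

def split_preserving(path)
  path.split(%r{/}, -1)    # every single slash is a separator
end

path = '/foo//bar/'
p split_squeezing(path)    # => ["", "foo", "bar", ""]
p split_preserving(path)   # => ["", "foo", "", "bar", ""]

# Joining the pieces back shows which rule round-trips the path.
puts split_squeezing(path).join('/')   # prints /foo/bar/
puts split_preserving(path).join('/')  # prints /foo//bar/
```

With %r{/+} the empty component between the two slashes is lost before any merging starts, so it cannot reappear when the path is joined again.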

Files

0001-Allow-empty-path-components-in-a-URI-Bug-8352.patch (2.3 KB)

knu (Akinori MUSHA), 12/12/2017 05:45 AM

Related issues 2 (0 open — 2 closed)

Updated by knu (Akinori MUSHA) about 13 years ago
#1 [ruby-core:54730]

s/RFC 2896/RFC 2396/

Updated by naruse (Yui NARUSE) over 11 years ago
#2 [ruby-core:66033]

Description updated (diff)

Updated by knu (Akinori MUSHA) over 8 years ago
#3

Subject changed from uri squeezes a sequence of slashes in merging paths when it shouldn't to URI squeezes a sequence of slashes in merging paths when it shouldn't
Description updated (diff)
Backport deleted (~~1.9.3: UNKNOWN, 2.0.0: UNKNOWN~~)

Updated by knu (Akinori MUSHA) over 8 years ago
#4 [ruby-core:83784]

Addressable::URI (of the addressable gem) properly preserves sequences of slashes in a path, so using it instead is a workaround.

I've confirmed that none of net/url of Go, URI of Perl, urlparse.urljoin of Python2, or java.net.URL of Java performs this kind of unwanted normalization.

The single exception I could find, however, was urllib.parse of Python3. (!)

% python3
Python 3.6.3 (default, Nov  4 2017, 01:15:26)
[GCC 4.2.1 Compatible FreeBSD Clang 3.8.0 (tags/RELEASE_380/final 262564)] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urljoin
>>> urljoin('http://example.com/foo//bar/baz', '.')
'http://example.com/foo/bar/'

I'm not sure if this is an intentional change from Python2, but I believe any slash in the path part should be retained.

Updated by knu (Akinori MUSHA) over 8 years ago
#5 [ruby-core:83785]

I've also checked the url module of Node.js, and it doesn't do this either. Its test cases do not include explicit examples of how to deal with sequences of slashes in a path, but a double slash is retained in some of the expected results of relative path resolution, which means a double slash is not subject to squeezing.

Looking into the WHATWG URL spec, there's no indication that a sequence of slashes in a URL path should be treated specially. A path is simply a "list" of "items" separated by slashes (/, U+002F), and any item can naturally be an empty string. Even when resolving a "double-dot segment" and consequently "removing" a path "item", you are never told to "remove" extra items that are empty.
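The list-of-items view described above can be sketched as follows. This is a simplified illustration, not the WHATWG algorithm itself: `resolve_dot_segments` is a hypothetical helper, and it assumes the input is an absolute path that starts with a slash.

```ruby
# Hypothetical helper, assuming an absolute path that begins with "/".
# The path is treated as a plain list of items; empty items are ordinary
# items and are never dropped just because they are empty.
def resolve_dot_segments(path)
  segments = path.split('/', -1)
  segments.shift                      # drop the empty string before the leading slash
  out = []
  segments.each_with_index do |seg, i|
    last = (i == segments.length - 1)
    case seg
    when '..'
      out.pop                         # remove exactly one item, empty or not
      out << '' if last               # keep the trailing slash
    when '.'
      out << '' if last               # keep the trailing slash
    else
      out << seg                      # includes empty items
    end
  end
  '/' + out.join('/')
end

puts resolve_dot_segments('/foo//bar/.')       # prints /foo//bar/
puts resolve_dot_segments('/foo//bar/baz/..')  # prints /foo//bar/
puts resolve_dot_segments('/a//../b')          # prints /a/b
```

In the last call the double-dot segment removes the empty item itself, exactly as it would remove any other single item; nothing else is collapsed.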

So, as you can see, Ruby and Python3 are the only exceptions: no specification indicates that a sequence of slashes in a URL path should be treated specially, and the majority of library implementations in other languages agree with that. I presume there are few programmers who would rely on the current behavior.

Updated by duerst (Martin Dürst) over 8 years ago
#6 [ruby-core:83861]

knu (Akinori MUSHA) wrote:

I presume there are few programmers who would rely on the current behavior.

I agree that there should be few programmers who rely on consecutive slashes being collapsed to a single slash. However, I also think it's a bad idea for programmers or users to rely on multiple consecutive slashes being preserved. Using multiple consecutive slashes in a URI is a bad idea.

Updated by phluid61 (Matthew Kerwin) over 8 years ago
#7 [ruby-core:83862]

duerst (Martin Dürst) wrote:

Using multiple consecutive slashes in a URI is a bad idea.

It definitely doesn't play nicely with dot-segment resolution, but then I wouldn't want to bear the burden of deciding how to resolve that, one way or the other.

In this particular case, I think it is incorrect to automatically remove empty segments, but I also think it's bad to have them in the first place.

What if there was a way for the programmer to explicitly invoke the current behaviour (e.g. by sending a different message), so the side-effect is expected?
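One way such an explicit opt-in could look, as a sketch only. `squeeze_path_slashes` is a hypothetical helper and is not part of the uri library; it relies only on `String#squeeze` and `URI::Generic#path=`.

```ruby
require 'uri'

# Hypothetical helper: returns a copy of the URI whose path has each run
# of slashes collapsed to one. The caller asks for this explicitly, so
# the normalization is never a hidden side effect of merging.
def squeeze_path_slashes(uri)
  copy = uri.dup
  copy.path = uri.path.squeeze('/')
  copy
end

u = URI.parse('http://example.com/foo//bar/')
puts squeeze_path_slashes(u)  # prints http://example.com/foo/bar/
puts u                        # prints http://example.com/foo//bar/ (original untouched)
```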

Updated by knu (Akinori MUSHA) over 8 years ago
#8 [ruby-core:84179]

File 0001-Allow-empty-path-components-in-a-URI-Bug-8352.patch added
Assignee changed from akira (akira yamada) to naruse (Yui NARUSE)

Naruse-san, could you review the attached patch?

Updated by knu (Akinori MUSHA) over 8 years ago
#9

Target version set to 2.5

Updated by knu (Akinori MUSHA) over 8 years ago
#10

Status changed from Open to Closed

Applied in changeset trunk|r61218.

Allow empty path components in a URI [Bug #8352]

generic.rb (URI::Generic#merge, URI::Generic#route_to): Fix a bug
where a sequence of slashes in the path part gets collapsed to a
single slash. According to the relevant RFCs and WHATWG URL
Standard, empty path components are simply valid and there is no
special treatment defined for them, so we just keep them as they
are.
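A quick check of the merged behaviour, assuming a Ruby version that includes this changeset (the target version above is 2.5). The outputs in the comments are what the description of the fix implies; an older Ruby would print the squeezed forms instead.

```ruby
require 'uri'

base = URI.parse('http://example.com/foo//bar/')

# Assumes Ruby 2.5 or later, where empty path components are kept.
puts base + '.'         # prints http://example.com/foo//bar/
puts base.merge('baz')  # prints http://example.com/foo//bar/baz
```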

Updated by jeremyevans0 (Jeremy Evans) about 7 years ago
#11

Has duplicate Bug #12562: URI merge removes empty segment contrary to RFC 3986 added
