Bug #19916

closed

URI#to_s can serialize to a value that doesn't deserialize to the original

Added by yawboakye (yaw boakye) about 1 year ago. Updated 11 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:114985]

Description

It appears that when we serialize a URI to a string, we don't check that the result is in a form that can be deserialized back into the original. I think it's fair to expect that serializing and then deserializing produces an object equal to the original, differing perhaps only in its object identity. This isn't the case for URIs that are built or modified by hand, which can happen often, for example in a Rails app that accepts URL input from users. Let me attempt a reproduction, using the generic URI example.com.

require "uri"

example_url = "example.com"
example_uri = URI(example_url)
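To see exactly what the parser produced, the components can be inspected directly. This is a minimal sketch using only the standard `uri` library; the expected values follow from example.com being a relative reference with neither a scheme nor an authority.

```ruby
require "uri"

example_uri = URI("example.com")

# With no "scheme://" prefix the whole string is a relative reference,
# so the parser returns a URI::Generic whose only component is the path.
p example_uri.class  # => URI::Generic
p example_uri.scheme # => nil
p example_uri.host   # => nil
p example_uri.path   # => "example.com"
```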

Since no scheme is explicitly given, the string is correctly parsed as a generic URI (URI::Generic), with example.com interpreted as the path.
The returned object is mutable. Since no scheme was detected automatically, let's set one, and fix the hostname as well.

example_uri.scheme = "https"
example_uri.hostname = example_uri.path

# I've intentionally left the path value unchanged, since it helps demonstrate the potential bug.
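The reproduction assigns the path while no scheme is set and only afterwards adds the scheme and host; as the output below shows, nothing then re-checks the path against them. By contrast, `path=` itself rejects a relative path once a scheme is present, and `URI::Generic.build`, which validates each component as it constructs the URI, refuses the same combination outright. The sketch below uses only the standard `uri` library; the variable names are illustrative.

```ruby
require "uri"

# path= rejects a relative path on a URI that already has a scheme.
uri = URI("https://example.com")
begin
  uri.path = "example.com"
  setter_rejected = false
rescue URI::InvalidComponentError => e
  setter_rejected = true
  puts "path= refused: #{e.message}"
end

# URI::Generic.build runs each component through the checking setters,
# so the scheme + host + relative path combination fails up front.
begin
  URI::Generic.build(scheme: "https", host: "example.com", path: "example.com")
  build_rejected = false
rescue URI::InvalidComponentError => e
  build_rejected = true
  puts "build refused: #{e.message}"
end
```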

Given that we have a scheme, an authority, and a path, and given that URI follows RFC 3986, one may expect serialization to a string to follow section 3 of the RFC (Syntax Components; specifically section 3.3, Path), which requires that when an authority (in our case, the hostname) is present, the path be either empty or begin with a slash. URI#to_s does not insert that slash when the path doesn't already begin with one. That would be fine if we maintained an invariant guaranteeing that we never produce a malformed serialized URI, but we don't. Returning to example_uri, serialization produces:

serialized_uri = example_uri.to_s
puts serialized_uri # https://example.comexample.com
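The rule being relied on is RFC 3986 section 3.3: if a URI contains an authority, its path must be either empty or begin with "/". Below is a minimal sketch of that check and of a serializer that enforces it by inserting the missing slash. Both `authority_path_valid?` and `serialize_components` are hypothetical helpers, not part of the `uri` library, and the serializer deliberately handles only scheme, host, and path.

```ruby
require "uri"

# Hypothetical check for RFC 3986 section 3.3: when an authority (here,
# a host) is present, the path must be empty or start with "/".
def authority_path_valid?(host, path)
  return true if host.nil? # no authority, so this rule does not apply
  path.nil? || path.empty? || path.start_with?("/")
end

# Hypothetical serializer for scheme, host, and path only. It inserts
# the "/" separator when a host is followed by a path lacking one.
def serialize_components(scheme, host, path)
  path = path.to_s
  str = +""
  str << scheme << ":" if scheme
  str << "//" << host if host
  str << "/" unless authority_path_valid?(host, path)
  str << path
end

p authority_path_valid?("example.com", "example.com") # => false
serialized = serialize_components("https", "example.com", "example.com")
puts serialized # https://example.com/example.com

reparsed = URI(serialized)
p reparsed.host # => "example.com"
p reparsed.path # => "/example.com"
```

The reparsed path is "/example.com" rather than "example.com": once an authority is present, a relative path cannot be represented in the serialized form, so even a slash-inserting `to_s` would recover the original only up to that leading slash.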

The serialized string https://example.comexample.com is clearly wrong. One would have expected https://example.com/example.com instead; that is, the slash should have been inserted automatically, just as the double slash was inserted automatically between the scheme and the authority. In fact, serialized_uri cannot be deserialized back into example_uri. Below is an attempt at deserialization, with each component compared to the original:

deserialized_example_uri = URI(serialized_uri)
example_uri.scheme == deserialized_example_uri.scheme # true
example_uri.hostname == deserialized_example_uri.hostname # false ("example.com" != "example.comexample.com")
example_uri.path == deserialized_example_uri.path # false ("example.com" != "")
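The component-by-component comparison above can be wrapped in a small check that reports which components change across a `to_s`/`URI()` round trip. `round_trip_mismatches` and `COMPARED_COMPONENTS` are hypothetical names. The check compares scheme, host, and path as the report does, plus userinfo, query, and fragment; it leaves out `port`, because a hand-built `URI::Generic` with an `https` scheme has no port set while the reparsed `URI::HTTPS` reports the default 443, a difference unrelated to this bug.

```ruby
require "uri"

# Hypothetical list of components compared after a round trip.
COMPARED_COMPONENTS = %i[scheme userinfo host path query fragment].freeze

# Hypothetical helper: serializes with #to_s, parses the result again
# with URI(), and returns the components whose values differ.
def round_trip_mismatches(uri)
  reparsed = URI(uri.to_s)
  COMPARED_COMPONENTS.reject { |c| uri.public_send(c) == reparsed.public_send(c) }
end

p round_trip_mismatches(URI("https://example.com/example.com?q=1#top")) # => []

# Setting an absolute path keeps the URI consistent, so it round-trips.
built = URI("https://example.com")
built.path = "/example.com"
p round_trip_mismatches(built) # => []
```

Per the comparisons in the report, calling it on example_uri would return [:host, :path].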

I believe that the ability to serialize and deserialize an object without losing fidelity is valuable. I believe even more strongly that we should maintain an invariant guaranteeing that a URI always serializes to a form that meets the RFC's specification. I therefore consider this a bug, and I'd be willing to work on a fix as my first contribution to Ruby, if enough people agree that it is one.

Regards!


Files

Screenshot 2023-09-29 at 12.19.26.png (180 KB), added by yawboakye (yaw boakye), 10/09/2023 07:46 AM
