https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17097754782016-10-18T21:18:40ZRuby Issue Tracking SystemRuby master - Bug #12852: URI.parse can't handle non-ascii URIshttps://redmine.ruby-lang.org/issues/12852?journal_id=609392016-10-18T21:18:40Zphluid61 (Matthew Kerwin)matthew@kerwin.net.au
<ul></ul><p>As a point of order, there's no such thing as a "non-ASCII URI"*; that would be an <a href="https://tools.ietf.org/html/rfc3987" class="external">IRI</a>.</p>
<p>The rails snippet you linked is part of a HTML form. A web browser displaying and submitting that form would interpret the <code>&#x2713;</code> entity as U+2713 CHECK MARK, yes, but it would percent-encode it as <code>%E2%9C%93</code> before using it in a HTTP request, because HTTP uses URIs, not IRIs. (The browser may present it as a single Unicode character in the awesomebar/omnibar/address bar, but that's a UI presentation element and not a true and accurate display of the URI.)</p>
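<p>As a rough illustration of that encoding step (a sketch, not what any particular browser runs), the snippet below percent-encodes the UTF-8 bytes of U+2713 by hand and compares the result with the standard library's <code>URI.encode_www_form_component</code>. The variable names are made up for this example.</p>

```ruby
require "uri"

check_mark = "\u2713" # U+2713 CHECK MARK, as produced by the &#x2713; entity

# Write each UTF-8 byte of the character as a percent-encoded octet.
by_hand = check_mark.bytes.map { |byte| format("%%%02X", byte) }.join

# The standard library's form-component encoder, for comparison.
by_stdlib = URI.encode_www_form_component(check_mark)

puts by_hand
puts by_stdlib
```

<p>Both lines should print <code>%E2%9C%93</code>, which is the form that ends up in the HTTP request.</p>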
<p>* see <a href="https://tools.ietf.org/html/rfc3986#section-2" class="external">RFC 3986, Section 2</a></p> Ruby master - Bug #12852: URI.parse can't handle non-ascii URIshttps://redmine.ruby-lang.org/issues/12852?journal_id=609402016-10-18T21:57:30Zolivierlacan (Olivier Lacan)hi@olivierlacan.com
<ul></ul><p>Matthew Kerwin wrote:</p>
<blockquote>
<p>The rails snippet you linked is part of a HTML form. A web browser displaying and submitting that form would interpret the <code>&#x2713;</code> entity as U+2713 CHECK MARK, yes, but it would percent-encode it as <code>%E2%9C%93</code> before using it in a HTTP request, because HTTP uses URIs, not IRIs. (The browser may present it as a single Unicode character in the awesomebar/omnibar/address bar, but that's a UI presentation element and not a true and accurate display of the URI.)</p>
</blockquote>
<p>It's common for OAuth authentication flows to store a destination URI to return to when the handshake process is completed. This URI can be stored without first being processed by a web server that will encode it in the way Rails does for submitted forms since it's not meant to be processed, that is, until it comes back to the origin server.</p>
<p>I opened this due to an issue I encountered in an OAuth provider handshake procedure. You could argue that I should be expected to <code>URI.encode</code> any URI set as a destination query parameter to prevent this issue from occurring, surely.</p>
<p>Do you not agree that URI.parse should accept unicode entities in URIs? It wasn't clear from your response.</p>
<p>I'm not aware of any IRI-compatible API in MRI that could allow me to directly parse URIs containing non-ASCII characters with Ruby, whether they match the strict definition of a URI or not.</p> Ruby master - Bug #12852: URI.parse can't handle non-ascii URIshttps://redmine.ruby-lang.org/issues/12852?journal_id=609412016-10-18T23:27:11Zphluid61 (Matthew Kerwin)matthew@kerwin.net.au
<ul></ul><p>Olivier Lacan wrote:</p>
<blockquote>
<p>It's common for OAuth authentication flows to store a destination URI to return to when the handshake process is completed. This URI can be stored without first being processed by a web server that will encode it in the way Rails does for submitted forms since it's not meant to be processed, that is, until it comes back to the origin server.</p>
</blockquote>
<p>You keep using the word "URI" to refer to these data objects, and by specification a URI data object cannot contain non-ASCII characters (and even some ASCII characters are forbidden). If we agree that <code>"haha\nlol"</code> is a String that cannot be parsed as a URI, we should agree the same for <code>"http://example.org/\u{2713}".force_encoding('UTF-8')</code>.</p>
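<p>A minimal sketch of that comparison, assuming a Ruby whose <code>URI.parse</code> follows RFC 3986. The helper <code>parseable_uri?</code> is hypothetical and exists only for this example; the third candidate is the percent-encoded form of the second.</p>

```ruby
require "uri"

# Hypothetical helper: true if URI.parse accepts the string, false if it
# raises URI::InvalidURIError.
def parseable_uri?(string)
  URI.parse(string)
  true
rescue URI::InvalidURIError
  false
end

candidates = [
  "haha\nlol",                    # ASCII, but contains a forbidden character
  "http://example.org/\u{2713}",  # contains a non-ASCII character
  "http://example.org/%E2%9C%93"  # same path, percent-encoded
]

results = candidates.map { |string| [string, parseable_uri?(string)] }
results.each { |string, ok| puts "#{string.inspect}: #{ok}" }
```

<p>Only the percent-encoded candidate is expected to parse; the first two are rejected for the same reason, namely that they contain characters a URI may not contain.</p>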
<blockquote>
<p>I opened this due to an issue I encountered in an OAuth provider handshake procedure. You could argue that I should be expected to <code>URI.encode</code> any URI set as a destination query parameter to prevent this issue from occurring, surely.</p>
</blockquote>
<p>That's what I'm saying, but from the opposite direction. If you're storing a String, and want to ensure that it encodes a valid URI, you should <code>URI.encode</code> the parts before storing them in the String.</p>
<p>By analogy, if we replace "URI" with "JSON" in this discussion the same holds true: <code>"{\"foo\":\"\n\"}"</code> holds a String that looks a lot like JSON, but isn't valid [<a href="https://tools.ietf.org/html/rfc7159#section-7" class="external">RFC 7159</a>], and <code>JSON.parse</code> correctly raises an exception on it.</p>
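<p>The JSON analogy can be tried directly. This is a small sketch using the bundled <code>json</code> library with its default options; the variable names are invented for the example. The first string has a literal line feed inside the JSON string, the second has the two-character escape <code>\n</code>.</p>

```ruby
require "json"

raw_newline = "{\"foo\":\"\n\"}"   # literal line feed inside the JSON string
escaped     = "{\"foo\":\"\\n\"}"  # backslash followed by n, a valid JSON escape

# Record whether the raw form is accepted or rejected by the parser.
raw_result =
  begin
    JSON.parse(raw_newline)
    :parsed
  rescue JSON::ParserError
    :rejected
  end

escaped_result = JSON.parse(escaped)

puts raw_result
puts escaped_result.inspect
```

<p>The raw form should be rejected, while the escaped form parses to a Hash whose value is a one-character String containing a line feed.</p>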
<p>If a network peer is sending you a message that includes bytes that look like a URI but with UTF-8-encoded Unicode characters and <em>not</em> ASCII-compatible percent-encoded octets (i.e. it's sending an IRI), then one of two things is happening:</p>
<ol>
<li>
<p>the protocol you're using is built on IRIs, not URIs, and you are responsible for any transformations to/from URIs (including <code>URI.encode</code>); or</p>
</li>
<li>
<p>the peer is in violation of a spec, and you should throw an error back at it. (In this case the specs are usually quite clear on exactly what error to throw, too.)</p>
</li>
</ol>
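<p>The two cases above could be sketched roughly as follows. Both helpers, <code>iri_to_uri_string</code> and <code>accept_or_reject</code>, are hypothetical names for this example and not part of Ruby's <code>uri</code> library. The sketch assumes the input is a valid UTF-8 String, and it does not give host names any special treatment (an internationalized host would need more than percent-encoding). It also avoids <code>URI.encode</code>, which is no longer available in Ruby 3.0 and later.</p>

```ruby
require "uri"

# Case 1 (hypothetical helper): the protocol carries IRIs, so map the IRI to
# a URI string by percent-encoding the UTF-8 bytes of every non-ASCII
# character before parsing.
def iri_to_uri_string(iri)
  iri.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# Case 2 (hypothetical helper): the protocol requires URIs, so parse strictly
# and report an error instead of repairing the input.
def accept_or_reject(string)
  [:ok, URI.parse(string)]
rescue URI::InvalidURIError => e
  [:error, e.message]
end

converted = iri_to_uri_string("http://example.org/\u{2713}")
puts converted
puts URI.parse(converted).path

status, _detail = accept_or_reject("http://example.org/\u{2713}")
puts status
```

<p>Which helper applies depends on what the protocol specifies; the second one leaves the choice of error response to the caller.</p>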
<blockquote>
<p>Do you not agree that URI.parse should accept unicode entities in URIs? It wasn't clear from your response.</p>
</blockquote>
<p>I think <code>URI.parse</code> correctly raises an exception when it encounters characters that are forbidden by RFC3986.</p>
<blockquote>
<p>I'm not aware of any IRI-compatible API in MRI that could allow me to directly parse URIs containing non-ASCII characters with Ruby, whether they match the strict definition of a URI or not.</p>
</blockquote>
<p>Your thinking here seems confused. If a String contains non-ASCII characters then it's not a URI. If it is a URI then it strictly matches the definition of a URI. If a String contains a valid IRI, then yeah, you're not going to get much help from Ruby; but IRIs are not commonly used in the real world anyway.</p> Ruby master - Bug #12852: URI.parse can't handle non-ascii URIshttps://redmine.ruby-lang.org/issues/12852?journal_id=620072016-12-12T18:39:41Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>Matthew Kerwin wrote:</p>
<blockquote>
<p>Your thinking here seems confused. If a String contains non-ASCII characters then it's not a URI. If it is a URI then it strictly matches the definition of a URI. If a String contains a valid IRI, then yeah, you're not going to get much help from Ruby; but IRIs are not commonly used in the real world anyway.</p>
</blockquote>
<p>The concept sounds reasonable.<br>
I also think the URL Standard's parsing logic would be more suitable for Ruby's URI.parse:<br>
<a href="https://url.spec.whatwg.org/" class="external">https://url.spec.whatwg.org/</a><br>
However, that algorithm is still under development.</p> Ruby master - Bug #12852: URI.parse can't handle non-ascii URIshttps://redmine.ruby-lang.org/issues/12852?journal_id=999832022-11-07T17:41:54Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul>