Project

General

Profile

Actions

Bug #11759

closed

URI breaks with frozen strings

Added by mperham (Mike Perham) over 8 years ago. Updated over 8 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
2.3.0-preview1
[ruby-core:71785]

Description

It appears URI uses String mutation and breaks frozen string mode.

$ RUBYOPT="--enable-frozen-string-literal" bundle exec rake
/Users/mike/.rubies/ruby-2.3.0-preview1/lib/ruby/2.3.0/uri/generic.rb:1344:in `to_s': can't modify frozen String (RuntimeError)
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:257:in `normalize_uri'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:198:in `add_remote'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `block in initialize'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `reverse_each'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `initialize'

Files

0001-Do-not-mutate-strings-in-URI-to_s.patch (1.76 KB) 0001-Do-not-mutate-strings-in-URI-to_s.patch davidcelis (David Celis), 12/02/2015 01:30 AM
11759.patch (610 Bytes) 11759.patch colindkelley (Colin Kelley), 12/02/2015 07:08 AM
11759-rev2.patch (3.21 KB) 11759-rev2.patch colindkelley (Colin Kelley), 12/05/2015 04:15 PM
11759-rev3.patch (3.21 KB) 11759-rev3.patch colindkelley (Colin Kelley), 12/07/2015 03:57 PM

Updated by davidcelis (David Celis) over 8 years ago

  • File 0001-Do-not-mutate-strings-in-URI-to_s.patch added

I've attached a patch you can try out. It will avoid string mutation in URI#to_s, though I'm not sure it's optimal to generate so many new strings in this way. I'm happy to edit as necessary.

Updated by davidcelis (David Celis) over 8 years ago

I just realized that the patch contained many cosmetic updates, so I reverted them to maintain the existing style. Sorry about that.

Actions #3

Updated by davidcelis (David Celis) over 8 years ago

  • File deleted (0001-Do-not-mutate-strings-in-URI-to_s.patch)

Updated by normalperson (Eric Wong) over 8 years ago

Hi David, a new problem is to_s now returns a frozen string.
This has the potential to break many existing callers of to_s.

Perhaps the following (untested) one-liner is enough?

--- a/lib/uri/generic.rb
+++ b/lib/uri/generic.rb
@@ -1339,7 +1339,7 @@ def normalize!
     # Constructs String from URI
     #
     def to_s
-      str = ''
+      str = ''.freeze.dup
       if @scheme
         str << @scheme
         str << ':'.freeze

I added the extra 'freeze' instead of just calling 'dup' to avoid
allocating the extra object when frozen string literals are not
enabled.

But yeah, problems like this is why I remain against frozen string
literals for 3.0 (but I'm alright with the opt-in magic comment).

Updated by davidcelis (David Celis) over 8 years ago

Hey Eric, thanks for the alternative! I was under the impression that we should assume strings to be frozen in this mode but if that's not the case I'd be happy to update the attached patch.

Updated by akr (Akira Tanaka) over 8 years ago

Don't mix optimization and incompatible change.
Incompatible change should be separated to another feature ticket.

Don't assume global "mode" for frozen-string-literal: true/false.
This comment is specified for each file.
So, what library uses frozen-string-literal:true doesn't mean
applications uses frozen-string-literal:true.

Updated by colindkelley (Colin Kelley) over 8 years ago

Isn't it sufficient to initialize the string buffer with String.new?

I've attached a patch that also includes the magic comment to indicate that this file has been converted.

Updated by normalperson (Eric Wong) over 8 years ago

wrote:

Isn't it sufficient to initialize the string buffer with String.new?

Yes, but I prefer to avoid String.new because the constant lookup
requires an inline cache lookup + storage entry in the iseq.

Here's their respective disassembly code:

''.freeze.dup
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 opt_str_freeze   ""
0004 opt_send_without_block <callinfo!mid:dup, argc:0, ARGS_SIMPLE>, <callcache>
0007 leave
String.new
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 getinlinecache   9, <is:0>
0005 getconstant      :String
0007 setinlinecache   <is:0>
0009 opt_send_without_block <callinfo!mid:new, argc:0, ARGS_SIMPLE>, <callcache>
0012 leave

But maybe String.new is slightly faster; but I normally
prefer smaller code unless something is called in a tight loop.

Updated by colindkelley (Colin Kelley) over 8 years ago

Yes, but I prefer to avoid String.new because the constant lookup
requires an inline cache lookup + storage entry in the iseq.

Compared to the surrounding code in the full method, would the extra constant lookup make a measurable difference in code size?

But maybe String.new is slightly faster; but I normally
prefer smaller code unless something is called in a tight loop.

If it's the same speed or faster, I would vote for String.new because to me it makes the intention the most clear. This seems ugly:

''.freeze.dup

because the .freeze is temporary for Ruby 2.1 and 2.2, right? When would this copy of generic.rb ever be run with Ruby versions earlier than 2.3? Assuming it will just be used for Ruby 2.3 and later, the magic comment included in the patch will implicitly freeze the string literal. Hence this would also be sufficient (and in my opinion, nearly as clear in intention as String.new):

''.dup

Updated by normalperson (Eric Wong) over 8 years ago

wrote:

Compared to the surrounding code in the full method, would the extra
constant lookup make a measurable difference in code size?

64 bytes on ruby 2.3.0dev (2015-12-03 trunk 52872) [x86_64-linux].

require 'objspace'
iseq = RubyVM::InstructionSequence
p ObjectSpace.memsize_of(iseq.compile("str = String.new"))
p ObjectSpace.memsize_of(iseq.compile("str = ''.dup"))
p ObjectSpace.memsize_of(iseq.compile("str = ''.freeze.dup"))
Outputs:
480
416
416

because the .freeze is temporary for Ruby 2.1 and 2.2, right? When
would this copy of generic.rb ever be run with Ruby versions earlier
than 2.3?

No, we shouldn't worry about performance with older versions of Ruby
with the stdlib.

Assuming it will just be used for Ruby 2.3 and later, the
magic comment included in the patch will implicitly freeze the string
literal. Hence this would also be sufficient (and in my opinion,
nearly as clear in intention as String.new):

''.dup

I prefer that if we go for "frozen_string_literal: true" in the comment.

I've also added this test for to_s. However, checking more closely,
the "path" accessor also gets frozen changes because we call
set_path with literal strings.

There may be code out in the wild which relies on "path" being mutable.

--- a/test/uri/test_generic.rb
+++ b/test/uri/test_generic.rb
@@ -14,6 +14,13 @@ def uri_to_ary(uri)
     uri.class.component.collect {|c| uri.send(c)}
   end
 
+  def test_to_s
+    exp = 'http://example.com/'.freeze
+    str = URI(exp).to_s
+    assert_equal exp, str
+    refute_predicate str, :frozen?, '[ruby-core:71785] [Bug #11759]'
+  end
+
   def test_parse
     # 0
     assert_kind_of(URI::HTTP, @base_url)

... So perhaps at least one additional change is needed:

--- a/lib/uri/generic.rb
+++ b/lib/uri/generic.rb
@@ -786,7 +786,7 @@ def check_path(v)
     # see also URI::Generic.path=
     #
     def set_path(v)
-      @path = v
+      @path = v.frozen? ? v.dup : v
     end
     protected :set_path
 

All these frozen literal changes will require going every single
method and and call site with a very fine tooth comb....

Updated by colindkelley (Colin Kelley) over 8 years ago

Outputs:
480
416
416

That is more significant than I thought. ''.dup wins.

No, we shouldn't worry about performance with older versions of Ruby
with the stdlib.

Good to have that confirmed. Then we should definitely leave the magic comment at the top for these reasons:

a) It documents which files have been visited in the process of converting to immutable string literals.
b) It allows us to immediately remove the clutter of .freeze at the end of string literals

I've attached an updated patch with the above changes.

There may be code out in the wild which relies on "path" being mutable.

I don't think we have to support that. Callers should not expect to be able to mutate object internals retrieved by accessors. If they try and a "can't modify frozen String" exception is raised, they can fix their code to dup the value.

Updated by matz (Yukihiro Matsumoto) over 8 years ago

I prefer String.new() to "".dup because the former describes intention (of creating a buffer).
In fact, my best choice is introducing String#+ that returns a mutable copy of a string.

For the size of byte code and performance, we can expect future optimization.

Matz.

Updated by colindkelley (Colin Kelley) over 8 years ago

I prefer String.new() to "".dup because the former describes intention (of creating a buffer).

Ok. I've attached a rev3 patch that uses String.new instead of ''.dup.

In fact, my best choice is introducing String#+ that returns a mutable copy of a string.

I had a question about that but I saw it answered elsewhere. My understanding is that that would be a unary + operator on String, equivalent to:

class String
  def +@
    self.frozen? ? self.dup : self
  end
end
Actions #14

Updated by Anonymous over 8 years ago

  • Status changed from Open to Closed

Applied in changeset r52981.


lib/uri/generic.rb: enable frozen_string_literal

  • lib/uri/generic.rb: enable frozen_string_literal
    (split_userinfo): remove explicit .freeze for string literals
    (check_path): ditto
    (query): ditto
    (fragment): ditto
    (to_s): ditto
    [ruby-core:71910] [Bug #11759]

Patch-by: Colin Kelley

Updated by normalperson (Eric Wong) over 8 years ago

wrote:

I prefer String.new() to "".dup because the former describes intention (of creating a buffer).

Ok. I've attached a rev3 patch that uses String.new instead of ''.dup.

Thanks. Committed as r52981

In fact, my best choice is introducing String#+ that returns a mutable copy of a string.

How would that be different from the current binary string operator String#+ that does string concatenation?

It actually calls the "+@" method as implemented in r52917 / [Feature #11782]
But according to [ruby-core:71924], maybe it's not a good idea...

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0