Bug #11759
closedURI breaks with frozen strings
Description
It appears URI uses String mutation and breaks frozen string mode.
$ RUBYOPT="--enable-frozen-string-literal" bundle exec rake
/Users/mike/.rubies/ruby-2.3.0-preview1/lib/ruby/2.3.0/uri/generic.rb:1344:in `to_s': can't modify frozen String (RuntimeError)
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:257:in `normalize_uri'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:198:in `add_remote'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `block in initialize'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `reverse_each'
/Users/mike/.gem/ruby/2.3.0/gems/bundler-1.10.6/lib/bundler/source/rubygems.rb:25:in `initialize'
Files
Updated by davidcelis (David Celis) about 9 years ago
- File 0001-Do-not-mutate-strings-in-URI-to_s.patch added
I've attached a patch you can try out. It will avoid string mutation in URI#to_s, though I'm not sure it's optimal to generate so many new strings in this way. I'm happy to edit as necessary.
Updated by davidcelis (David Celis) about 9 years ago
- File 0001-Do-not-mutate-strings-in-URI-to_s.patch 0001-Do-not-mutate-strings-in-URI-to_s.patch added
I just realized that the patch contained many cosmetic updates, so I reverted them to maintain the existing style. Sorry about that.
Updated by davidcelis (David Celis) about 9 years ago
- File deleted (
0001-Do-not-mutate-strings-in-URI-to_s.patch)
Updated by normalperson (Eric Wong) about 9 years ago
Hi David, a new problem is to_s now returns a frozen string.
This has the potential to break many existing callers of to_s.
Perhaps the following (untested) one-liner is enough?
--- a/lib/uri/generic.rb
+++ b/lib/uri/generic.rb
@@ -1339,7 +1339,7 @@ def normalize!
# Constructs String from URI
#
def to_s
- str = ''
+ str = ''.freeze.dup
if @scheme
str << @scheme
str << ':'.freeze
I added the extra 'freeze' instead of just calling 'dup' to avoid
allocating the extra object when frozen string literals are not
enabled.
But yeah, problems like this is why I remain against frozen string
literals for 3.0 (but I'm alright with the opt-in magic comment).
Updated by davidcelis (David Celis) about 9 years ago
Hey Eric, thanks for the alternative! I was under the impression that we should assume strings to be frozen in this mode but if that's not the case I'd be happy to update the attached patch.
Updated by akr (Akira Tanaka) about 9 years ago
Don't mix optimization and incompatible change.
Incompatible change should be separated to another feature ticket.
Don't assume global "mode" for frozen-string-literal: true/false.
This comment is specified for each file.
So, what library uses frozen-string-literal:true doesn't mean
applications uses frozen-string-literal:true.
Updated by colindkelley (Colin Kelley) about 9 years ago
- File 11759.patch 11759.patch added
Isn't it sufficient to initialize the string buffer with String.new?
I've attached a patch that also includes the magic comment to indicate that this file has been converted.
Updated by normalperson (Eric Wong) about 9 years ago
colin@invoca.com wrote:
Isn't it sufficient to initialize the string buffer with String.new?
Yes, but I prefer to avoid String.new because the constant lookup
requires an inline cache lookup + storage entry in the iseq.
Here's their respective disassembly code:
''.freeze.dup
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace 1 ( 1)
0002 opt_str_freeze ""
0004 opt_send_without_block <callinfo!mid:dup, argc:0, ARGS_SIMPLE>, <callcache>
0007 leave
String.new
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace 1 ( 1)
0002 getinlinecache 9, <is:0>
0005 getconstant :String
0007 setinlinecache <is:0>
0009 opt_send_without_block <callinfo!mid:new, argc:0, ARGS_SIMPLE>, <callcache>
0012 leave
But maybe String.new
is slightly faster; but I normally
prefer smaller code unless something is called in a tight loop.
Updated by colindkelley (Colin Kelley) about 9 years ago
Yes, but I prefer to avoid String.new because the constant lookup
requires an inline cache lookup + storage entry in the iseq.
Compared to the surrounding code in the full method, would the extra constant lookup make a measurable difference in code size?
But maybe String.new is slightly faster; but I normally
prefer smaller code unless something is called in a tight loop.
If it's the same speed or faster, I would vote for String.new
because to me it makes the intention the most clear. This seems ugly:
''.freeze.dup
because the .freeze
is temporary for Ruby 2.1 and 2.2, right? When would this copy of generic.rb ever be run with Ruby versions earlier than 2.3? Assuming it will just be used for Ruby 2.3 and later, the magic comment included in the patch will implicitly freeze the string literal. Hence this would also be sufficient (and in my opinion, nearly as clear in intention as String.new
):
''.dup
Updated by normalperson (Eric Wong) about 9 years ago
colin@invoca.com wrote:
Compared to the surrounding code in the full method, would the extra
constant lookup make a measurable difference in code size?
64 bytes on ruby 2.3.0dev (2015-12-03 trunk 52872) [x86_64-linux].
require 'objspace'
iseq = RubyVM::InstructionSequence
p ObjectSpace.memsize_of(iseq.compile("str = String.new"))
p ObjectSpace.memsize_of(iseq.compile("str = ''.dup"))
p ObjectSpace.memsize_of(iseq.compile("str = ''.freeze.dup"))
Outputs:
480
416
416
because the
.freeze
is temporary for Ruby 2.1 and 2.2, right? When
would this copy of generic.rb ever be run with Ruby versions earlier
than 2.3?
No, we shouldn't worry about performance with older versions of Ruby
with the stdlib.
Assuming it will just be used for Ruby 2.3 and later, the
magic comment included in the patch will implicitly freeze the string
literal. Hence this would also be sufficient (and in my opinion,
nearly as clear in intention asString.new
):''.dup
I prefer that if we go for "frozen_string_literal: true" in the comment.
I've also added this test for to_s
. However, checking more closely,
the "path" accessor also gets frozen changes because we call
set_path
with literal strings.
There may be code out in the wild which relies on "path" being mutable.
--- a/test/uri/test_generic.rb
+++ b/test/uri/test_generic.rb
@@ -14,6 +14,13 @@ def uri_to_ary(uri)
uri.class.component.collect {|c| uri.send(c)}
end
+ def test_to_s
+ exp = 'http://example.com/'.freeze
+ str = URI(exp).to_s
+ assert_equal exp, str
+ refute_predicate str, :frozen?, '[ruby-core:71785] [Bug #11759]'
+ end
+
def test_parse
# 0
assert_kind_of(URI::HTTP, @base_url)
... So perhaps at least one additional change is needed:
--- a/lib/uri/generic.rb
+++ b/lib/uri/generic.rb
@@ -786,7 +786,7 @@ def check_path(v)
# see also URI::Generic.path=
#
def set_path(v)
- @path = v
+ @path = v.frozen? ? v.dup : v
end
protected :set_path
All these frozen literal changes will require going every single
method and and call site with a very fine tooth comb....
Updated by colindkelley (Colin Kelley) about 9 years ago
- File 11759-rev2.patch 11759-rev2.patch added
Outputs:
480
416
416
That is more significant than I thought. ''.dup
wins.
No, we shouldn't worry about performance with older versions of Ruby
with the stdlib.
Good to have that confirmed. Then we should definitely leave the magic comment at the top for these reasons:
a) It documents which files have been visited in the process of converting to immutable string literals.
b) It allows us to immediately remove the clutter of .freeze
at the end of string literals
I've attached an updated patch with the above changes.
There may be code out in the wild which relies on "path" being mutable.
I don't think we have to support that. Callers should not expect to be able to mutate object internals retrieved by accessors. If they try and a "can't modify frozen String" exception is raised, they can fix their code to dup
the value.
Updated by matz (Yukihiro Matsumoto) about 9 years ago
I prefer String.new()
to "".dup
because the former describes intention (of creating a buffer).
In fact, my best choice is introducing String#+
that returns a mutable copy of a string.
For the size of byte code and performance, we can expect future optimization.
Matz.
Updated by colindkelley (Colin Kelley) about 9 years ago
- File 11759-rev3.patch 11759-rev3.patch added
I prefer String.new() to "".dup because the former describes intention (of creating a buffer).
Ok. I've attached a rev3 patch that uses String.new
instead of ''.dup
.
In fact, my best choice is introducing String#+ that returns a mutable copy of a string.
I had a question about that but I saw it answered elsewhere. My understanding is that that would be a unary +
operator on String, equivalent to:
class String
def +@
self.frozen? ? self.dup : self
end
end
Updated by Anonymous about 9 years ago
- Status changed from Open to Closed
Applied in changeset r52981.
lib/uri/generic.rb: enable frozen_string_literal
- lib/uri/generic.rb: enable frozen_string_literal
(split_userinfo): remove explicit .freeze for string literals
(check_path): ditto
(query): ditto
(fragment): ditto
(to_s): ditto
[ruby-core:71910] [Bug #11759]
Patch-by: Colin Kelley colindkelley@gmail.com
Updated by normalperson (Eric Wong) about 9 years ago
colin@invoca.com wrote:
I prefer String.new() to "".dup because the former describes intention (of creating a buffer).
Ok. I've attached a rev3 patch that uses
String.new
instead of''.dup
.
Thanks. Committed as r52981
In fact, my best choice is introducing String#+ that returns a mutable copy of a string.
How would that be different from the current binary string operator
String#+
that does string concatenation?
It actually calls the "+@" method as implemented in r52917 / [Feature #11782]
But according to [ruby-core:71924], maybe it's not a good idea...