Bug #16151
closed[PATCH] Fix a class of fstring related use-after-free
Added by alanwu (Alan Wu) about 5 years ago. Updated about 5 years ago.
Description
Pull request: https://github.com/ruby/ruby/pull/2435
The bug¶
Run the following against master(e9bc8b3) to observe use-after-free:
-('a' * 30).force_encoding(Encoding::ASCII)
a = ('a' * 30).force_encoding(Encoding::ASCII).taint
t = Thread.new{}
t.name = a
eval('', binding, t.name)
p a
-('a' * 30).force_encoding(Encoding::ASCII)
a = ('a' * 30).force_encoding(Encoding::ASCII).taint
require 'ripper'
ripper = Ripper.new("", a)
eval('', binding, ripper.filename)
p a
There may be other cases in the standard library or in the wild.
Background¶
When a string has both STR_NOEMBED
and STR_SHARED
set, it relies on a
different string for its buffer. I will refer to strings that are depended upon
as "shared roots". Shared roots are frozen and have the STR_SHARED
unset.
This is a bit unintuitive to me. A name for STR_SHARED
that makes more sense
to me would be STR_BUFFER_ELSEWHERE
.
What went wrong¶
It is not safe to free the buffer of a shared root while it has dependants. The
root and its dependants use the same buffer. As such, it is only safe to free
the shared buffer when all users are unreachable on the heap.
The Fix¶
rb_fstring
has a code path that frees and replaces the buffer of its input.
Using this code path on the shared root of dependant strings sets up
use-after-free. This patch removes the problematic code path as no tests
require said buffer replacement functionality. Additionally, there has been
three other issues that steam from this particular code path. See #15926,
#15916 and #16136
I used @mame's commit in #16136 as the starting point for this investigation.
Thank you!
Updated by mame (Yusuke Endoh) about 5 years ago
Hi @alanwu (Alan Wu) , thank you for taking time to this issue.
I may be wrong, but I think that the code path you pointed is benign itself. It calls str_replace_shared_without_enc
which may free the string buffer. However, it does not free the buffer if it is STR_NOFREE. I guess that we need to set STR_NOFREE flag somewhere, though I cannot understand at all what is STR_NOFREE.
Looks like str_new_frozen
creates a broken shared pair.
https://github.com/ruby/ruby/blob/e9bc8b35c6c860e227627e3b0aab86b4f51c59af/string.c#L1303-L1312
It allocates str
, copies the contents from orig
to str
, make the buffer shared, but does not set the STR_NOFREE flag to str
.
The following patch fixes the two examples you showed. But it would produce a new memory-leak issue, I guess. @nobu (Nobuyoshi Nakada), what do you think?
diff --git a/string.c b/string.c
index 05ce0ed8d6..d810f252e7 100644
--- a/string.c
+++ b/string.c
@@ -1305,7 +1305,7 @@ str_new_frozen(VALUE klass, VALUE orig)
RSTRING(str)->as.heap.len = RSTRING_LEN(orig);
RSTRING(str)->as.heap.ptr = RSTRING_PTR(orig);
RSTRING(str)->as.heap.aux.capa = RSTRING(orig)->as.heap.aux.capa;
- RBASIC(str)->flags |= RBASIC(orig)->flags & STR_NOFREE;
+ RBASIC(str)->flags |= STR_NOFREE;
RBASIC(orig)->flags &= ~STR_NOFREE;
STR_SET_SHARED(orig, str);
if (klass == 0)
Honestly, fstring looks ill-designed to me.
Updated by mame (Yusuke Endoh) about 5 years ago
In addition to STR_NOFREE
, there are more two things that I cannot understand about string representation.
- What relation is between
STR_NOFREE
andSTR_IS_SHARED_M
?STR_IS_SHARED_M
seems to be the flag for a shared root, butstr_new_frozen
does not try to set the flag. - I cannot see the relation between the sharing mechanism and a hidden string (klass = 0).
At least, the invariant must be documented.
Updated by alanwu (Alan Wu) about 5 years ago
My reading is that STR_NOFREE
exists solely for strings that point to C string literals. It exists to prevent free("literal")
which is UB.
There are checks for STR_NOFREE
everywhere buffers are freed in case a static string shows up.
For heap allocated buffers, someone will eventually have to free them, so setting STR_NOFREE
on strings with heap buffers surely leads to a leak.
The only way to find out whether a string is a shared root is by traversing the entire heap. This is problematic for the call
to str_replace_shared_without_enc
in rb_fstring
as it doesn't have enough information to decide whether to free a string or not. Not freeing anything,
like we did before #15916, leads to leaks. Freeing with limited information leads to use-after-free situations.
The semantics for STR_IS_SHARED_M
is a bit weird to me too. I think it's a mechanism for detecting whether it's safe to free a shared root.
When rb_str_tmp_frozen_acquire
returns a shared root, the root has STR_IS_SHARED_M
set in all situations where it's unsafe for rb_str_tmp_frozen_release
to recycle the root. It's not quite "a root that has multiple dependants" since you could have a root that only has one dependant, but has
STR_IS_SHARED_M
set anyways. This could happen through this code path: https://github.com/ruby/ruby/blob/e9bc8b35c6c860e227627e3b0aab86b4f51c59af/string.c#L1291
When rb_str_tmp_frozen_release
detects that it's safe to recycle a shared root, it undos the sharing. This is done by basically doing the reverse
of https://github.com/ruby/ruby/blob/e9bc8b35c6c860e227627e3b0aab86b4f51c59af/string.c#L1303-L1312. I think this is why STR_NOFREE
is involved there.
My hunch is that there are ways to simplify away STR_IS_SHARED_M
without compromising on performance. It feels unnecessarily complicated to me.
Luckily STR_IS_SHARED_M
doesn't come in to play for Strings one could easily create in Ruby land...
I agree that there should be more documentation around the String's internal states and invariants. It would also be great if we could also rename some
struct members and variables. For example, STR_SHARED
's name is particularly unintuitive to me. A shared root is "shared", but it should not have STR_SHARED
set.
Updated by nobu (Nobuyoshi Nakada) about 5 years ago
Probably, STR_IS_SHARED_M
string must not get its buffer replaced.
And I don't think that memory leaks, resulted by setting STR_NOFREE
regardless the original string, is acceptable.
Although I'm not sure why it is restricted to klass == 0
strings, this is a rough patch.
diff --git a/string.c b/string.c
index 05ce0ed8d6..2a99d10f6f 100644
--- a/string.c
+++ b/string.c
@@ -155,8 +155,7 @@ VALUE rb_cSymbol;
if (!FL_TEST(str, STR_FAKESTR)) { \
RB_OBJ_WRITE((str), &RSTRING(str)->as.heap.aux.shared, (shared_str)); \
FL_SET((str), STR_SHARED); \
- if (RBASIC_CLASS((shared_str)) == 0) /* for CoW-friendliness */ \
- FL_SET_RAW((shared_str), STR_IS_SHARED_M); \
+ FL_SET_RAW((shared_str), STR_IS_SHARED_M); \
} \
} while (0)
@@ -327,6 +326,7 @@ rb_fstring(VALUE str)
fstr = register_fstring(str);
if (!bare) {
+ if (FL_TEST_RAW(str, STR_IS_SHARED_M)) return str;
str_replace_shared_without_enc(str, fstr);
OBJ_FREEZE_RAW(str);
return str;
@@ -1177,6 +1177,9 @@ str_replace_shared_without_enc(VALUE str2, VALUE str)
/* TODO: check if str2 is a shared root */
char *ptr2 = STR_HEAP_PTR(str2);
if (ptr2 != ptr) {
+ if (FL_TEST_RAW(str2, STR_IS_SHARED_M)) {
+ rb_fatal("about to replace a shared root");
+ }
ruby_sized_xfree(ptr2, STR_HEAP_SIZE(str2));
}
}
@@ -1287,8 +1290,7 @@ str_new_frozen(VALUE klass, VALUE orig)
RSTRING(str)->as.heap.len -= ofs + rest;
}
else {
- if (RBASIC_CLASS(shared) == 0)
- FL_SET_RAW(shared, STR_IS_SHARED_M);
+ FL_SET_RAW(shared, STR_IS_SHARED_M);
return shared;
}
}
@@ -1308,8 +1310,6 @@ str_new_frozen(VALUE klass, VALUE orig)
RBASIC(str)->flags |= RBASIC(orig)->flags & STR_NOFREE;
RBASIC(orig)->flags &= ~STR_NOFREE;
STR_SET_SHARED(orig, str);
- if (klass == 0)
- FL_UNSET_RAW(str, STR_IS_SHARED_M);
}
}
P.S.
def test_shared_string_safety
-('a' * 30).force_encoding(Encoding::ASCII)
str = ('a' * 30).force_encoding(Encoding::ASCII).taint
frozen_str = Bug::String.rb_str_new_frozen(str)
assert_fstring(frozen_str) {|s| assert_equal(str, s)}
GC.start ###<-- I needed this line to reproduce the failure.
assert_equal('a' * 30, str, "[Bug #16151]")
end
Updated by alanwu (Alan Wu) about 5 years ago
That general approach sounds good to me, but please let me mention some caveats.
@@ -1308,8 +1310,6 @@ str_new_frozen(VALUE klass, VALUE orig)
RBASIC(str)->flags |= RBASIC(orig)->flags & STR_NOFREE;
RBASIC(orig)->flags &= ~STR_NOFREE;
STR_SET_SHARED(orig, str);
- if (klass == 0)
- FL_UNSET_RAW(str, STR_IS_SHARED_M);
}
}
These two lines are important for #13085, where STR_IS_SHARED_M
was first
introduced. The following program generates a lot more garbage when I comment
out these two lines:
# cat.rb
buf = ''.b
File.open(ARGV.first) do |f|
while f.read(16384, buf)
$stdout.write(buf)
end
end
$stderr.puts(File.read("/proc/#{Process.pid}/status"))
Going with this approach while keeping the optimization introduced in #13085
might require a new flag.
Sidenote, marking roots with a flag is a heuristic, since a string stops being
a root once all the dependants are unreachable. We already have a more coarse
heuristic: if a string is frozen, we know that it could be a shared root.
Maybe that can be a viable alternative approach.
diff --git a/string.c b/string.c
index 05ce0ed8d6..86db4891fb 100644
--- a/string.c
+++ b/string.c
@@ -327,6 +327,8 @@ rb_fstring(VALUE str)
fstr = register_fstring(str);
if (!bare) {
+ /* A frozen string might be a shared root */
+ if (FL_TEST_RAW(str, FL_FREEZE)) return str;
str_replace_shared_without_enc(str, fstr);
OBJ_FREEZE_RAW(str);
return str;
Though, I think if we can get away with removing the buffer deduplication/replacement
code path, we should.
Updated by mame (Yusuke Endoh) about 5 years ago
Just FYI: You said in https://bugs.ruby-lang.org/issues/16152#note-8
- What do you think about the proposed schedule?
- I am asking because this might inform the fix for [Bug #16151]
But, BARE_STRING_P is not only for a tainted string, but also for a string that includes an instance variable.
s = "foo"
s.instance_variable_set(:@foo, 1)
So removing $SAFE does not solve the difficulty completely. Sorry if you already know.
Updated by alanwu (Alan Wu) about 5 years ago
Thanks for pointing that out. I should clarify what I meant there. In a world where strings don't track taint status, it might make more sense to refuse buffer deduplication for non bare strings.
At that point, my original patch only refuses strings that use a subclass or have ivars. Those strings might be rare enough that removing the buffer duplication code is acceptable.
Updated by mame (Yusuke Endoh) about 5 years ago
Indeed, it makes sense. Thank you.
I am not familiar with Rails at all so I could be wrong, but I hear ActiveSupport::SafeBuffer inherits String and has an instance variable. Is this class not so used frequently?
https://api.rubyonrails.org/classes/ActiveSupport/SafeBuffer.html
Updated by alanwu (Alan Wu) about 5 years ago
I think that class is used frequently, though I'm not sure if Rails tries to deduplicate instances with fstrings. Regardless, it seems that removing
buffer deduplication altogether isn't very workable, since we need to backport it, even if it's acceptable in a version that doesn't track taint.
I think refusing to do buffer deduplicate on frozen strings seems promising. I will try to get some statistics on how many strings it ends up refusing.
Updated by alanwu (Alan Wu) about 5 years ago
I ran some quick tests on rejecting frozen non-bare strings from the buffer
deduplication code path. On Discourse, about 18% of calls to rb_fstring are
refused with the change in https://bugs.ruby-lang.org/issues/16151#note-5.
So I don't think that's a viable approach.
I've put together a PR with nobu's idea, using a new flag to make sure to
not defeat the optimization in #13085. I also took the liberty to rename
STR_IS_SHARED_M since it's a little confusing to have this many things talk
about sharing.
Updated by alanwu (Alan Wu) about 5 years ago
- Status changed from Open to Closed
Applied in changeset git|93faa011d393bb4b5cf31a0cbb46922f0a5e7cdc.
Tag string shared roots to fix use-after-free
The buffer deduplication codepath in rb_fstring can be used to free the buffer
of shared string roots, which leads to use-after-free.
Introudce a new flag to tag strings that at one point have been a shared root.
Check for it in rb_fstring to avoid freeing buffers that are shared by
multiple strings. This change is based on nobu's idea in [ruby-core:94838].
The included test case test for the sequence of calls to internal functions
that lead to this bug. See attached ticket for Ruby level repros.
[Bug #16151]
Updated by nagachika (Tomoyuki Chikanaga) about 5 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN to 2.5: UNKNOWN, 2.6: REQUIRED
Maybe 2.6.4 has same issue.
Updated by nagachika (Tomoyuki Chikanaga) about 5 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: REQUIRED to 2.5: UNKNOWN, 2.6: DONE
ruby_2_6 r67804 merged revision(s) 93faa011d393bb4b5cf31a0cbb46922f0a5e7cdc.