Project

General

Profile

Actions

Bug #13085

closed

io.c io_fwrite creates garbage

Added by normalperson (Eric Wong) about 7 years ago. Updated about 7 years ago.

Status:
Closed
Target version:
-
[ruby-core:78898]

Description

Relying on rb_str_new_frozen for unconverted strings does not
save memory because copy-on-write is always triggered in
read-write I/O loops were subsequent IO#read calls will
clobber the given write buffer.

buf = ''.b
while input.read(16384, buf)
  output.write(buf)
end

This generates a lot of garbage starting with Ruby 2.2 (r44471).
For my use case, even IO.copy_stream generates garbage, since
I wrap "write" to do Digest calculation in a single pass.

I tried using rb_str_replace and reusing the string as a hidden
(klass == 0) thread-local, but rb_str_replace attempts CoW
optimization by creating new frozen objects, too:

https://80x24.org/spew/20161229004417.12304-1-e@80x24.org/raw

So, I'm not sure what to do, temporal locking seems wrong for
writing strings (I guess it's for reading?). I get
test_threaded_flush failures with the following:

https://80x24.org/spew/20161229005701.9712-1-e@80x24.org/raw

IO#syswrite has the same problem with garbage. I can use
IO#write_nonblock on fast filesystems while holding GVL,
I guess...


Files


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #13299: backport r57469, r57472, r57508 (garbage reduction for IO#write/syswrite)ClosedActions

Updated by normalperson (Eric Wong) about 7 years ago

Proposed patch to temporarily freeze string while copying

    io.c (io_fwrite): temporarily freeze string when writing
    
    This avoids garbage from IO#write for [Bug #13085].
    Memory usage from benchmark/bm_io_copy_stream_write.rb
    is reduced greatly:
    
      target 0: a (ruby 2.5.0dev (2016-12-30 trunk 57236) [x86_64-linux])
      target 1: b (ruby 2.5.0dev (2016-12-30) [x86_64-linux])
    
      Memory usage (last size) (B)
      name  a       b
      io_copy_stream_write  82235392.000    6651904.000
    
      Memory consuming ratio (size) with the result of `a' (greater is better)
      name  b
      io_copy_stream_write  12.363
    
    There is also a speedup in execution time:
    
      Execution time (sec)
      name  a       b
      io_copy_stream_write  0.380   0.143
    
      Speedup ratio: compare with the result of `a' (greater is better)
      name  b
      io_copy_stream_write  2.651
    
    Caveat, there is one potential race condition:
    
    If another thread calls String#freeze on the string we are
    currently writing; we will blindly unfreeze it during
    fwrite_unfreeze from ensure.  However, I do not expect this to
    be a real-world case.
    
    Ideally, Ruby should have a way of detecting threads which
    are not visible to other threads.

Updated by normalperson (Eric Wong) about 7 years ago

wrote:

Issue #13085 has been updated by Eric Wong.

File 0001-io.c-io_fwrite-temporarily-freeze-string-when-writin.patch added

Caveat, there is one potential race condition:

If another thread calls String#freeze on the string we are
currently writing; we will blindly unfreeze it during
fwrite_unfreeze from ensure.  However, I do not expect this to
be a real-world case.

Ideally, Ruby should have a way of detecting threads which
are not visible to other threads.

Any comment? I still have not been able to think of a better
solution. I'll commit in a month or five if no objections.

https://bugs.ruby-lang.org/issues/13085#change-62335

Updated by nobu (Nobuyoshi Nakada) about 7 years ago

No.
Your patch releases the GVL while unfreezing the string to be written.
If the write operation is blocked, other threads can clobber that string.

Updated by normalperson (Eric Wong) about 7 years ago

wrote:

No.
Your patch releases the GVL while unfreezing the string to be written.
If the write operation is blocked, other threads can clobber that string.

It does?

fwrite_unfreeze happens in rb_ensure e_proc, which has GVL.

From what I can tell, GVL release happens only inside
fwrite_freeze (rb_ensure b_proc):

    fwrite_freeze (rb_ensure b_proc)
        io_binwrite
            rb_mutex_synchronize(fptr->write_lock...)
                rb_mutex_lock = NOGVL
                io_binwrite_string (identical as below:)
            io_binwrite_string
                rb_writev_internal
                    rb_thread_io_blocking_region = NOGVL
                rb_write_internal
                    rb_thread_io_blocking_region = NOGVL
            rb_io_wait_writable = NOGVL

By the time e_proc (fwrite_unfreeze) is called by rb_ensure
we have GVL, again.

Updated by nobu (Nobuyoshi Nakada) about 7 years ago

  • Description updated (diff)

Another thread may make a substring of the buffer string during writing.
Then the temporarily frozen string becomes a shared root of the new substring.
Later the root gets unfrozen and modified, now the substring has an invalid pointer.

If another thread calls String#freeze on the string we are
currently writing; we will blindly unfreeze it during
fwrite_unfreeze from ensure. However, I do not expect this to
be a real-world case.

That is test_threaded_flush in test_io.rb,
freezing the same string simultaneously in io_fwrite.

Updated by normalperson (Eric Wong) about 7 years ago

OK, different strategy; not as fast, but still better than what
we currently have.

[PATCH v2] io.c (io_fwrite): copy to hidden buffer when writing

This avoids garbage from IO#write for [Bug #13085] when
called in a read-write loop while protecting the VM
from race conditions forced by the user.

Memory usage from benchmark/bm_io_copy_stream_write.rb
is reduced greatly:

target 0: a (ruby 2.5.0dev (2017-01-05 trunk 57270) [x86_64-linux])
target 1: b (ruby 2.5.0dev (2017-01-05) [x86_64-linux])

Memory usage (last size) (B)
name a b
io_copy_stream_write 81899520.000 6561792.000

Memory consuming ratio (size) with the result of `a' (greater is better)
name b
io_copy_stream_write 12.481

Despite the extra deep data copy, there is a small speedup in
execution time due to GC avoidance:

Execution time (sec)
name a b
io_copy_stream_write 0.393 0.296

Speedup ratio: compare with the result of `a' (greater is better)
name b
io_copy_stream_write 1.328

This patch increases memory bandwidth use by pessimistically
copying the data into a temporary hidden buffer. Using a
lightweight frozen copy (as before this patch) is ineffective
in read-write loops, since the read operation will trigger
a heavy copy behind our back due to the CoW operation.

It is also impossible to safely release memory from the
lightweight CoW string, because we have no idea how many
lightweight duplicates exist by the time we reacquire GVL.

So, we now make a heavy copy up front which we recycle
immediately upon completion.

Ideally, Ruby should have a way of detecting Strings which are
not visible to other threads and be able to optimize away the
internal copy. Or, we give up on the idea of implicit data
sharing between threads since its dangerous anyways.

Updated by normalperson (Eric Wong) about 7 years ago

wrote:

File 0001-v2-io.c-io_fwrite-copy-to-hidden-buffer-when-writing.patch added

OK, different strategy; not as fast, but still better than what
we currently have.

Any comments on my alternative patch?

Anyways, I don't like this v2 strategy, either, since it has an
extra memory copy, but it's still better than creating gigantic
amounts of garbage.

With the existence of ObjectSpace.each_object and ability for
threads to be spawned inside signal handlers, I'm not sure if
there's a performant way for us to check the thread-visibility
of a particular String to elide the copy.

https://bugs.ruby-lang.org/issues/13085#change-62398

Updated by normalperson (Eric Wong) about 7 years ago

At least on Linux, this is probably the best performance we can get
with sockets. Unfortunately, this does not affect writes to pipes
(write_nonblock is OK for me, at least), and regular files
(where we really need to release GVL)

--------8<-------
Subject: [PATCH] basicsocket: Linux workaround for excess garbage on write

Linux allows us to use the MSG_DONTWAIT flag on sockets
to do non-blocking send(2) without side effects of modifying
the underlying file flags. This allows us to replace the
generic IO#write and avoid garbage creation on GVL release:
https://bugs.ruby-lang.org/issues/13085
We now only release the GVL when IO cannot proceed.

While IO#write_nonblock may be used for pipes and sockets
portably, that has the side effect of changing the file flag via
fcntl. MSG_DONTWAIT under Linux has no side effects on the
socket.

I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.
I HATE IMPLICITLY SHARED MEMORY.

Updated by normalperson (Eric Wong) about 7 years ago

I think this can be a universal solution. Lightly tested and all tests pass,
but I have not checked coverage, yet.

I reuse one of the embed length bits for shared (noembed) string
to track when a string is shared multiple times. In the common
case, there is one share (the original source string), so we can
recycle immediately in ensure. If a second share is set, the
new bit is set and we do not recycle in ensure.

Updated by nobu (Nobuyoshi Nakada) about 7 years ago

  • Status changed from Open to Assigned
  • Assignee set to normalperson (Eric Wong)

Seems nice, let's try it.

Updated by normalperson (Eric Wong) about 7 years ago

wrote:

Seems nice, let's try it.

Thanks, r57469. I'll work on syswrite and send* (socket) next.

Actions #12

Updated by Anonymous about 7 years ago

  • Status changed from Assigned to Closed

Applied in changeset r57471.


string.c (rb_str_tmp_frozen_release): release embedded strings

Handle the embedded case first, since we may have an embedded
duplicate and non-embedded original string.

  • string.c (rb_str_tmp_frozen_release): handled embedded strings
  • test/ruby/test_io.rb (test_write_no_garbage): new test
    [ruby-core:78898] [Bug #13085]

Updated by normalperson (Eric Wong) about 7 years ago

Eric Wong wrote:

wrote:

Seems nice, let's try it.

Thanks, r57469. I'll work on syswrite and send* (socket) next.

I guess send* in socket never froze the string. I guess it
will be necessary...

Anyways, I made sprintf and strftime avoid garbage
in a similar cases (r57473 and r57476), too.

Also, I've been thinking the rb_ensure usage for this in io.c
isn't necessary. Exceptions ought to be rare and I already
consider anything which repeatedly throws exceptions a lost
cause. So I think I'll apply the following, tomorrow:

https://80x24.org/spew/20170131024140.27859-1-e@80x24.org/raw

The diffstat alone is seems worth it:

io.c | 93 +++++++++++++++-----------------------------------------------------
1 file changed, 20 insertions(+), 73 deletions(-)

Actions #14

Updated by naruse (Yui NARUSE) about 7 years ago

  • Related to Bug #13299: backport r57469, r57472, r57508 (garbage reduction for IO#write/syswrite) added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0