Project

General

Profile

Actions

Feature #13996

closed

[PATCH] file.c: apply2files releases GVL

Added by normalperson (Eric Wong) over 6 years ago. Updated over 6 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:83200]

Description

This means File.chmod, File.lchmod, File.chown, File.lchown,
File.unlink, and File.utime operations on slow filesystems
no longer hold up other threads.

The platform-specific utime_failed changes is compile-tested
using a new UTIME_EINVAL macro

  • file.c (UTIME_EINVAL): new macro to ease compile-testing
  • file.c (struct apply_arg): new struct
  • file.c (no_gvl_apply2files): new function
  • file.c (apply2files): release GVL
  • file.c (chmod_internal): adjust for apply2files changes
  • file.c (lchmod_internal): ditto
  • file.c (chown_internal): ditto
  • file.c (lchown_internal): ditto
  • file.c (utime_failed): ditto
  • file.c (utime_internal): ditto
  • file.c (unlink_internal): ditto

Files

0001-file.c-apply2files-releases-GVL.patch (7.65 KB) 0001-file.c-apply2files-releases-GVL.patch normalperson (Eric Wong), 10/10/2017 07:15 PM

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

https://bugs.ruby-lang.org/issues/13996

Updated patch with benchmark, now:

https://80x24.org/spew/20171012191218.1624-1-e@80x24.org/raw

This hurts performance on fast filesystem, but these methods
are unlikely to be performance bottlenecks and (IMHO) avoiding
pathological slowdowns and stalls are more important.

benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name trunk built
file_chmod 0.591 0.801

Speedup ratio: compare with the result of `trunk' (greater is better)
name built
file_chmod 0.737

:<

Updated by normalperson (Eric Wong) over 6 years ago

I wrote:

https://bugs.ruby-lang.org/issues/13996

Updated patch with benchmark, now:

https://80x24.org/spew/20171012191218.1624-1-e@80x24.org/raw

This hurts performance on fast filesystem, but these methods
are unlikely to be performance bottlenecks and (IMHO) avoiding
pathological slowdowns and stalls are more important.

Will commit in a few days...

At least on Linux, GVL performance may be slightly improved:

https://bugs.ruby-lang.org/issues/13697

Updated by normalperson (Eric Wong) over 6 years ago

In the same vein as https://bugs.ruby-lang.org/issues/13996

There's 4 more patches to dir.c around opendir, readdir, openat,
and chdir.

I don't think releasing GVL around rewinddir is necessary, since
AFAIK; it's just an lseek and that only updates an offset in the
kernel.

Releasing GVL around closedir may not be needed, either, since
it's operations on that FD are read-only; I'll have to read
Linux kernel sources around various FSes to confirm; but also
modern versions of Linux defer a lot of the complicated close()
stuff, anyways, to reduce userspace syscall time.

However, releasing GVL around mkdir and rmdir will be necessary,
of course; as well as the remaining stat/lstat calls in dir.c

It also looks like the dir glob code could be doing multiple
syscalls and non-rb_* userspace stuff without GVL to reduce
overhead of releasing/acquiring the GVL too frequently; but
that's a more involved patch and UTF-8 normalizing FS platforms
won't benefit (but they're not my top priority performance-wise,
either).

The following changes since commit 593d9786464c94db9c776a5a968dcf65cbc1d9d4:

tempfile.rb: [DOC] all arguments [ci skip] (2017-10-17 12:40:00 +0000)

are available in the git repository at:

git://80x24.org/ruby dir-file-gvl-r60203

for you to fetch changes up to 859716bb3a0bc2057a2d0f05079e480a1955e36c:

dir: Dir.chdir releases GVL (2017-10-18 01:59:48 +0000)


Eric Wong (5):
file.c: apply2files releases GVL
dir.c: release GVL around remaining readdir calls
dir.c: release GVL on opendir
dir.c: openat calls release GVL, too
dir: Dir.chdir releases GVL

benchmark/bm_file_chmod.rb | 9 +++
dir.c | 195 ++++++++++++++++++++++++++++++++-------------
file.c | 159 +++++++++++++++++++++++++-----------
3 files changed, 261 insertions(+), 102 deletions(-)
create mode 100644 benchmark/bm_file_chmod.rb

Actions #4

Updated by Anonymous over 6 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r60386.


file.c: apply2files releases GVL

This means File.chmod, File.lchmod, File.chown, File.lchown,
File.unlink, and File.utime operations on slow filesystems
no longer hold up other threads.

The platform-specific utime_failed changes is compile-tested
using a new UTIME_EINVAL macro

This hurts performance on fast filesystem, but these methods
are unlikely to be performance bottlenecks and (IMHO) avoiding
pathological slowdowns and stalls are more important.

benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name trunk built
file_chmod 0.591 0.801

Speedup ratio: compare with the result of `trunk' (greater is better)
name built
file_chmod 0.737

  • file.c (UTIME_EINVAL): new macro to ease compile-testing
  • file.c (struct apply_arg): new struct
  • file.c (no_gvl_apply2files): new function
  • file.c (apply2files): release GVL
  • file.c (chmod_internal): adjust for apply2files changes
  • file.c (lchmod_internal): ditto
  • file.c (chown_internal): ditto
  • file.c (lchown_internal): ditto
  • file.c (utime_failed): ditto
  • file.c (utime_internal): ditto
  • file.c (unlink_internal): ditto
    [ruby-core:83200] [Feature #13996]

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

    /*
     * aa is on-stack for small argc, we must ensure paths are marked
     * for large argv if aa is on the heap.
     */

As ALLOCV uses imemo_alloc (NODE_ALLOCA till 2.4), the contents are marked.

    size_t size = sizeof(const char *) + sizeof(VALUE);
    const long len = (long)(sizeof(struct apply_arg) + (size * argc) - size);

I'm afraid that struct apply_arg and fn in it may not be guaranteed as never-padded.
Isn't it better to make it a separate struct, and calculate len using offsetof?

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

As ALLOCV uses imemo_alloc (NODE_ALLOCA till 2.4), the contents are marked.

Ah, thanks. Good to know.

    size_t size = sizeof(const char *) + sizeof(VALUE);
    const long len = (long)(sizeof(struct apply_arg) + (size * argc) - size);

I'm afraid that struct apply_arg and fn in it may not be guaranteed as never-padded.
Isn't it better to make it a separate struct, and calculate len using offsetof?

Right, both should be addressed in r60408. I'm not sure which
platform the padding can affect. Maybe it cause [Bug #14049]?

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0