Bug #21715: Miscompilation on x86-64-v2 due to undefined behavior in search_nonascii in string.c

Added by mjacob (Manuel Jacob) 27 days ago. Updated 7 days ago.

Status: Closed
Assignee:
Target version:
ruby -v:
Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

[ruby-core:123920]

Description

Building the following Dockerfile fails on an x86-64 machine in the last step (running the make command):

FROM opensuse/leap:16.0
RUN zypper --non-interactive install wget make gcc
RUN wget 'https://cache.ruby-lang.org/pub/ruby/3.4/ruby-3.4.7.tar.gz'
RUN tar xaf ruby-3.4.7.tar.gz
WORKDIR ruby-3.4.7/build
RUN ../configure
RUN make

The failing command (during make) is: ./miniruby -I../lib -I. -I.ext/common ../tool/mkconfig.rb -arch=x86_64-linux -version=3.4.7 -install_name=ruby -so_name=ruby -unicode_version=15.0.0 -unicode_emoji_version=15.0 > rbconfig.tmp

Excerpt from the crash report:

../tool/mkconfig.rb: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0001 p:0000 s:0003 E:000ec0 DUMMY  [FINISH]


-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- Machine register context ------------------------------------------------
 RIP: 0x0000556c2da74760 RBP: 0x0000000000000027 RSP: 0x00007ffd24a195f0
 RAX: 0x0000000000000028 RBX: 0x0000556c64acc420 RCX: 0x0000000000000000
 RDX: 0x0000000000000000 RDI: 0x0000000000000014 RSI: 0x00007f49f7d6c123
  R8: 0x46ea57707c6b1df2  R9: 0x00007f49f7d6c123 R10: 0x2afb945fcb545f01
 R11: 0x0000556c2dc3fe50 R12: 0x00007f49f7d6c263 R13: 0x00007f49f7d6c11b
 R14: 0x0000556c64bdaa48 R15: 0x00007f49f7d6c25c EFL: 0x0000000000010256

-- C level backtrace information -------------------------------------------
/ruby-3.4.7/build/miniruby(rb_print_backtrace+0x5) [0x556c2db2c1b6] ../vm_dump.c:823
/ruby-3.4.7/build/miniruby(rb_vm_bugreport) ../vm_dump.c:1155
/ruby-3.4.7/build/miniruby(rb_bug_for_fatal_signal+0xf7) [0x556c2d8cdc47] ../error.c:1130
/ruby-3.4.7/build/miniruby(sigsegv+0x42) [0x556c2da58482] ../signal.c:934
/lib64/libc.so.6(__restore_rt+0x0) [0x7f49f7eb2090]
/ruby-3.4.7/build/miniruby(search_nonascii+0xcb) [0x556c2da74760] ../string.c:729
/ruby-3.4.7/build/miniruby(coderange_scan) ../string.c:767
/ruby-3.4.7/build/miniruby(rbimpl_fl_unset_raw_raw+0x0) [0x556c2da76874] ../string.c:895
/ruby-3.4.7/build/miniruby(RB_FL_UNSET_RAW) ../include/ruby/internal/fl_type.h:669
/ruby-3.4.7/build/miniruby(RB_ENC_CODERANGE_SET) ../include/ruby/internal/encoding/coderange.h:131
/ruby-3.4.7/build/miniruby(enc_coderange_scan) ../string.c:911
/ruby-3.4.7/build/miniruby(rb_enc_str_coderange) ../string.c:910
/ruby-3.4.7/build/miniruby(is_ascii_string+0x8) [0x556c2da7697e] ../internal/string.h:151
/ruby-3.4.7/build/miniruby(str_do_hash) ../string.c:393
/ruby-3.4.7/build/miniruby(register_fstring) ../string.c:554
/ruby-3.4.7/build/miniruby(rb_enc_literal_str+0x87) [0x556c2da94bb7] ../string.c:12546
/ruby-3.4.7/build/miniruby(parse_static_literal_string+0x38) [0x556c2d875991] ../prism_compile.c:312
/ruby-3.4.7/build/miniruby(pm_compile_node) ../prism_compile.c:10321
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_setup_args_core+0xe4) [0x556c2d884304] ../prism_compile.c:1792
/ruby-3.4.7/build/miniruby(pm_setup_args+0x98) [0x556c2d884e98] ../prism_compile.c:1979
/ruby-3.4.7/build/miniruby(pm_compile_call+0x307) [0x556c2d885cf7] ../prism_compile.c:3673
/ruby-3.4.7/build/miniruby(pm_compile_call_node+0x2c6) [0x556c2d872326] ../prism_compile.c:7403
/ruby-3.4.7/build/miniruby(pm_compile_node+0x39dc) [0x556c2d87661c] ../prism_compile.c:8775
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e65) [0x556c2d875aa5] ../prism_compile.c:10309
/ruby-3.4.7/build/miniruby(pm_compile_conditional+0x18c) [0x556c2d88cfcc] ../prism_compile.c:1053
/ruby-3.4.7/build/miniruby(pm_compile_node+0x42e1) [0x556c2d876f21] ../prism_compile.c:9355
/ruby-3.4.7/build/miniruby(pm_compile_node+0x2e3a) [0x556c2d875a7a] ../prism_compile.c:10307
/ruby-3.4.7/build/miniruby(pm_compile_scope_node+0x104a) [0x556c2d88f5da] ../prism_compile.c:6991
/ruby-3.4.7/build/miniruby(pm_compile_node+0x35c9) [0x556c2d876209] ../prism_compile.c:10180
/ruby-3.4.7/build/miniruby(APPEND_LIST+0x0) [0x556c2d891e60] ../prism_compile.c:10481
/ruby-3.4.7/build/miniruby(pm_iseq_compile_node) ../prism_compile.c:10485
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt_try+0x10) [0x556c2d94c790] ../iseq.c:1042
/ruby-3.4.7/build/miniruby(rb_protect+0xd6) [0x556c2d8db9c6] ../eval.c:1054
/ruby-3.4.7/build/miniruby(pm_iseq_new_with_opt+0x177) [0x556c2d9525c7] ../iseq.c:1095
/ruby-3.4.7/build/miniruby(pm_iseq_new_main+0x85) [0x556c2d952895] ../iseq.c:943
/ruby-3.4.7/build/miniruby(process_options+0x12fd) [0x556c2da519cd] ../ruby.c:2616
/ruby-3.4.7/build/miniruby(ruby_process_options+0x157) [0x556c2da52657] ../ruby.c:3174
/ruby-3.4.7/build/miniruby(ruby_options+0x97) [0x556c2d8da977] ../eval.c:117
/ruby-3.4.7/build/miniruby(rb_main+0x19) [0x556c2d7eb578] ../prism/prism.c:21769
/ruby-3.4.7/build/miniruby(main) ../main.c:68
/lib64/libc.so.6(__libc_start_call_main+0x82) [0x7f49f7e9b340]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f49f7e9b409]
/ruby-3.4.7/build/miniruby(_start+0x25) [0x556c2d7eb5c5] ../main.c:69

The failing instruction at 0x556c2da74760 is: movdqa xmm0, XMMWORD PTR [rsi+rcx*1]. At this point, register rsi contains 0x7f49f7d6c123, which is the value of parameter p of the function search_nonascii (0x7f49f7d6c11b) plus 8, and register rcx contains 0. So the whole instruction means “move aligned packed integer values from memory at 0x7f49f7d6c123 to register xmm0”. The segmentation fault happened because movdqa requires its memory operand to be aligned on a 16-byte boundary, and this address is not.
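
The alignment arithmetic above can be checked directly. The following is a minimal sketch and not part of Ruby: is_aligned is a hypothetical helper, and the two constants are simply the addresses quoted from the crash report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of alignment.
 * alignment must be a non-zero power of two. */
static bool
is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Addresses copied from the crash report. */
static const uint64_t P_VALUE   = UINT64_C(0x7f49f7d6c11b); /* parameter p */
static const uint64_t LOAD_ADDR = UINT64_C(0x7f49f7d6c123); /* rsi + rcx at the movdqa */
```

With these values, is_aligned(P_VALUE, 8) and is_aligned(LOAD_ADDR, 16) are both false: p sits 3 bytes past an 8-byte boundary, so advancing it by 8 cannot produce a 16-byte-aligned address.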

The instruction is part of a loop at https://github.com/ruby/ruby/blob/v3_4_7/string.c#L728 that gets auto-vectorized by GCC. On x86-64:

UNALIGNED_WORD_ACCESS is 1
p doesn’t get aligned to anything because of #if !UNALIGNED_WORD_ACCESS in line 700
aligned_ptr(value) is expanded to (uintptr_t *)(value) according to line 723
p is therefore cast to type uintptr_t * in line 725
uintptr_t is a typedef for unsigned long int, which has an alignment of 8 bytes

As a result, a pointer p to potentially unaligned memory is cast to a pointer to a type with an alignment of 8 bytes. That is undefined behavior according to C99 6.3.2.3p7: “A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.”. Compilers can use this rule to assume that the pointed-to memory is 8-byte aligned. In this case, the GCC loop auto-vectorizer adds code that brings the supposedly 8-byte-aligned address up to 16-byte alignment. A subsequent instruction that assumes 16-byte alignment can therefore fail.
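
For contrast, a word can be read from an arbitrarily aligned address without the pointer conversion described above. This is a minimal sketch rather than Ruby's code; load_word_unaligned is a hypothetical name. Because memcpy places no alignment requirement on its source, the compiler gets no alignment assumption to build on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads sizeof(uintptr_t) bytes starting at p, which may have any alignment.
 * The caller must ensure that many bytes are readable at p.
 * Unlike `*(const uintptr_t *)p`, this never forms a misaligned
 * uintptr_t pointer, so it stays within defined behavior. */
static uintptr_t
load_word_unaligned(const char *p)
{
    uintptr_t word;
    assert(p != NULL);
    memcpy(&word, p, sizeof(word));
    return word;
}
```

For example, load_word_unaligned(buf + 1) is valid for any char buffer buf holding at least sizeof(uintptr_t) + 1 bytes, whatever the alignment of buf.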

I could reproduce this crash only on openSUSE Leap 16.0, but not on openSUSE Leap 15.6, openSUSE Tumbleweed or Arch Linux, because only the former configures GCC to emit code requiring x86-64-v2 by default. When passing -march=x86-64-v2 in CFLAGS, the crash happens on all of these distributions.

Updated by alanwu (Alan Wu) 27 days ago
#1 [ruby-core:123921]

Right, it's doing the unaligned read in the classic intuitive-but-UB way. Can you try the following (roughly tested) patch? It's based on the ruby_3_4 branch.

From 225f6caf914a4dd4c457d9e52ab72a79c91bd1a7 Mon Sep 17 00:00:00 2001
From: Alan Wu <XrXr@users.noreply.github.com>
Date: Wed, 26 Nov 2025 21:59:37 -0500
Subject: [PATCH] string.c: Fix UB unaligned read by replacing with memcpy

---
 string.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/string.c b/string.c
index af8f493285..663d2d01c7 100644
--- a/string.c
+++ b/string.c
@@ -676,7 +676,7 @@ VALUE rb_fs;
 static inline const char *
 search_nonascii(const char *p, const char *e)
 {
-    const uintptr_t *s, *t;
+    const char *s, *t;
 
 #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
 # if SIZEOF_UINTPTR_T == 8
@@ -720,17 +720,19 @@ search_nonascii(const char *p, const char *e)
 #define aligned_ptr(value) \
         __builtin_assume_aligned((value), sizeof(uintptr_t))
 #else
-#define aligned_ptr(value) (uintptr_t *)(value)
+#define aligned_ptr(value) (value)
 #endif
         s = aligned_ptr(p);
-        t = (uintptr_t *)(e - (SIZEOF_VOIDP-1));
+        t = (e - (SIZEOF_VOIDP-1));
 #undef aligned_ptr
-        for (;s < t; s++) {
-            if (*s & NONASCII_MASK) {
+        for (;s < t; s += sizeof(uintptr_t)) {
+            uintptr_t word;
+            memcpy(&word, s, sizeof(word));
+            if (word & NONASCII_MASK) {
 #ifdef WORDS_BIGENDIAN
-                return (const char *)s + (nlz_intptr(*s&NONASCII_MASK)>>3);
+                return (const char *)s + (nlz_intptr(word&NONASCII_MASK)>>3);
 #else
-                return (const char *)s + (ntz_intptr(*s&NONASCII_MASK)>>3);
+                return (const char *)s + (ntz_intptr(word&NONASCII_MASK)>>3);
 #endif
             }
         }
-- 
2.50.1
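
The idea behind the patch can be tried in isolation. The following is a standalone sketch under stated assumptions, not the code in string.c: search_nonascii_sketch is a hypothetical name, it uses a fixed 64-bit word, and it finds the exact byte with a byte-wise loop instead of ntz_intptr/nlz_intptr, so it does not depend on endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a pointer to the first byte in [p, e) whose high bit is set,
 * or NULL if every byte is ASCII. */
static const char *
search_nonascii_sketch(const char *p, const char *e)
{
    assert(p <= e);

    /* Word-at-a-time part: test 8 bytes per iteration. */
    while ((size_t)(e - p) >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, p, sizeof(word));  /* load with no alignment requirement */
        if (word & UINT64_C(0x8080808080808080)) {
            break;  /* a non-ASCII byte is inside this word; locate it below */
        }
        p += sizeof(word);
    }

    /* Byte-wise part: handles the tail and pinpoints the byte after a hit. */
    for (; p < e; p++) {
        if ((unsigned char)*p & 0x80) {
            return p;
        }
    }
    return NULL;
}
```

The result does not depend on how the start pointer is aligned, so calling it at every offset into the same buffer is a quick way to exercise the unaligned case.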

Updated by mame (Yusuke Endoh) 27 days ago
#2 [ruby-core:123922]

I wonder if the premise that "unaligned word access is feasible on x86" no longer holds in modern contexts?

We are of course aware that unaligned word access is undefined behavior in C. However, it is slightly faster, which is why we introduced this optimization specifically for x86.

I evaluated the performance on an AMD Ryzen 9 6900HX with gcc version 15.2.0 (Ubuntu 15.2.0-4ubuntu4) using the benchmark below. (I ran each test 10 times and picked the best result.)

s = ([65] * 10).pack("C*")
t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
20000000.times { s.dup.force_encoding("UTF-8").scrub }
p Process.clock_gettime(Process::CLOCK_MONOTONIC) - t

It appears that -march=x86-64 -DUNALIGNED_WORD_ACCESS=1 remains the fastest.

cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1": 2.918 s.
cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1" with Alan's patch: 2.941 s.
cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0": 3.020 s.
cflags="-march=x86-64-v2 -DUNALIGNED_WORD_ACCESS=0": 3.175 s.
cflags="-march=x86-64-v3 -DUNALIGNED_WORD_ACCESS=0": 3.017 s.
cflags="-march=x86-64-v4 -DUNALIGNED_WORD_ACCESS=0": Illegal instruction

It is worth noting that x86-64-v3 performs extremely well for long strings. On the other hand, x86-64-v2 is clearly slower than x86-64, which is unfortunate.

s = ([65] * 1000000).pack("C*")
t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
200000.times { s.dup.force_encoding("UTF-8").scrub }
p Process.clock_gettime(Process::CLOCK_MONOTONIC) - t

cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1": 5.229 s.
cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=1" with Alan's patch: 5.232 s.
cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0": 5.230 s.
cflags="-march=x86-64-v2 -DUNALIGNED_WORD_ACCESS=0": 6.127 s.
cflags="-march=x86-64-v3 -DUNALIGNED_WORD_ACCESS=0": 2.728 s.
cflags="-march=x86-64-v4 -DUNALIGNED_WORD_ACCESS=0": Illegal instruction

However, since most strings handled in Ruby are not that long, it is likely more critical to ensure speed for short strings.

Regarding Alan's patch, it only supports search_nonascii. Since the optimization under UNALIGNED_WORD_ACCESS is applied in other places as well, the patch may be incomplete.

Looking at these benchmarks, it seems fair to say the difference is not drastic. If the performance degradation is only around 3.3%, I think it is fine to abandon the optimization and set UNALIGNED_WORD_ACCESS=0 unconditionally. I would appreciate it if others could verify this on different environments as well.

Updated by alanwu (Alan Wu) 26 days ago · Edited
#3 [ruby-core:123927]

I repeated Mame's experiment on a Xeon Platinum 8124M and gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04). The chip is from 2017, and runs x86-64-v4. I'm using slightly different scripts since I'm running with frequency scaling disabled. Also, I'm using hyperfine to get some basic stats on the results.

# short-str.rb
s = ([65] * 10).pack("C*")
4000000.times { s.dup.force_encoding("UTF-8").scrub }

$ hyperfine -L ruby x86-64-uwa-1,x86-64-uwa-1-sans-ub,x86-64-uwa-0,x86-64-v2-uwa-0,x86-64-v3-uwa-0,x86-64-v4-uwa-0 '~/.rubies/{ruby}/bin/ruby --disable-all short-str.rb'

Benchmark 1: ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.165 s ±  0.001 s    [User: 1.157 s, System: 0.007 s]
  Range (min … max):    1.164 s …  1.166 s    10 runs

Benchmark 2: ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.179 s ±  0.001 s    [User: 1.172 s, System: 0.007 s]
  Range (min … max):    1.177 s …  1.181 s    10 runs

Benchmark 3: ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.142 s ±  0.001 s    [User: 1.135 s, System: 0.007 s]
  Range (min … max):    1.141 s …  1.144 s    10 runs

Benchmark 4: ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.165 s ±  0.001 s    [User: 1.157 s, System: 0.007 s]
  Range (min … max):    1.162 s …  1.167 s    10 runs

Benchmark 5: ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.150 s ±  0.001 s    [User: 1.140 s, System: 0.009 s]
  Range (min … max):    1.148 s …  1.153 s    10 runs

Benchmark 6: ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all short-str.rb
  Time (mean ± σ):      1.181 s ±  0.001 s    [User: 1.172 s, System: 0.008 s]
  Range (min … max):    1.179 s …  1.184 s    10 runs

Summary
  ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all short-str.rb ran
    1.01 ± 0.00 times faster than ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all short-str.rb
    1.02 ± 0.00 times faster than ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all short-str.rb
    1.02 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all short-str.rb
    1.03 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all short-str.rb
    1.03 ± 0.00 times faster than ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all short-str.rb

I'm seeing the same 3% difference, but cflags="-march=x86-64 -DUNALIGNED_WORD_ACCESS=0" wins. Side note, it's pretty tricky to measure the speed on short inputs. The loop overhead seems too large compared to the string operations.

# long-str.rb
s = ([65] * 100000).pack("C*")
200000.times { s.dup.force_encoding("UTF-8").scrub }

$ hyperfine -L ruby x86-64-uwa-1,x86-64-uwa-1-sans-ub,x86-64-uwa-0,x86-64-v2-uwa-0,x86-64-v3-uwa-0,x86-64-v4-uwa-0 '~/.rubies/{ruby}/bin/ruby --disable-all long-str.rb' --warmup 3

Benchmark 1: ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      1.531 s ±  0.002 s    [User: 1.527 s, System: 0.004 s]
  Range (min … max):    1.529 s …  1.534 s    10 runs

Benchmark 2: ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     830.5 ms ±   1.0 ms    [User: 826.5 ms, System: 3.7 ms]
  Range (min … max):   829.1 ms … 831.9 ms    10 runs

Benchmark 3: ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     831.3 ms ±   2.1 ms    [User: 827.4 ms, System: 3.6 ms]
  Range (min … max):   828.9 ms … 834.8 ms    10 runs

Benchmark 4: ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      2.248 s ±  0.002 s    [User: 2.244 s, System: 0.003 s]
  Range (min … max):    2.246 s …  2.253 s    10 runs

Benchmark 5: ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):     830.1 ms ±   1.7 ms    [User: 827.2 ms, System: 2.6 ms]
  Range (min … max):   827.6 ms … 832.9 ms    10 runs

Benchmark 6: ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all long-str.rb
  Time (mean ± σ):      2.254 s ±  0.004 s    [User: 2.249 s, System: 0.004 s]
  Range (min … max):    2.249 s …  2.259 s    10 runs

Summary
  ~/.rubies/x86-64-v3-uwa-0/bin/ruby --disable-all long-str.rb ran
    1.00 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1-sans-ub/bin/ruby --disable-all long-str.rb
    1.00 ± 0.00 times faster than ~/.rubies/x86-64-uwa-0/bin/ruby --disable-all long-str.rb
    1.84 ± 0.00 times faster than ~/.rubies/x86-64-uwa-1/bin/ruby --disable-all long-str.rb
    2.71 ± 0.01 times faster than ~/.rubies/x86-64-v2-uwa-0/bin/ruby --disable-all long-str.rb
    2.71 ± 0.01 times faster than ~/.rubies/x86-64-v4-uwa-0/bin/ruby --disable-all long-str.rb

x86-64-v3 wins.

> Regarding Alan's patch, it only supports search_nonascii. Since the optimization under UNALIGNED_WORD_ACCESS is applied in other places as well, the patch may be incomplete.

Right, it's incomplete. I just wanted to offer something quickly to see if it fixes the particular crash in OP.

> I think it is fine to abandon the optimization and set UNALIGNED_WORD_ACCESS=0 unconditionally.

I agree. If we do that, I hope we can delete the code for UNALIGNED_WORD_ACCESS=1. I think it's a mistake to keep around code that intentionally triggers UB, especially after learning that it causes crashes.

Further simplification is possible after removing the dead code: do unaligned reads using memcpy unconditionally, on all platforms. That gets rid of the code for manually aligning pointers. It's a good balance between speed, C compliance, and complexity. This is optional, though, since we already simplify a lot by just keeping one side of UNALIGNED_WORD_ACCESS.

UNALIGNED_WORD_ACCESS=1 is kind of funny. Once vectorized, most of the loads in the loop are, in fact, aligned reads such as MOVDQA.

Updated by naruse (Yui NARUSE) 13 days ago
#4 [ruby-core:124140]

For the long term, trusting the intelligence of the compiler sounds reasonable.

But if we trust the compiler, we don't need hand-written parallelism. The code can be simplified like this:

const char *
search_nonascii2(const char *p, const char *e)
{
    for (;p < e; p++) {
        if (*p & 0x80) {
            return p;
        }
    }
    return 0;
}

https://godbolt.org/z/xrKPsYhYc

And as a short-term fix for -v2, we can disable vectorization for the function with a GCC pragma.

Therefore the discussion becomes how to switch between these two strategies.
I think we should first use the simple version with vectorization for v3 (with AVX), and otherwise use the current implementation, like this:

#if defined(__AVX__)
const char *
search_nonascii(const char *p, const char *e)
{
    for (;p < e; p++) {
        if (*p & 0x80) {
            return p;
        }
    }
    return 0;
}
#else
# pragma GCC push_options
# pragma GCC optimize ("no-tree-vectorize")
const char *
search_nonascii(const char *p, const char *e)
{
    for (;p < e; p++) {
        if (*p & 0x80) {
            return p;
        }
    }
    return 0;
}
# pragma GCC pop_options
#endif

Updated by alanwu (Alan Wu) 11 days ago
#5 [ruby-core:124175]

I'm not a big fan of the pragma route. Trying to get good codegen out of UB-triggering C code is inherently a whack-a-mole game across compiler brands and even across options of the same brand. With non-compliant C code, the incantation to patch over a miscompilation differs between GCC, Clang, MSVC, or what have you.

Disabling vectorization also leaves speed on the table. It vectorizes the memcpy version just fine. If __AVX__ is required for vectorization, the default, which is equivalent to -march=x86-64, is penalized too.
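
To see which side of an #if defined(__AVX__) switch a given build would take, the predefined macros can be inspected. This is a small illustrative sketch with hypothetical function names. It relies only on the feature macros that GCC and Clang predefine when the corresponding instruction sets are enabled (__SSE4_2__ with -march=x86-64-v2, __AVX__ and __AVX2__ with -march=x86-64-v3).

```c
#include <assert.h>

/* Rank of the widest SIMD extension this translation unit was compiled for:
 * 0 = no SSE4.2 macro (e.g. plain -march=x86-64), 1 = SSE4.2, 2 = AVX, 3 = AVX2. */
static int
compiled_simd_rank(void)
{
#if defined(__AVX2__)
    return 3;
#elif defined(__AVX__)
    return 2;
#elif defined(__SSE4_2__)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name for a rank returned by compiled_simd_rank(). */
static const char *
simd_rank_name(int rank)
{
    static const char *const names[] = {
        "baseline (no SSE4.2 macro)", "SSE4.2", "AVX", "AVX2"
    };
    assert(rank >= 0 && rank <= 3);
    return names[rank];
}
```

A default build without any -march flag would be expected to report rank 0, which is the configuration that an __AVX__-gated fast path leaves out.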

I'd rather replace the UB casts with memcpy everywhere.

Updated by naruse (Yui NARUSE) 9 days ago
#6 [ruby-core:124200]

I investigated further, and I understand that Alan's patch is the best option at this time.
Auto-vectorization for this kind of loop is implemented only in GCC 15, so it's too early to rely on it.

@Alan, could you commit it?

Updated by alanwu (Alan Wu) 7 days ago
#7

Status changed from Open to Closed

Applied in changeset git|d209e6f1c0a93ad3ce1cc64dd165a6b67672614d.

search_nonascii(): Replace UB pointer cast with memcpy

Casting a pointer to create an unaligned one is undefined behavior in C
standards. Use memcpy to express the unaligned load instead to play by
the rules.

Practically, this yields the same binary output in many situations
while fixing the crash in [Bug #21715].
