



Feature #15166


2.5 times faster implementation than current gcd implementation

Added by jzakiya (Jabari Zakiya) almost 6 years ago. Updated about 5 years ago.



This is to be more explicit (and accurate) than

These are my modified gcd benchmarks, based on code originally presented by Daniel Lemire (see #15161).

Ruby's current implementation of Stein's gcd algorithm is only slightly faster than the
code posted on the Wikipedia page, and over 2.5 times slower than the fastest implementation
in the benchmarks.

[jzakiya@localhost ~]$ ./gcdbenchmarks
gcd between numbers in [1 and 2000]

gcdwikipedia7fast32   :  time = 99
gcdwikipedia4fast     :  time = 121
gcdFranke             :  time = 126
gcdwikipedia3fast     :  time = 134
gcdwikipedia2fastswap :  time = 136
gcdwikipedia5fast     :  time = 139
gcdwikipedia7fast     :  time = 138
gcdwikipedia2fast     :  time = 136
gcdwikipedia6fastxchg :  time = 144
gcdwikipedia2fastxchg :  time = 156
gcd_iterative_mod     :  time = 210
gcd_recursive         :  time = 215
basicgcd              :  time = 211
rubygcd               :  time = 267
gcdwikipedia2         :  time = 321

gcd between numbers in [1000000001 and 1000002000]

gcdwikipedia7fast32   :  time = 100
gcdwikipedia4fast     :  time = 121
gcdFranke             :  time = 126
gcdwikipedia3fast     :  time = 134
gcdwikipedia2fastswap :  time = 136
gcdwikipedia5fast     :  time = 138
gcdwikipedia7fast     :  time = 138
gcdwikipedia2fast     :  time = 136
gcdwikipedia6fastxchg :  time = 144
gcdwikipedia2fastxchg :  time = 156
gcd_iterative_mod     :  time = 210
gcd_recursive         :  time = 215
basicgcd              :  time = 211
rubygcd               :  time = 269
gcdwikipedia2         :  time = 323

This is Ruby's code per:
which is basically the Wikipedia implementation.

inline static long
i_gcd(long x, long y)
{
    unsigned long u, v, t;
    int shift;

    if (x < 0)
	x = -x;
    if (y < 0)
	y = -y;

    if (x == 0)
	return y;
    if (y == 0)
	return x;

    u = (unsigned long)x;
    v = (unsigned long)y;
    for (shift = 0; ((u | v) & 1) == 0; ++shift) {
	u >>= 1;
	v >>= 1;
    }

    while ((u & 1) == 0)
	u >>= 1;

    do {
	while ((v & 1) == 0)
	    v >>= 1;

	if (u > v) {
	    t = v;
	    v = u;
	    u = t;
	}
	v = v - u;
    } while (v != 0);

    return (long)(u << shift);
}

This is the fastest implementation from the benchmarks. (I originally, and wrongly, cited
the implementation in the article, which is 4th or 5th fastest in the benchmarks, but
still almost 2x faster than the Ruby implementation.)

#include <stdlib.h>  // abs

// based on wikipedia's article,
// fixed by D. Lemire,  K. Willets
unsigned int gcdwikipedia7fast32(unsigned int u, unsigned int v)
{
     int shift, uz, vz;
     if ( u == 0) return v;
     if ( v == 0) return u;
     uz = __builtin_ctz(u);
     vz = __builtin_ctz(v);
     shift = uz > vz ? vz : uz;
     u >>= uz;
     do {
       v >>= vz;
       int diff = v;
       diff -= u;
       if ( diff == 0 ) break;
       vz = __builtin_ctz(diff);
       if ( v <  u ) u = v;
       v = abs(diff);
     } while( 1 );
     return u << shift;
}
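
For comparison with i_gcd above, here is a rough sketch of how the same idea might be applied to long operands. It is only an illustration, not the attached patch or any code from Ruby: the name i_gcd_ctz is made up here, and it assumes a compiler that provides __builtin_ctzl (the unsigned long counterpart of __builtin_ctz in GCC and Clang). It keeps the structure of i_gcd and replaces only the shift loops.

```c
/* Hypothetical variant of i_gcd: same signature and structure,
 * with the bit-by-bit shift loops replaced by __builtin_ctzl. */
static long
i_gcd_ctz(long x, long y)
{
    unsigned long u, v, t;
    int shift;

    if (x < 0)
        x = -x;
    if (y < 0)
        y = -y;

    if (x == 0)
        return y;
    if (y == 0)
        return x;

    u = (unsigned long)x;
    v = (unsigned long)y;

    /* Trailing zeros of (u | v) = number of factors of 2 shared by u and v. */
    shift = __builtin_ctzl(u | v);

    /* Make u odd in a single shift; u is nonzero here. */
    u >>= __builtin_ctzl(u);

    do {
        /* v is nonzero on every pass, so the builtin is well defined. */
        v >>= __builtin_ctzl(v);

        if (u > v) {
            t = v;
            v = u;
            u = t;
        }
        v = v - u;
    } while (v != 0);

    return (long)(u << shift);
}
```

For example, i_gcd_ctz(12, 18) should return 6 and i_gcd_ctz(48, 64) should return 16. Whether this form is actually faster than i_gcd would have to be measured with the benchmarks.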

The key to speeding up all the algorithms is using the compiler builtin __builtin_ctz(x)
(available in GCC and Clang) to determine the number of trailing binary '0's.
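
To illustrate the point about trailing zeros, here is a minimal sketch comparing the loop form used in i_gcd with the builtin form. The two helper names are invented for this example. Both require a nonzero argument: the loop would never terminate for 0, and the result of __builtin_ctz(0) is undefined.

```c
#include <assert.h>

/* Loop form, as in i_gcd: shift right one bit at a time
 * until the lowest bit is 1. */
static unsigned int
strip_zeros_loop(unsigned int x)
{
    assert(x != 0);   /* the loop would never end for 0 */
    while ((x & 1) == 0)
        x >>= 1;
    return x;
}

/* Builtin form: count the trailing zeros once, then shift once. */
static unsigned int
strip_zeros_ctz(unsigned int x)
{
    assert(x != 0);   /* __builtin_ctz(0) is undefined */
    return x >> __builtin_ctz(x);
}
```

For x = 40 (binary 101000), __builtin_ctz(x) is 3 and both helpers return 5. The loop needs one iteration per trailing zero, while the builtin typically compiles to a single instruction on common targets.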


rational.c.patch (1.22 KB) gcd ahorek (Pavel Rosický), 12/28/2018 06:32 PM

