Project

General

Profile

Actions

Feature #20163

open

Introduce #bit_count method on Integer

Added by garrison (Garrison Jensen) over 1 year ago. Updated 8 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:116083]

Description

This feature request is to implement a method called #bit_count on Integer that returns the number of ones in the binary representation of the absolute value of the integer.

n = 19
n.bit_count #=> 3
(-n).bit_count #=> 3

This is often useful when you use an integer as a bitmask and want to count how many bits are set.

This would be equivalent to

n.to_s(2).count("1")

However, this can be outperformed by

def bit_count(n)
  count = 0
  while n > 0
    n &= n - 1 # Flip the least significant 1 bit to 0
    count += 1
  end
  count
end

I think this would be a useful addition because it would fit alongside the other bit-related methods defined on integer: #bit_length, #allbits?, #anybits?, #nobits?. Also, when working with bitmasks, a minor upgrade to performance often results in a significant improvement.

Similar methods from other languages:
https://docs.python.org/3/library/stdtypes.html#int.bit_count
https://doc.rust-lang.org/std/primitive.i32.html#method.count_ones


Related issues 1 (0 open1 closed)

Is duplicate of Ruby - Feature #8748: Integer#popcount (Fixnum#popcount and Bignum#popcount)RejectedActions

Updated by Eregon (Benoit Daloze) over 1 year ago

The name sounds too close to #bit_length, and length and count are often quite close in Ruby
(e.g. Enumerable#count without arguments is the same as #size/#length after Enumerable#to_a or on an Array, Hash, etc).

I think count_ones is a better name because there is no ambiguity.

popcount might be another common name for it, but that seems less clear unless the term or the assembly instruction is already known.
Also I think abbreviations are in general suboptimal in method names.

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

count_ones could be something different from what the OP wants, because (-19).bit_count is expected to be 3.

PS. i64::count_ones(-19) is 62 for Rust.
PPS. There is no such thing like a 64bit signed integer in ruby.

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

GCC has __builtin_popcount and Ruby defines rb_popcount function family internally.

Updated by byroot (Jean Boussier) over 1 year ago

I also think popcount makes sense. Yes it's a bit of a cryptic name, but if you are dealing with bits, you are likely to be familiar with that name.

Actions #5

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Is duplicate of Feature #8748: Integer#popcount (Fixnum#popcount and Bignum#popcount) added

Updated by garrison (Garrison Jensen) over 1 year ago

I like popcount but I also like count_ones because it sounds friendlier, however I have no strong preferences for the chosen name.

Also, I want to share my performance tests in case they are helpful for the discussion. As you can see, the built-in method is significantly faster.

(0..10_000_000).each { |x| x.to_s(2).count('1') }
processing time: 3.714706s

(0..10_000_000).each { |x| ruby_pop_count(x) }
processing time: 3.367775s

(0..10_000_000).each { |x| x.pop_count }
processing time: 0.515767s

Here are my implementations:

def ruby_pop_count(n)
  n = n.abs
  count = 0
  while n > 0
    n &= n - 1 
    count += 1
  end
  count
end
unsigned int pop_count(long value) {
#ifdef __GNUC__
    // Use GCC built-in
    return __builtin_popcountl(value);
#else
    // Fallback method for compilers without the built-in
    unsigned int count = 0;
    while (value) {
        count += value & 1;
        value >>= 1;
    }
    return count;
#endif
}

// Wrapper function for Ruby
VALUE rb_pop_count(VALUE self) {
    long value = NUM2LONG(self);
    unsigned int count =  pop_count(labs(value));
    return UINT2NUM(count);
}

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

I would also like a popcount method. I prefer popcount over bit_count. I wrote my own here (but it only deals with single bytes).

Updated by akr (Akira Tanaka) about 1 month ago

What is the behavior for negative values?

The proposal describes two implenentations that returns different values.

p (-5).to_s(2).count("1") #=> 2

def bit_count(n)
  count = 0
  while n > 0
    n &= n - 1 # Flip the least significant 1 bit to 0
    count += 1
  end
  count
end
p bit_count(-5) #=> 0

Updated by byroot (Jean Boussier) about 1 month ago

What is the behavior for negative values?

IMO, the only behavior that makes sense in the context of a language with arbitrary size integers is to ignore the sign bit.

Updated by mame (Yusuke Endoh) about 1 month ago

What are the intended use cases for this proposal?

My experience (in other languages) involves two use cases of popcount:

  • Bitboards for game AI (like Reversi) to count pieces.
  • Succinct data structures (like LOUDS Tries) for rank operations.

In both scenarios, integers are treated as unsigned bitsets.

Does anyone have a use case where popcount on a negative number is necessary?
If not, I guess raising an exception would be the best behavior.

Updated by tompng (tomoya ishida) about 1 month ago

I also think popcount of a negative number should raise error because of the ambiguity.
One way to extend popcount to negative number is using a relationship below, derived from the fact that -5 == 0b111...11111011 has 1 fewer bits compared to -1 == 0b111...11111111.

x.popcount == popcount_of_minus_one - (~x).popcount
# popcount_of_minus_one: 0 or -1 or something else representing infinity bits

I'm not sure if this definition is useful, but anyway, extending to negative number has ambiguity.
If someone want a popcount of abs, n.abs.popcount is clearer.

Updated by ahorek (Pavel Rosický) about 1 month ago

x64 and ARM have specialized CPU instructions
https://godbolt.org/z/xvGvzsvd9

and Ruby already uses it internally, for instance
https://github.com/ruby/ruby/blob/dc555a48e750b4d50eb7a7000ca1bfb927fa9459/string.c#L2209

That said, Ruby isn’t the ideal choice for implementing memory allocators, SIMD masks, parity checks, GCD calculations, UTF parsers, or prime sieving… but since many other languages, even Python, provide support for popcount, why not?

Updated by byroot (Jean Boussier) about 1 month ago

but since many other languages, even Python, provide support for popcount, why not?

Usually a higher bar than that is required for a new method to be added to Ruby.

I personally don't have an immediate use case to point to (except for Aaron's gem of course). But more generally, in recent years we've tried to eliminate or at least reduce the need for C extensions, and bit operation are often useful in these cases, so I'm sympathetic to more bit oriented capacities.

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

mame (Yusuke Endoh) wrote in #note-10:

What are the intended use cases for this proposal?

My experience (in other languages) involves two use cases of popcount:

  • Bitboards for game AI (like Reversi) to count pieces.
  • Succinct data structures (like LOUDS Tries) for rank operations.

In both scenarios, integers are treated as unsigned bitsets.

My experience is similar. I've used it for sets (like I linked above) as well as modeling undirected graphs (a bit matrix, but I omitted popcount from the blog post). I've only used unsigned integers, and I think it would be a bug in my code if the integers were signed.

Does anyone have a use case where popcount on a negative number is necessary?
If not, I guess raising an exception would be the best behavior.

I agree, and I think it should raise an exception.

ahorek (Pavel Rosický) wrote in #note-12:

That said, Ruby isn’t the ideal choice for implementing memory allocators, SIMD masks, parity checks, GCD calculations, UTF parsers, or prime sieving…

Not yet! But hopefully someday! 😂

Updated by garrison (Garrison Jensen) about 1 month ago

Python ignores the sign. It seems friendlier to match that behavior than throw an exception.

(-x).popcount == x.popcount

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

garrison (Garrison Jensen) wrote in #note-15:

Python ignores the sign. It seems friendlier to match that behavior than throw an exception.

(-x).popcount == x.popcount

When would you use a negative number unless it's a mistake in your code?

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

It seems like the Python folks didn't have too serious a discussion about handling negative numbers.

https://github.com/python/cpython/issues/74068#issuecomment-1093743975
https://github.com/python/cpython/issues/74068#issuecomment-1093743978

I prefer it raises an exception as it's probably a mistake in the code. 🤷🏻‍♀️

Updated by garrison (Garrison Jensen) about 1 month ago

tenderlovemaking (Aaron Patterson) wrote in #note-16:

When would you use a negative number unless it's a mistake in your code?

I don't have a strong argument. Raising an exception sounds good to me.

Updated by akr (Akira Tanaka) about 1 month ago

I prefer an exception for popcount to negative values.

I think an array of Fixnums (63 bit signed integers) can be used for mutable bit array.
(Ruby's Integer is immutable. So mutable bit array needs a mutable data structure.)

In this scenario, we would like to count bits in a negative fixnum: (n & 0x7ffffffff).popcount.
This is not n.abs.popcount.
Also, this behavior is not suitable for Integer#popcount because it ignores bits higher than 63th bit.

So, I prefer an exception for negative values.

Updated by slewsys (Andrew Moore) 19 days ago

garrison (Garrison Jensen) wrote in #note-6:

unsigned int pop_count(long value) {
#ifdef __GNUC__
    // Use GCC built-in
    return __builtin_popcountl(value);
#else
    // Fallback method for compilers without the built-in
    unsigned int count = 0;
    while (value) {
        count += value & 1;
        value >>= 1;
    }
    return count;
#endif
}

// Wrapper function for Ruby
VALUE rb_pop_count(VALUE self) {
    long value = NUM2LONG(self);
    unsigned int count =  pop_count(labs(value));
    return UINT2NUM(count);
}

Section 1.8 of Matters Computational by Jörg Arndt offers efficient algorithms for popcount, including for arrays. There is also an x86 instruction POPCNT.

Matters Computational: https://www.jjj.de/fxt/fxtbook.pdf

E.g.,

unsigned long popcount (unsigned long value) {
    value -= (value >> 1) & 0x5555555555555555UL;
    value = ((value >> 2) & 0x3333333333333333UL) + (value & 0x3333333333333333UL);
    value = ((value >> 4) + value) & 0x0f0f0f0f0f0f0f0fUL;
    value *= 0x0101010101010101UL;
    return value >> 56;
}

By the way, "popcount" is short for "population count", whereas "pop_count" suggests (to me) a stack operation - a bit (hm) confusing.

Updated by YO4 (Yoshinao Muramatsu) 12 days ago

We can use Integer#[] to negative numbers.

-1[0,8].popcount # => 8
-2[0,8].popcount # => 7
-3[0,8].popcount # => 7

In computers that use two's complement for negative numbers, I think this is expected behavior.

If generating sliced values becomes an overhead, we can add arguments to popcount.

Updated by mame (Yusuke Endoh) 11 days ago

In computers that use two's complement for negative numbers, I think this is expected behavior.

Yes, but do you want -1.popcount to return Float::INFINITY? That might be theoretically consistent, but useless and even problematic, I think.

Updated by YO4 (Yoshinao Muramatsu) 8 days ago

Popcount for negative numbers is useful when equivalent functionality to machine register operations is desired, but I do not consider operations on infinite bit widths to be useful.

When popcount is called on a negative number (without specifying a bit width if popcount has arguments), it may indicate abnormal input or a bug in user code.
I am +1 for raising exception on that case.

Actions

Also available in: Atom PDF

Like1
Like1Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like1Like0Like0Like0Like0Like0Like0