cabo (Carsten Bormann) wrote:
pack("G")/unpack("G") works great with NaN values. However,
First, Ruby provides only Float::NAN, and does not consider payloads of NaN values.
Right. The only interface Ruby provides to NaN values is via pack/unpack.
- pack("g") completely discards any actual NaN value and always packs the same bytes for a NaN ("bug as implemented" in VALUE_to_float)
Since "G" preserves the payload, it may be better to preserve it for "g" as well, IMO.
Indeed.
The NaN branch of VALUE_to_float in pack.c uses "return NAN", which discards the payload.
This needs to be replaced with something like the "return d" further down in pack.c. That implies a conversion from Ruby's double to the float we need for pack("g"), which also works for NaNs, apart from the detail below.
However, this simple fix is complicated by the fact that float⬌double conversions of NaN values always seem to set the quiet bit.
I slightly rewrote your demo program to demonstrate how the quiet bit can be restored after float⬌double conversion in either direction.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define F32_QBIT 0x00400000
#define F64_QBIT_OFFSET 29 /* 32 more bits, minus 3 for more exponent */
#define F64_QBIT ((uint64_t)F32_QBIT << F64_QBIT_OFFSET)

uint32_t examples[2] = {0x7fbff000, 0x7ffff000};

int main(void)
{
    /* Unions give access to the raw bits of each floating-point value. */
    union {
        uint32_t b;
        float f;
    } f32;
    union {
        uint64_t b;
        double d;
    } f64;
    uint32_t qbit;

    for (int i = 0; i < 2; i++) {
        f32.b = examples[i]; /* note quiet bit not set in first example */
        printf("setup f32 with %" PRIx32 " == %f\n", f32.b, f32.f);

        /* (1) Expand f32 to f64, as needed in unpack('g') */
        f64.d = f32.f; /* quiet bit gets set here */
        printf("C expands this to %" PRIx64 " == %f\n", f64.b, f64.d);
        /* fix up f64 by copying the lost quiet bit from f32.b
           Obviously, do this in NaN branch only.
        */
        qbit = f32.b & F32_QBIT;
        printf("qbit: %" PRIx32 "\n", qbit);
        f64.b = (f64.b & ~F64_QBIT) | ((uint64_t)qbit << F64_QBIT_OFFSET);
        printf("qbit fixed to %" PRIx64 " == %f\n", f64.b, f64.d);
        printf("\n");

        /* (2) Contract f64 to f32, as needed in pack('g') */
        f32.f = (float)f64.d;
        printf("convert back to f32: %" PRIx32 " == %f\n", f32.b, f32.f);
        /* fix up f32 by copying the lost quiet bit from f64.b
           Obviously, do this in NaN branch only.
        */
        qbit = (uint32_t)((f64.b >> F64_QBIT_OFFSET) & F32_QBIT);
        printf("qbit: %" PRIx32 "\n", qbit);
        f32.b = (f32.b & ~F32_QBIT) | qbit;
        printf("fixed this to %" PRIx32 " == %f\n", f32.b, f32.f);
        printf("\n\n");
    }
    return 0;
}
$ make non-sgl-dbl && ./non-sgl-dbl
...
setup f32 with 7fbff000 == nan
C expands this to 7ffffe0000000000 == nan
qbit: 0
qbit fixed to 7ff7fe0000000000 == nan
convert back to f32: 7ffff000 == nan
qbit: 0
fixed this to 7fbff000 == nan
setup f32 with 7ffff000 == nan
C expands this to 7ffffe0000000000 == nan
qbit: 400000
qbit fixed to 7ffffe0000000000 == nan
convert back to f32: 7ffff000 == nan
qbit: 400000
fixed this to 7ffff000 == nan