[RFC] Data::Endian (proposal for module to read/write big-endian floats/doubles)

on 03.07.2007 23:26:50 by t.x.michaels

Hello all - I am considering creating a simple module which allows a
user to read/write big-endian floats or doubles.
This is a relatively simple task which I had to solve for myself, and
I thought I would save someone from re-inventing the same wheel.

I did not find anything on CPAN that suited me. A user would do the
following (on either a little-endian or big-endian architecture):

seek($data_file, $byte_position, 0);

#Read a 4-byte binary big-endian single precision floating point number
read($data_file, $SPF_binary_number, 4);

#Convert to the native architecture floating point format
$float = read_float($SPF_binary_number);

The "read_float" subroutine is as follows:

#Receives a four-byte binary chunk containing a floating point number
# if platform is little-endian - do byte swapping
# if big-endian - just unpack as float
sub read_float {
    my $SPF_binary_number = shift;
    my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
    my $float;

    if ( $arch_is_little_endian ) {
        my $bit_string = unpack("B32", $SPF_binary_number);
        my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
        my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
        $float = unpack("f", pack("B32", $flipped_bit_string));
    }
    else { #Big-endian
        $float = unpack("f", $SPF_binary_number);
    }
    return $float;
}

There would be a similar subroutine for reading "doubles". Two more
subroutines would write out a float or double in big-endian format.

Any and all comments are greatly appreciated.
It would probably be easier for me if replies were also sent to
t.x.michaels AT gmail.com

Thanks,
Terry Michaels

Re: [RFC] Data::Endian (proposal for module to read/write big-endian floats/doubles)

on 04.07.2007 09:45:28 by Kalle Olavi Niemitalo

"t.x.michaels@gmail.com" writes:

> Hello all - I am considering creating a simple module which allows a
> user to read/write big-endian floats or doubles.

This reminds me of a glibc bug report and a related thread:

http://sourceware.org/bugzilla/show_bug.cgi?id=4586
http://sourceware.org/ml/libc-alpha/2007-06/msg00001.html

So it may be risky to unpack long doubles from untrusted data.

> my $bit_string = unpack("B32", $SPF_binary_number);
> my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
> my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
> $float = unpack("f", pack("B32", $flipped_bit_string));

I wonder if reverse($SPF_binary_number) could be used here.
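
A minimal sketch of that idea, assuming the host's native float is a
4-byte IEEE-754 single. The name read_float_reverse is made up for
illustration and is not part of the proposed module:

```perl
use strict;
use warnings;

# Hypothetical helper: decode a 4-byte big-endian single by reversing
# the byte string when the host is little-endian.
sub read_float_reverse {
    my ($bytes) = @_;

    # pack("s", 1) stores the low-order byte first on little-endian hosts
    my $little_endian = unpack("C", pack("s", 1)) == 1;

    # scalar() forces string reversal; in list context reverse() would
    # reverse the argument list and leave the bytes untouched
    $bytes = scalar reverse($bytes) if $little_endian;

    return unpack("f", $bytes);
}

# "\x3f\x80\x00\x00" is the big-endian IEEE-754 encoding of 1.0
print read_float_reverse("\x3f\x80\x00\x00"), "\n";
```

The one thing to watch is context: reverse() only reverses the
characters of a string when it is called in scalar context.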

Re: [RFC] Data::Endian (proposal for module to read/write big-endian floats/doubles)

on 07.07.2007 21:44:31 by hjp-usenet2

On 2007-07-03 21:26, t.x.michaels@gmail.com wrote:
> Hello all - I am considering creating a simple module which allows a
> user to read/write big-endian floats or doubles.

The name "Data::Endian" doesn't reflect that purpose.

> The "read_float" subroutine is as follows:
>
> #Receives a four-byte binary chunk containing a floating point number
> # if platform is little-endian - do byte swapping
> # if big-endian - just unpack as float
> sub read_float {
> my $SPF_binary_number = shift;
> my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
> my $float;
>
> if ( $arch_is_little_endian ) {
> my $bit_string = unpack("B32", $SPF_binary_number);
> my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
> my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
> $float = unpack("f", pack("B32", $flipped_bit_string));

That seems like an awfully complicated way to me. How about:

$float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));

> }
> else { #Big-endian
> $float = unpack("f", $SPF_binary_number);
> }
> return $float;
> }

In fact, the oneliner above works on both little- and big-endian
machines (the pack("L", unpack("N", ...)) sequence takes care of that)
and is almost four times faster than your version on my computer (an
Intel Core2) and still about twice as fast on two big-endian machines I
tried (a PA-RISC and a SPARC machine).
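
As a rough illustration of why the same expression works on either
kind of host, here is the one-liner next to its inverse. The names
read_float_be and write_float_be are hypothetical, and the sketch
assumes the native float is a 4-byte IEEE-754 single:

```perl
use strict;
use warnings;

# Decode: "N" reads the 4 bytes as a big-endian 32-bit integer, "L"
# re-emits that integer in the host's own byte order, "f" reads it
# as a native float.
sub read_float_be {
    my ($bytes) = @_;
    return unpack("f", pack("L", unpack("N", $bytes)));
}

# Encode (hypothetical counterpart): the same three steps in reverse.
sub write_float_be {
    my ($value) = @_;
    return pack("N", unpack("L", pack("f", $value)));
}

my $bytes = write_float_be(1.0);
print unpack("H8", $bytes), "\n";      # hex dump of the 4 bytes
print read_float_be($bytes), "\n";     # round trip back to a number
```

On a big-endian host "L" and "N" have the same layout, so the inner
pack/unpack pair is a no-op; on a little-endian host it swaps the bytes.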

#!/usr/bin/perl
use warnings;
use strict;
use Config;
use Benchmark qw(:all);

sub read_float {
    my $SPF_binary_number = shift;
    my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
    my $float;

    if ( $arch_is_little_endian ) {
        my $bit_string = unpack("B32", $SPF_binary_number);
        my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
        my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
        $float = unpack("f", pack("B32", $flipped_bit_string));
    }
    else { #Big-endian
        $float = unpack("f", $SPF_binary_number);
    }
    return $float;
}

sub read_float2 {
    my $SPF_binary_number = shift;
    my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
    my $float;

    if ( $arch_is_little_endian ) {
        $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
    }
    else { #Big-endian
        $float = unpack("f", $SPF_binary_number);
    }
    return $float;
}

sub read_float3 {
    my $SPF_binary_number = shift;
    my $float;

    if ( substr($Config{byteorder}, 0, 1) eq '1' ) {
        $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
    }
    else { #Big-endian
        $float = unpack("f", $SPF_binary_number);
    }
    return $float;
}

sub read_float4 {
    my $SPF_binary_number = shift;
    return unpack("f", pack("L", unpack("N", $SPF_binary_number)));
}

sub dummy {
    my $SPF_binary_number = shift;
    return 1.0;
}

cmpthese(-3,
    {
        michaels   => sub { my $float = read_float('1234') },
        use_L_N    => sub { my $float = read_float2('1234') },
        use_Config => sub { my $float = read_float3('1234') },
        simple     => sub { my $float = read_float4('1234') },
        dummy      => sub { my $float = dummy('1234') },
    }
);
__END__

Results on an Intel Core2:

                Rate michaels use_Config use_L_N simple dummy
michaels    129505/s       --       -12%    -55%   -80%  -90%
use_Config  147082/s      14%         --    -49%   -77%  -88%
use_L_N     287865/s     122%        96%      --   -55%  -77%
simple      633616/s     389%       331%    120%     --  -50%
dummy      1260230/s     873%       757%    338%    99%    --

Results on an UltraSparc IIi:

              Rate use_Config michaels use_L_N simple dummy
use_Config  6913/s         --     -57%    -58%   -80%  -89%
michaels   16006/s       132%       --     -3%   -54%  -76%
use_L_N    16432/s       138%       3%      --   -53%  -75%
simple     34941/s       405%     118%    113%     --  -47%
dummy      65749/s       851%     311%    300%    88%    --

(also note that the call overhead is about 100% - as can be seen by
comparing the times of simple and dummy)

Given that your problem can be solved in a single line of Perl I don't
really see the need for a module to do that.

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"

Re: Data::Endian (proposal for module to read/write big-endian floats/doubles)

on 08.07.2007 23:37:54 by t.x.michaels

On Jul 7, 3:44 pm, "Peter J. Holzer" wrote:
> On 2007-07-03 21:26, t.x.micha...@gmail.com wrote:
>
> > Hello all - I am considering creating a simple module which allows a
> > user to read/write big-endian floats or doubles.
>
> The name "Data::Endian" doesn't reflect that purpose.
>
> > The "read_float" subroutine is as follows:
>
> > #Receives a four-byte binary chunk containing a floating point number
> > # if platform is little-endian - do byte swapping
> > # if big-endian - just unpack as float
> > sub read_float {
> > my $SPF_binary_number = shift;
> > my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
> > my $float;
>
> > if ( $arch_is_little_endian ) {
> > my $bit_string = unpack("B32", $SPF_binary_number);
> > my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
> > my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
> > $float = unpack("f", pack("B32", $flipped_bit_string));
>
> That seems like an awfully complicated way to me. How about:
>
> $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
>
> > }
> > else { #Big-endian
> > $float = unpack("f", $SPF_binary_number);
> > }
> > return $float;
> > }
>
> In fact, the oneliner above works on both little- and big-endian
> machines (the pack("L", unpack("N", ...)) sequence takes care of that)
> and is almost four times faster than your version on my computer (an
> Intel Core2) and still about twice as fast on two big-endian machines I
> tried (a PA-RISC and a SPARC machine).
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Config;
> use Benchmark qw(:all);
>
> sub read_float {
> my $SPF_binary_number = shift;
> my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
> my $float;
>
> if ( $arch_is_little_endian ) {
> my $bit_string = unpack("B32", $SPF_binary_number);
> my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
> my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
> $float = unpack("f", pack("B32", $flipped_bit_string));
> }
> else { #Big-endian
> $float = unpack("f", $SPF_binary_number);
> }
> return $float;
>
> }
>
> sub read_float2 {
> my $SPF_binary_number = shift;
> my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
> my $float;
>
> if ( $arch_is_little_endian ) {
> $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
> }
> else { #Big-endian
> $float = unpack("f", $SPF_binary_number);
> }
> return $float;
>
> }
>
> sub read_float3 {
> my $SPF_binary_number = shift;
> my $float;
>
> if ( substr($Config{byteorder}, 0, 1) eq '1' ) {
> $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
> }
> else { #Big-endian
> $float = unpack("f", $SPF_binary_number);
> }
> return $float;
>
> }
>
> sub read_float4 {
> my $SPF_binary_number = shift;
> return unpack("f", pack("L", unpack("N", $SPF_binary_number)));
>
> }
>
> sub dummy {
> my $SPF_binary_number = shift;
> return 1.0;
>
> }
>
> cmpthese(-3,
> {
> michaels => sub { my $float = read_float('1234') },
> use_L_N => sub { my $float = read_float2('1234') },
> use_Config => sub { my $float = read_float3('1234') },
> simple => sub { my $float = read_float4('1234') },
> dummy => sub { my $float = dummy('1234') },
> }
> );
> __END__
>
> Results on an Intel Core2:
>
> Rate michaels use_Config use_L_N simple dummy
> michaels 129505/s -- -12% -55% -80% -90%
> use_Config 147082/s 14% -- -49% -77% -88%
> use_L_N 287865/s 122% 96% -- -55% -77%
> simple 633616/s 389% 331% 120% -- -50%
> dummy 1260230/s 873% 757% 338% 99% --
>
> Results on an UltraSparc IIi:
>
> Rate use_Config michaels use_L_N simple dummy
> use_Config 6913/s -- -57% -58% -80% -89%
> michaels 16006/s 132% -- -3% -54% -76%
> use_L_N 16432/s 138% 3% -- -53% -75%
> simple 34941/s 405% 118% 113% -- -47%
> dummy 65749/s 851% 311% 300% 88% --
>
> (also note that the call overhead is about 100% - as can be seen by
> comparing the times of simple and dummy)
>
> Given that your problem can be solved in a single line of Perl I don't
> really see the need for a module to do that.
>
> hp
>
> --
> _ | Peter J. Holzer | I know I'd be respectful of a pirate
> |_|_) | Sysadmin WSR | with an emu on his shoulder.
> | | | h...@hjp.at |
> __/ |http://www.hjp.at/| -- Sam in "Freefall"

Peter,

Thank you for your insightful commentary; it is very much
appreciated.

Your solution for converting single precision floating point numbers
between big and little endian formats is certainly
faster and easier than the one I presented.

It would appear that the solution presented would be limited to
single precision (32-bit) floating point numbers.

What would you suggest for the double precision (64-bit) floating
point number conversions?

My solution would have been pretty much the same as the one I
offered for the SPF numbers, extended to 64-bits
(byte-swapping and unpacking as double).

Again, thank you for taking the time to share your insights; it is
really very much appreciated.

Sincerely,
Terry Michaels

Re: Data::Endian (proposal for module to read/write big-endian floats/doubles)

on 09.07.2007 10:20:54 by hjp-usenet2

On 2007-07-08 21:37, t.x.michaels@gmail.com wrote:
> On Jul 7, 3:44 pm, "Peter J. Holzer" wrote:
>> On 2007-07-03 21:26, t.x.micha...@gmail.com wrote:
>>
>> > Hello all - I am considering creating a simple module which allows a
>> > user to read/write big-endian floats or doubles.
>>
>> The name "Data::Endian" doesn't reflect that purpose.
>>
>> > The "read_float" subroutine is as follows:
>>
>> > #Receives a four-byte binary chunk containing a floating point number
>> > # if platform is little-endian - do byte swapping
>> > # if big-endian - just unpack as float
>> > sub read_float {
>> > my $SPF_binary_number = shift;
>> > my $arch_is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
>> > my $float;
>>
>> > if ( $arch_is_little_endian ) {
>> > my $bit_string = unpack("B32", $SPF_binary_number);
>> > my ($b0, $b1, $b2, $b3) = unpack("A8" x 4, $bit_string);
>> > my $flipped_bit_string = $b3 . $b2 . $b1 . $b0;
>> > $float = unpack("f", pack("B32", $flipped_bit_string));
>>
>> That seems like an awfully complicated way to me. How about:
>>
>> $float = unpack("f", pack("L", unpack("N", $SPF_binary_number)));
[...]
>>
>> In fact, the oneliner above works on both little- and big-endian
>> machines (the pack("L", unpack("N", ...)) sequence takes care of that)
>> and is almost four times faster than your version on my computer (an
>> Intel Core2) and still about twice as fast on two big-endian machines I
>> tried (a PA-RISC and a SPARC machine).

[ benchmark program and results snipped: Please quote only relevant
material]

>> Given that your problem can be solved in a single line of Perl I don't
>> really see the need for a module to do that.
>>
>
> Thank you for your insightful commentary, it is very much
> appreciated.
>
> Your solution for converting single precision floating point numbers
> between big and little endian formats is certainly
> faster and easier than the one I presented.
>
> It would appear that the solution presented would be limited to
> single precision (32-bit) floating point numbers.
>
> What would you suggest for the double precision (64-bit) floating
> point number conversions?

There doesn't seem to be a 64-bit equivalent to the "N" specifier for
pack and unpack. So you are right, for double precision numbers the byte
swapping has to be done explicitly.

That would be something like

sub read_double {
    my ($DPF_binary_number) = @_;
    if (pack('s', 1) eq "\1\0") {
        $DPF_binary_number = reverse($DPF_binary_number);
    }
    return unpack("d", $DPF_binary_number);
}

That can still be crammed into one line:

unpack("d", pack('s', 1) eq "\1\0" ? reverse($DPF_binary_number) : $DPF_binary_number);

but I agree that it's getting a bit unwieldy.
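
For the write direction the same check can be reused. The following is
only a sketch, assuming the native double is an 8-byte IEEE-754 value;
write_double and the $LITTLE_ENDIAN variable are hypothetical names,
not something taken from the proposed module:

```perl
use strict;
use warnings;

# True when the host stores the low-order byte first (same test as above)
my $LITTLE_ENDIAN = pack('s', 1) eq "\1\0";

# Decode an 8-byte big-endian double into a native Perl number
sub read_double {
    my ($bytes) = @_;
    # scalar assignment puts reverse() in scalar context: string reversal
    $bytes = reverse($bytes) if $LITTLE_ENDIAN;
    return unpack("d", $bytes);
}

# Hypothetical counterpart: encode a number as 8 big-endian bytes
sub write_double {
    my ($value) = @_;
    my $bytes = pack("d", $value);
    return $LITTLE_ENDIAN ? scalar reverse($bytes) : $bytes;
}

my $bytes = write_double(1.0);
print unpack("H16", $bytes), "\n";   # hex dump of the 8 bytes
print read_double($bytes), "\n";     # round trip back to a number
```

Doing the endianness test once at load time, rather than on every call,
also avoids repeating the pack() in a tight loop.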

So to summarize my thoughts:

* All the functions you provide are one-liners.

* There's nothing wrong with a module which provides a bunch of one-line
functions, as long as the best implementation isn't obvious (and I
think you proved that it isn't obvious :-)).

* So there may be some merit in your module.

* However, as I said before, it needs a better name. "Data::Endian" is
just too generic. I would expect to be able to read and write many
different data types (not just FP) from and to both little-endian and
big-endian datastreams with a module of that name. If your module
provides just for big-endian FP data, it should have "big-endian" and
"floating-point" in its name.

hp

PS: I would like to see support for 32-bit and 64-bit IEEE-754
numbers in both endiannesses (or maybe just network order) in pack
and unpack.
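
For readers on later Perls: from Perl 5.10 onward, pack and unpack
document "<" and ">" modifiers that force little- or big-endian byte
order on formats including "f" and "d". A short sketch, which only
gives portable results if the platform's native floating point format
is IEEE-754:

```perl
use strict;
use warnings;
use 5.010;   # the "<" and ">" modifiers require Perl 5.10 or later

# Decode big-endian single and double precision values directly
my $float  = unpack("f>", "\x3f\x80\x00\x00");
my $double = unpack("d>", "\x3f\xf0\x00\x00\x00\x00\x00\x00");

# Encode a number as 8 big-endian bytes
my $bytes = pack("d>", -2.5);

print "$float $double ", unpack("H16", $bytes), "\n";
```

With these modifiers no explicit endianness test or byte reversal is
needed on either kind of host.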

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"