How random is random?

How random is random?

on 11.05.2007 21:12:47 by cbigam

In the context of random numbers, how does one measure (or even properly
describe) randomness?

Recently we needed to generate several large files of random data for
test purposes. Since we're working in Unixland, it was easy enough to
do:

$ dd if=/dev/random of= bs=x count=y conv=sync.

Now assuming that we keep the filesize the same (i.e. x*y=constant),
the time to generate files goes up as count increases and bs decreases.
The interesting thing is that files created with low count and high bs...
- compress much better
- generate far fewer lines (as measured by wc -l)

Now since compress and gzip are apparently entropy-based algorithms, it
stands to reason (at least by me!) that the small-count file has less
entropy. The question is, what does this actually mean, and what are the
consequences of it?
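
As a crude first check (just a sketch -- the file names here are
placeholders for whatever dd wrote), one can compare each file's raw
size against its gzipped size:

$ wc -c bigblocks.dat smallblocks.dat
$ gzip -c bigblocks.dat | wc -c
$ gzip -c smallblocks.dat | wc -c

A file of genuinely random bytes should hardly compress at all, so a
large drop in size suggests structure (or padding) in the data rather
than randomness.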

Thanks,
Colin

Re: How random is random?

on 11.05.2007 22:36:42 by ottomeister

On May 11, 12:12 pm, "Colin B." wrote:

> $ dd if=/dev/random of= bs=x count=y conv=sync.
>
> Now assuming that we keep the filesize the same (i.e. x*y=constant),
> the time to generate files goes up as count increases and bs decreases.
> The interesting thing is that files created with low count and high bs...
> - compress much better
> - generate far fewer lines (as measured by wc -l)
>
> Now since compress and gzip are apparently entropy-based algorithms, it
> stands to reason (at least by me!) that the small-count file has less
> entropy. The question is, what does this actually mean, and what are the
> consequences of it?

'conv=sync' tells 'dd' that if it gets a short read from its input
then it should pad the output record to the specified blocksize with
zeroes. /dev/random can produce short reads if its entropy pool gets
depleted. If you examine the compressible output files I expect you'll
find that they contain lots of runs of zeroes, and those runs of
zeroes are highly compressible.
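
One quick way to confirm that (just a sketch; substitute the name of
your actual output file) is to compare the file's size with the number
of non-zero bytes in it:

$ wc -c outfile
$ tr -d '\000' < outfile | wc -c

If the second number is much smaller than the first, most of the file
is zero padding rather than random data.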

This is also the reason why the large 'bs' causes the file to be
generated more quickly.

OttoM.
__
ottomeister

Disclaimer: These are my opinions. I do not speak for my employer.

Re: How random is random?

on 11.05.2007 23:15:11 by cbigam

ottomeister@mail.com wrote:
> On May 11, 12:12 pm, "Colin B." wrote:
>
>> $ dd if=/dev/random of= bs=x count=y conv=sync.
>>
>> Now assuming that we keep the filesize the same (i.e. x*y=constant),
>> the time to generate files goes up as count increases and bs decreases.
>> The interesting thing is that files created with low count and high bs...
>> - compress much better
>> - generate far fewer lines (as measured by wc -l)
>>
>> Now since compress and gzip are apparently entropy-based algorithms, it
>> stands to reason (at least by me!) that the small-count file has less
>> entropy. The question is, what does this actually mean, and what are the
>> consequences of it?
>
> 'conv=sync' tells 'dd' that if it gets a short read from its input
> then it should pad the output record to the specified blocksize with
> zeroes. /dev/random can produce short reads if its entropy pool gets
> depleted. If you examine the compressible output files I expect
> you'll find that they contain lots of runs of zeroes, and those runs
> of zeroes are highly compressible.
>
> This is also the reason why the large 'bs' causes the file to be
> generated more quickly.

Ah hah! That explains some other behaviour I noticed after posting this,
namely that until a certain point, increasing bs (and decreasing count)
didn't seem to produce the behaviour I described.

Now that I actually look at the output from dd, I can see the same thing--
0 full records and count partial records if bs is high enough (> 1040,
in this case).
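
For the archives, the dd summary looks something like this when every
read comes back short (numbers here are illustrative, not from my
actual run):

$ dd if=/dev/random of=outfile bs=2048 count=4 conv=sync
0+4 records in
4+0 records out

i.e. zero full input records and four partial ones, each padded out to
bs on output.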

But shouldn't /dev/random (on Solaris, BTW) block until it can fill the
request for whatever block size? Or can it only block between calls?

Thanks,
Colin

Re: How random is random?

on 12.05.2007 00:54:04 by ottomeister

On May 11, 2:15 pm, "Colin B." wrote:
> ottomeis...@mail.com wrote:
> > On May 11, 12:12 pm, "Colin B." wrote:
>
> >> $ dd if=/dev/random of= bs=x count=y conv=sync.
>
> Ah hah! That explains some other behaviour I noticed after posting this,
> namely that until a certain point, increasing bs (and decreasing count)
> didn't seem to produce the behaviour I described.
>
> Now that I actually look at the output from dd, I can see the same thing--
> 0 full records and count partial records if bs is high enough (> 1040,
> in this case).
>
> But shouldn't /dev/random (on Solaris, BTW) block until it can fill the
> request for whatever block size? Or can it only block between calls?

The /dev/random device should block unless the app
has gone out of its way to turn on non-blocking behaviour.
It's unlikely that 'dd' would do that; certainly the Solaris
'dd' doesn't. The old Solaris /dev/random pipe, fed by
the cryptorand daemon, could return short reads just
like any other named pipe.

But in fact I was wrong, what's happening here has
nothing to do with blocking or entropy depletion. You're
being bitten by an undocumented quirk of the Solaris
/dev/random driver, which is that the amount of data
it will deliver in response to a single read() is capped.
That cap happens to be 1040 bytes.

I suppose it's within the driver's rights to do that but
when it's mixed with 'dd' like this the result is quite
unpleasant. There are easy workarounds once you
know what's happening (e.g. keep 'bs' below 1040,
or pipe the output of 'cat /dev/random' into 'dd', or
don't use 'dd') but before you can do that you have
to actually notice that something is broken. 'dd'
does tell you, in its own cryptic fashion, that the
input records were incomplete. I doubt I'd have
spotted that.
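
To spell out those workarounds (sketches only; the output name and
sizes are placeholders):

$ dd if=/dev/random of=outfile bs=1024 count=1024
$ cat /dev/random | dd of=outfile bs=8192 count=128

The first keeps each read under the 1040-byte cap so no record comes
back short; the second lets 'cat' do the small reads while 'dd' just
reblocks the pipe, though an occasional short read from the pipe is
still possible.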

OttoM.
__
ottomeister

Disclaimer: These are my opinions. I do not speak for my employer.

Re: How random is random?

on 13.05.2007 01:16:04 by Ertugrul Soeylemez

"Colin B." (07-05-11 19:12:47):

> $ dd if=/dev/random of= bs=x count=y conv=sync.
>
> Now assuming that we keep the filesize the same (i.e. x*y=constant),
> the time to generate files goes up as count increases and bs
> decreases.

This is a buffering issue and has nothing much to do with the PRNG.


> The interesting thing is that files created with low count
> and high bs...
> - compress much better
> - generate far fewer lines (as measured by wc -l)

Then the PRNG in Solaris is messed up. Regardless of dd's transfer
block size, the compression rate should always be the same: around 0%.
Especially since /dev/random is supposed to be a character device, there
should be no difference.


> Now since compress and gzip are apparently entropy-based algorithms, it
> stands to reason (at least by me!) that the small-count file has less
> entropy. The question is, what does this actually mean, and what are the
> consequences of it?

That Solaris' PRNG is messed up, or that you did something wrong.


Regards,
Ertugrul Söylemez.


--
Security is the one concept, which makes things in your life stay as
they are. Otto is a man, who is afraid of changes in his life; so
naturally he does not employ security.

Re: How random is random?

on 13.05.2007 22:28:50 by unruh

"Colin B." writes:

>In the context of random numbers, how does one measure (or even properly
>describe) randomness?

>Recently we needed to generate several large files of random data for
>test purposes. Since we're working in Unixland, it was easy enough to
>do:

>$ dd if=/dev/random of= bs=x count=y conv=sync.

Really terrible idea. Use /dev/urandom. /dev/random only has a finite
amount of random data and blocks if it runs out.
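
Something like this is plenty for generating test data (the output
name and sizes are placeholders):

$ dd if=/dev/urandom of=testdata bs=1024 count=10240

/dev/urandom keeps producing output even when the kernel's entropy
estimate is low, so it finishes in a predictable amount of time.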


>Now assuming that we keep the filesize the same (i.e. x*y=constant),
>the time to generate files goes up as count increases and bs decreases.
>The interesting thing is that files created with low count and high bs...
> - compress much better
> - generate far fewer lines (as measured by wc -l)

??? Not at all sure what dd is doing. It may return short reads if
/dev/random blocks, in which case the output will not be random.


>Now since compress and gzip are apparently entropy-based algorithms, it
>stands to reason (at least by me!) that the small-count file has less
>entropy. The question is, what does this actually mean, and what are the
>consequences of it?

>Thanks,
>Colin

Re: How random is random?

am 14.05.2007 18:17:26 von cbigam

Unruh wrote:
> "Colin B." writes:
>
>>In the context of random numbers, how does one measure (or even properly
>>describe) randomness?
>
>>Recently we needed to generate several large files of random data for
>>test purposes. Since we're working in Unixland, it was easy enough to
>>do:
>
>>$ dd if=/dev/random of= bs=x count=y conv=sync.
>
> Really terrible idea. Use /dev/urandom. //dev/random only has a finite
> amount of random data and blocks if it runs out.

Well, that depends on what you need. /dev/urandom won't block if it runs
out of entropy, which may not be acceptable in some cases. (In our case,
it's just fine, though.)

Furthermore, both /dev/random and /dev/urandom seem to have a hard cap
of 1040 bytes. Any blocksize beyond that for either device gives a short
read.
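
An easy way to see the cap (a sketch; dd's own summary is discarded
here) is to ask for a single oversized block and count what comes back:

$ dd if=/dev/random bs=2048 count=1 2>/dev/null | wc -c
$ dd if=/dev/urandom bs=2048 count=1 2>/dev/null | wc -c

On this box both report 1040 bytes rather than the 2048 requested.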

On the other hand, I can stream from /dev/[u]random, as per the following:

$ cat /dev/urandom | dd of= bs=x count=y.

With /dev/urandom, I can increase bs arbitrarily, and it keeps feeding
data almost without issue, although in theory the randomness of the data
may suffer. Also, once in a while I'll get a short read--usually when I'm
doing something else that's moderately intensive.
On the other hand, with /dev/random, I'll get one or two complete reads,
then a partial read and the loop will block for a significant fraction of
a second, before getting the next (complete) read. This assumes that I'm
using a bs>1040. Below that, I can't get a partial block returned no
matter what device I'm using and what else is going on on the system.

> ??? NOt at all sure what dd is doing. It may return if /dev/random blocks.
> In which case the output will not be random.

Pretty much what was happening. /dev/random was only providing 1040 bytes
of random data, and conv=sync was zero-filling the rest.

It looks to me like keeping bs below 1041 (1k seems nice) and checking the
output file size (without conv=sync) are the only guarantees of getting a
file of a given size with truly random data.
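
In other words, something like this (the name and size are placeholders):

$ dd if=/dev/random of=outfile bs=1024 count=1024
$ wc -c outfile

and only trust the file if wc reports the full 1048576 bytes.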

Colin