Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details Jump to Notes ] Issue History ] Print ]
ID Category Severity Type Date Submitted Last Update
0000859 [1003.1(2013)/Issue7+TC1] System Interfaces Comment Enhancement Request 2014-07-18 13:59 2016-07-14 15:13
Reporter tedu View Status public  
Assigned To
Priority normal Resolution Withdrawn  
Status Closed  
Name Ted Unangst
Organization OpenBSD
User Reference
Section posix_random
Page Number 0
Line Number 0
Interp Status ---
Final Accepted Text
Summary 0000859: Add posix_random family of interfaces
Description Cryptographic software requires a source of unpredictable (pseduo) random numbers. Various nonstandard system interfaces exist for this purpose, but they have subtle differences in behavior between platforms. Attempts to build reliable portable software often fail due to this variation.

The interfaces below are adapted from the arc4random family of interfaces first introduced in OpenBSD in 1996 and experiences gained since then. They have been renamed to be more standard like, although the OpenBSD project would not object to standardizing the existing names.
Desired Action SYNOPSIS
     #include <stdlib.h>

     uint32_t posix_random(void);

     void posix_random_buffer(void *buf, size_t nbytes);

     uint32_t posix_random_uniform(uint32_t upper_bound);

DESCRIPTION
     This family of functions provides higher quality data than those
     described in rand(3), random(3), and drand48(3). The generated numbers
     must be unpredictable. In particular, the sequence must not be shared
     between processes after fork().

     The arc4random() function returns a single 32-bit value.

     arc4random_buf() fills the region buf of length nbytes with random data.

     arc4random_uniform() will return a single 32-bit value, uniformly
     distributed but less than upper_bound. This is recommended over
     constructions like "arc4random() % upper_bound" as it avoids "modulo
     bias" when the upper bound is not a power of two.

     All of these functions are thread safe.

RETURN VALUES
     These functions are always successful, and no return value is reserved to
     indicate an error.

RATIONALE
     The standard does not specify a required algorithm, leaving implementations
     some flexibility so long as they meet the interface requirements.

     No mechanism is provided to seed or reseed these functions, which places
     an unnecessary burden on application developers. The implementation is
     responsible for ensuring correct operation at all times.
Tags No tags attached.
Attached Files

- Relationships
related to 0000743Closed 1003.1(2013)/Issue7+TC1 RAND_MAX should guarantee even distribution over a power of 2 
related to 0001134Closed 1003.1(2016/18)/Issue7+TC2 Add getentropy interface 

-  Notes
(0002314)
tedu (reporter)
2014-07-18 14:04

oops, I copied the man page too literally. The description should more properly read:

     The posix_random() function returns a single 32-bit value.

     posix_random_buffer() fills the region buf of length nbytes with
     random data.

     posix_random_uniform() will return a single 32-bit value, uniformly
     distributed but less than upper_bound. This is recommended over
     constructions like "posix_random() % upper_bound" as it avoids "modulo
     bias" when the upper bound is not a power of two.
(0002317)
mdempsky (reporter)
2014-07-18 17:35

I'm very supportive of this if it can be accepted. A couple notes though:

1. I think for most other posix_FOO() functions, there was often an existing FOO() function very similar in design/purpose. I'm a little concerned that naming these just posix_random(), etc. might lead people to think they're just a new API for the (non-cryptographically secure) rand() and/or random() functions. So it might be worth considering some alternative names like "posix_saferandom()", "posix_cryptrandom()", "posix_securerandom()", etc. (For reference, Windows provides a function "CryptGenRandom", and Java provides a "SecureRandom" class. In the cryptographic community the adjective "safe" is getting some use such as in safecurves.cr.yp.to.)

2. POSIX doesn't seem to use the term "quality" in describing the other RNG APIs, so it would seem a bit out of place to refer to it here. It might be better to directly address the requirement that the API provide random bytes suitable for cryptographic use.

  2b. It's theoretically of use to quantify the security level somehow (e.g., Salsa20 makes some specific claims about conjectured attack costs in http://cr.yp.to/snuffle/security.pdf), [^] but in the interest of simplicity, I'm inclined to leave that as a quality-of-implementation issue. Perhaps just require implementations to document their choice of RNG and the conjectured security levels?

3. This API doesn't explain that it uses a "sequence" of random values like the rand(), srand() or drand48() functions do, so saying "In particular, the sequence must not be shared between processes after fork()." is a bit out of place too I think. Either we should clarify that these APIs also use a sequence, or we should try to revise the wording to simply say that each invocation needs to generate new independent values, even after a fork(). Not sure how to word that off hand though; I keep wanting to say "unique", but that might be construed as implying that values aren't reused. I'll look around for some alternative wording options.

4. If this is added to <stdlib.h>, it needs to be shaded CX, since it's an extension to an ISO C header.
(0002318)
mdempsky (reporter)
2014-07-18 17:51

Oh a few more thoughts:

5. I agree that the functions should never return if they fail, but I fear asserting they're "always successful" might be dubious. We might want to instead specify something like if they fail, they should terminate abnormally as if by abort()?

6. Can/should the functions block to ensure sufficient seed entropy is available? If so, do we need to worry about thread cancellation and/or signal handling? I'm inclined to say they should block (i.e., the output should always be "safe"), signals received while blocking should *not* cause it to fail (i.e., not abort), and could go either way on thread cancellation.
(0002319)
tedu (reporter)
2014-07-18 18:22

For additional reference, there is also a rand_s function in Windows. It returns an error, but appears that the only defined error is passing a null pointer.

http://msdn.microsoft.com/en-us/library/sxtz2fa8.aspx [^]

(For the random_buffer() function described above, passing NULL or any other invalid pointer would result in a segfault; it does not perform such parameter checking.)
(0002320)
dalias (reporter)
2014-07-21 02:59

I strongly object to the addition of any interface that terminates the calling process as a consequence of failure. The only safe way to use such an interface is to fork each time you want to use it, and this is prohibitively slow and error-prone (e.g. you also need to exec if the original process may be multi-threaded, since calling AS-unsafe functions in the child after fork is UB for a multi-threaded parent) to the point of making the whole interface useless.

Instead, the interface just need a proper way of reporting failure to the caller, and should be designed to discourage misuse, i.e. it should not require you to do awkward things like set errno to 0 before calling then check errno after return. The functions as proposed clearly have no proper way to report failure, so I think they should be rejected in their current form.

posix_random_buffer could easily be fixed by changing the return type from void to int, and having it return -1 on failure and 0 on success. For the others, either the return value would need to be returned via a pointer to the storage (but this makes posix_random essentially a duplicate of posix_random_buffer with an implicit second argument of 4), or they could take a pointer to a location to store a failure flag (and not modify the pointed-to value on success). I like this latter design because it allows you to use them in complex expressions and loops without checking for failure on each call, but gives you an easy way to check for error at the end of the whole computation (and then discard the result) much like what can be done with floating point exception flags. However I also fear that naive users may just throw away the flag.
(0002321)
mdempsky (reporter)
2014-07-21 04:07

In practice, I don't expect these APIs to normally call abort(). I was just suggesting that in extreme situations, they should fail loudly/catastrophically rather than silently.

E.g., on most implementations, passing a null pointer to memcpy() will generally generate a SIGSEGV, which defaults to killing the process. Also, on systems that use memory overcommit, calling a function that needs dynamic binding might hit an out-of-memory condition trying to update the PLT entry and cause the process to die. Those are the sorts of conditions I had in mind for mentioning abort().

However, looking around more, I see functions like getpid() and getuid() do simply specify they always succeed, even if existing implementations provide mechanisms (e.g., ptrace, systrace, seccomp-bpf) that could cause them to fail.

I think there's benefit to keeping the API simple and failure-free, if possible. So my preference would be to specify "These functions are always successful" if the abort() wording is contentious (and in retrospect it seems superfluous anyway). My second preference would be to mark the interfaces as optional (i.e., under an option group) so they don't need to be provided at all by implementations that can't meet the "always successful" constraint.

Put differently: what sort of error conditions might we want to specify for these functions?
(0002322)
dalias (reporter)
2014-07-21 04:41

Passing a null pointer to memcpy is a completely different situation: by invoking undefined behavior, you give the implementation license to abort (or do whatever else it pleases). As for the situations cited with overcommit and OOM during lazy binding, I consider these at least extremely low-quality, and in my opinion non-conforming, implementations.

For the proposed posix_random interfaces, the obvious error condition is "insufficient entropy available to produce non-predictable output". This would normally have an underlying cause for the inability to obtain entropy (e.g. EMFILE or ENFILE attempting to open /dev/urandom or similar). I'm not sure whether it would be desirable to expose such underlying causes or just a generic error code indicating that sufficiently non-predictable output could not be produced.

Of course I would really just prefer that these functions not be permitted to fail, and that they not be optional. This does not give an implementation license to crash/abort when it can't satisfy the requirements; rather, it renders the implementation non-conforming if it can't satisfy the requirement.

However, I think it should be possible for all implementations to satisfy the requirements of an always-working posix_random (independent sequences in every process including after fork), albeit possibly not with ideal strength properties. For example, at fork time an implementation could consume enough output from posix_random in the parent to fill the internal state of the prng in the child before actually forking. This does not require obtaining any resource that could fail to be available. Another approach, if reseeding from /dev/urandom or similar is deemed important by the implementer, is for fork to fail when the resource (e.g. fd) needed to obtain entropy cannot be allocated.
(0002352)
eblake (manager)
2014-08-21 15:51
edited on: 2014-08-21 15:52

If we're going to add a new interface, it should be HARD to abuse it. Returning random bits leaves you nowhere to report errors other than the multi-statement dance of 'errno = 0; bits = call(); if (errno) report failure' - but too many progammers will omit error checking. I'd much rather require users to do: 'uint32_t bits; if (call(&bits) != 0) report failure', because that is harder to get wrong. Besides, gcc can even warn users on functions that are marked __attribute__((__warn_unused_result__)).

(0002385)
eblake (manager)
2014-09-12 18:37

An apt comment I received from Julian Coleman:
If I remember correctly, there has already been discussion in at least
some of the BSD camps specifically about the name "arc4random()". It
was considered a poor choice to have the algorithm in the function name(s).
So, that would be a vote for changing the name, at least.
(0002386)
eblake (manager)
2014-09-12 18:43

Upstream Linux kernel has just proposed adding a syscall getrandom() for getting random numbers without a file descriptor: http://lwn.net/Articles/606141/ [^] as a superset of the existing BSD syscall getentropy(); maybe standardizing something along this line of thought will be better.

    #include <linux/random.h>

    int getrandom(void *buf, size_t buflen, unsigned int flags);

A call will fill buf with up to buflen bytes of random data that can be used for cryptographic purposes, returning the number of bytes stored. As might be guessed, the flags parameter will alter the behavior of the call. In the case where flags == 0, getrandom() will block until the /dev/urandom pool has been initialized. If flags is set to GRND_NONBLOCK, then getrandom() will return -1 with an error number of EAGAIN if the pool is not initialized.
(0002387)
dalias (reporter)
2014-09-12 18:57

The new Linux getrandom, and the BSD getentropy, are the type of primitive you would want to use in implementing posix_random. So from my perspective, the question is a matter of which one is more useful to standardize: an underlying primitive to get secure entropy, or an interface that wraps this operation conveniently with a way to get multiple "random" numbers based on the entropy source, but without having to go back to the kernel each time.

My feeling is that, as proposed so far, posix_random does not offer sufficient guarantees on how it's implemented to be very appealing to programs that need secure random numbers, e.g. for key generation. If it's standardized, I fear most software will just ignore it and use system-specific mechanisms like getrandom or getentropy instead. Maybe there are improvements that could be made to assure users that posix_random can be used safely?
(0002438)
ajosey (manager)
2014-11-21 09:29

The Open Group Base Working Group has balloted in favor of sponsoring the interfaces for inclusion in Issue 8.

The submitter is requested to submit a revised proposal taking into account the
comments raised in the bug report so far.
(0002457)
tedu (reporter)
2014-12-01 20:36

Notes about the revised proposal here, then a more complete revised proposal to follow.

I think most comments are regarding the always successful requirement and the circumstances under which that may not be possible to achieve, generally due to lack of entropy early on.

This adds a function `int posix_random_init(void)`. If successful, it returns 0. On error, it returns an error code. Notable error values may be EAGAIN to indicate "low entropy" or EACCESS to indicate that /dev/random could not be opened, etc. Once init() returns 0, it should not be possible for any other errors to occur. Notably, it can't "degrade" from operational to not. Systems are still expected to make a best effort attempt to provide decent random numbers. For example, Linux may seed with getrandom(RND) and if that fails, seed with getrandom(URND) and then return EAGAIN (assuming I understand getrandom() semantics properly).

I've revised the text about the "sequences" generated to basically be the definition of a cryptographic RNG. Hopefully this addresses the concern that users may be uncertain about the safety of these interfaces.
(0002458)
dalias (reporter)
2014-12-01 20:41

What is the proposed behavior if posix_random_init is called more than once? Is it required to be safe in cases where multiple threads race to call it first? Etc. Since the purpose of this function, from a "black box" perspective, does not seem to be performing any "initialization" that's meaningful to the caller, but rather querying whether posix_random is working, I think it would make more sense to name it as such, and this would also get rid of any confusion about whether "multiple-init" is safe.
(0002459)
tedu (reporter)
2014-12-01 20:47

SYNOPSIS
     #include <stdlib.h>

     uint32_t posix_random(void);

     void posix_random_buffer(void *buf, size_t nbytes);

     uint32_t posix_random_uniform(uint32_t upper_bound);

DESCRIPTION

     This family of functions provides a random number generator suitable
     for use in cryptographic applications. Future values cannot be predicted
     by observation of previous values and previous values cannot be determined
     by observation of future values.

     The posix_random_init() function may optionally be called to initialize and
     seed the random number generator. If posix_random_init() returns successfully,
     subsequent calls to the other functions are guaranteed to meet the properties
     described above. If posix_random_init() indicates failure, the remaining
     functions will continue to function, albeit with possibly weakened security.

     The posix_random() function returns a single 32-bit value.

     posix_random_buffer() fills the region buf of length nbytes with
     random data.

     posix_random_uniform() will return a single 32-bit value, uniformly
     distributed in the range [0, upper_bound). This is recommended over
     constructions like "posix_random() % upper_bound" as it avoids "modulo
     bias" when the upper bound is not a power of two.

     All of these functions are thread safe.

RETURN VALUES
     The posix_random_init() function returns 0 for success and a non-zero error code
     to indicate failure.

     No error values are defined or reserved for the other functions.
(0002460)
eblake (manager)
2014-12-02 00:13

Note: 0002459 is missing the declaration of posix_random_init, and the description of specific error values it must use if it fails.

Should there be an interface for easily getting a uint64_t and/or double value? If an interface for producing a double is provided, would it be better as a double in the range [0.0,1.0) with a fixed exponent (modulo the boundary case of 0), so that all returned values are equally distributed, vs. a completely random double (where equal distribution of bit patterns produces more doubles close to 0 than it does doubles close to the maximum due to floating point scaling and subnormal values)? Naively constructing 64-bit numbers from a PRNG that only tracks 32 bits of state risks bias in those 64-bit numbers (as an extreme example, consider a 1-bit PRNG that outputs the repeating sequence 0, 1, 1, 0: constructing 2-bit numbers from that PRNG will only create two out of four possible values, which is extremely biased). Of course, if the underlying source is true randomness and not a PRNG, or if the PRNG has enough state, then concatenating 64-bit numbers from consecutive 32-bit calls is still going to be evenly distributed; but having a dedicated interface for this makes life easier (back to our design ideal of making it hard to get wrong via naive use). Likewise, a dedicated interface for getting a double, instead of requiring bit-twiddling or playing with ldexp and friends, would make this proposal much more appealing.
(0002461)
eblake (manager)
2014-12-02 00:22

Should the proposal explicitly mention that this family of interfaces cannot be re-seeded to replay an earlier random stream? Even if only in an application usage, mentioning the differences between the other random interfaces (where a specific seed can give deterministic behavior, which can be nice for debugging other issues) and this one (where it looks to be intentional that we are not exposing an interface for setting a seed, although the wording still appears to intentionally allow an internally-seeded PRNG with sufficient state to be used as an implementation, and use of actual random hardware is merely a quality-of-implementation detail).
(0002462)
mdempsky (reporter)
2014-12-02 00:35

[re 0002460] I think the RNG is failing at being "suitable for cryptographic applications" if it outputs 32-bit values that can't safely be concatenated to form an unbiased 64-bit value.

Additional interfaces are fine by as long as we believe they'll be useful in practice, but I'm a bit worried about expanding the scope too much. It's also perhaps worth noting that C++11 separates the idea of "uniform random number generator" (i.e., an object that returns random integers) from the idea of "random number distributions" (i.e., objects that take random integers and massage them into various useful distributions such as uniform, normal, bernoulli, etc.). So I'd be inclined to limit ourselves to defining what's necessary for the system to provide a cryptographically secure RNG, and for now defer to applications to decide how to adapt that to their needs.

At least within OpenBSD's base system + third-party software ports, the proposed functions (fill a buffer, random 32-bit value, and random 32-bit value with an upper bound) have served the most common needs. I think simple random 64-bit values have come up a few times (usually satisfied by concatenating two 32-bit values). I don't think I've ever seen a need for a random floating point value though.
(0002463)
tedu (reporter)
2014-12-02 02:28

We have very rarely found a need for 64 bit random numbers. Some notes on practical applications:

Used to generate 16 or 32 bit IDs (pids, DNS, RPC, etc.) 64 bit IDs are less frequently encountered in most protocols.

Used to select a random index from an array. This is what the uniform() function is for. 64 bit array indices are possible, but not practical for most object sizes. (Why would one need to select a random index into an array containing more than 4 billion char, for instance?)

Used to generate keys and nonces, etc. These are generally larger than 64 bits, and the buffer() function is used.

The only time I have needed 64 bit numbers I actually needed a size_t/intptr_t and used the buffer() interface.

64 bit variants don't seem necessary, but if "completeness" has strong appeal, I do not think adding them would be harmful or pose any great burden for implementations.

If a double interface is added, I think it should be along the lines of uniform(), and return a value in the range [0, bound). Or [lower, upper). The implementation should do the necessary scaling. As noted, the floating point range is itself not uniform. I have some reservations that users will interpret a random double to represent 64 random bits, but an implementation that actually did that would be strongly biased.
(0002464)
dalias (reporter)
2014-12-02 03:04

Concatenating N-bit pseudo-random values to produce a 2N-bit pseudo-random value is not such a simple matter. In order for the latter to be uniformly distributed, uniform distribution of the former does not suffice; rather, every possible pair of two N-bit numbers must occur in sequence with equal probability. Obviously short-period PRNGs fail to satisfy this property by a simple counting argument, but I don't think it's obvious that well-known CSPRNGs satisfy it, if they even do.

Assuming sufficiently large period and other properties necessary for a CSPRNG, it seems plausible to me that even if the 2N-bit concatenation is not uniform, it may be indistinguishable from uniform for most practical purposes. However, upon repeating the process to get larger and larger random numbers you'll eventually reach a point where the output is not remotely uniform.

I'm not an expert in this area, but I feel like if these interfaces are to be usable for cryptographic purposes (e.g. producing keys), some further guarantees with regard to periodicity, quality of distribution of posix_random_buffer output for large values of nbytes, etc. are needed.
(0002465)
tedu (reporter)
2014-12-02 03:22

Regarding the limits of uniformity, using something like AES-CTR is for example not uniform after concatenation because two blocks will never repeat. The state space of a true stream cipher like ChaCha20 is a little more forgiving here. In practice, the fact that AES-CTR never repeats does not appear to have diminished its acceptability.

The wording "Future values cannot be predicted by observation of previous values" does not directly address periodicity, but I think it should be inferred to mean that even after "many" observations, prediction remains impossible. i.e., the period is "long".

Would adding "indistinguishable from uniform noise" be sufficient to guarantee distribution?
(0002467)
nsz (reporter)
2014-12-02 13:04

i think the description of posix_random_uniform should say something
about 0 upper_bound

i'd explicitly mark this case as undefined behaviour (an implementation
would typically do %upper_bound operation in a loop which is ub for 0)

(the other sensible definition i can think of is to interpret 0 as
upper_bound == 2^32, but that's probably too clever)
(0002473)
tedu (reporter)
2014-12-04 00:10

A few notes about the choice of name.

This interface exists on several platforms already as arc4random() and third party software also exists which uses it (sometimes providing their own copy for platforms that don't have it). The existing platforms already basically conform to the proposed addition here, though some lack the uniform() function added later.

Standardizing on the existing name would make for an easier transition for much of this software, instead of requiring all parties to rewrite everything.

Also, people searching the internet for "posix_random" are unfortunately likely to find lots of information about the POSIX "random" function. It would be bad for them to then start using that interface.

The letters "rc4" in the name need not imply or guarantee any particular implementation.
(0002488)
nmav (reporter)
2014-12-10 08:45

One comment for posix_random_init() is that it is difficult to use. That is, how is a library expected to use that interface? A lot of code contains something like "srand(time(0));x=rand();" every time a random number is produced, most probably the story will be repeated for that interface. I'd suggest to have an implicit initialization to simplify usage.

I also think that the description should make clear that this interface will produce unpredictable numbers to both processes generated after a fork. The initial description had it explicit, but the second it has it subtle - I believe it should be explicit.
(0002489)
eblake (manager)
2014-12-10 15:28

Are you aware that OpenBSD is considering breaking the contract of rand() and friends?

> List: openbsd-cvs
> Subject: CVS: cvs.openbsd.org: src
> From: Theo de Raadt <deraadt () cvs ! openbsd ! org>
> Date: 2014-12-08 21:45:20
> Message-ID: 17119865312585819206.enqueue () cvs ! openbsd ! org
>
> CVSROOT: /cvs
> Module name: src
> Changes by: deraadt@cvs.openbsd.org 2014/12/08 14:45:20
>
> Modified files:
> include : stdlib.h
> lib/libc/stdlib: Makefile.inc drand48.c lcong48.c lrand48.c
> mrand48.c rand.3 rand.c rand48.3 rand48.h
> random.3 random.c seed48.c srand48.c
>
> Log message:
> Change rand(), random(), drand48(), lrand48(), mrand48(), and srand48()
> to returning strong random by default, source from arc4random(3).
> Parameters to the seeding functions are ignored, and the subsystems remain
> in strong random mode. If you wish the standardized deterministic mode,
> call srand_deterministic(), srandom_determistic(), srand48_deterministic(),
> seed48_deterministic() or lcong48_deterministic() instead.
> The re-entrant functions rand_r(), erand48(), nrand48(), jrand48() are
> unaffected by this change and remain in deterministic mode (for now).
>
> Verified as a good roadmap forward by auditing 8800 pieces of software.
> Roughly 60 pieces of software will need adaptation to request the
> deterministic mode.
>
> Violates POSIX and C89, which violate best practice in this century.
> ok guenther tedu millert
(0002490)
safinaskar (reporter)
2014-12-10 18:23

Interfaces for getting high quality random numbers have a number of issues in existing systems. So, it would be very nice if we fix all them is this proposal. We should create some really high quality API. Well, issues with existing implementations:

1. On Linux standard way of getting high quality random numbers is /dev/random and /dev/urandom. But this method requires opening files, and so it doesn't work when there is no free file descriptors. This caused LibreSSL security hole, and then this caused getrandom system call proposal to Linux (then this syscall was actually added). Links:
http://threatpost.com/overblown-libressl-prng-vulnerability-patched/107245 [^]
http://port70.net/~nsz/47_arc4random.html [^]
http://lists.openwall.net/linux-kernel/2014/07/17/235 [^] (also, consider getrandom API in this message)

So, I think we should write in POSIX that our posix_random should work even if there is no free file descriptors. So, we explicitly will prohibit implementation of posix_random as a wrapper on /dev/[u]random. (And if you don't like this my proposal, then explicitly write the opposite, i. e. "posix_random may not work if there is no available free descriptors")

2. On Linux /dev/random always generates truly random numbers (i. e. numbers got from some hardware, such as touchpads or some special hardware random numbers generators). If there is no entropy available, then /dev/random blocks.

/dev/urandom generates truly random numbers, too. But if there is no entropy, it returns pseudo-random numbers.

"man 4 random" on Linux recommends using /dev/random for "long-lived GPG/SSL/SSH keys". But well-known cryptography researcher D. J. Bernstein says here ( http://lists.randombit.net/pipermail/cryptography/2013-August/004983.html [^] ) an objection to that man page and says that /dev/urandom should be used nearly everywhere (if it was properly initialized). But in the same mail Bernstein says another thing: there is very important for random source to be properly initialized (using some truly random source or using random seed which is preserved across reboots). And /dev/urandom on Linux still works even if not initialized and this is bad.

So, please specify in POSIX, what kind of numbers posix_random generates by default. Always truly random ones? Or it can fail-back to pseudo-random ones? And does it fail/block if it was never initialized?

I think (using Bernstein's arguments) that posix_random should not block by default if initialized. I. e. if it is initialized, then it should always return numbers immediately even if this is pseudo-random numbers. But it may provide some option to alter this behavior.

Then, I don't what whatever it should block/fail if it was not initialized. Probably, yes. At least posix_random should provide option to force error if the generator was not initialized.

P. S. I didn't read all this discussion, sorry. I may repeat somebody
(0002491)
nmav (reporter)
2014-12-10 19:28

@safinaskar. I'd suggest not to burden this function with entropy estimators. The best is to mark the functions explicitly as "cryptographic pseudo-random number generator". Then it is clear what it can be used for.
(0002492)
tedu (reporter)
2014-12-10 20:16

Having had a week to ponder it, I'm less happy with the idea of an init() function. I proposed it as a compromise to returning error codes, but I think it's the worst kind of compromise: worse than the sum of its parts. Using init() is awkward and really only adds an opportunity for application code to do something wrong.

I think this interface is much better without it, as originally proposed. OpenBSD has done a pretty good job in my opinion of building an interface that can't fail. Libressl portable and libottery-lite are also converging on portable solutions that come very close, even without the benefit of complete system integration.
(0002493)
safinaskar (reporter)
2014-12-10 22:33

nmav, okey, but we still need way such that application can determine that generator was initialized.

We need it. Because there is virtual machines with R/O boot, which has no entropy sources, and which don't preserve random seed across reboots, so they are in predictable state which is not satisfactory for cryptography
(0002494)
philip-guenther (reporter)
2014-12-10 22:58

To reply to eblake: http://austingroupbugs.net/view.php?id=859#c2489 [^]
> Are you aware that OpenBSD is considering breaking the contract of rand() and friends?

Yes. I'm the 'guenther' mentioned in Theo's commit message. :-)

To provide context, I'll quote from Theo's public note about why
the project did this. The full note can be seen at
        http://marc.info/?l=openbsd-tech&m=141807224826859&w=2 [^]

> [rand/random/*rand48 are] used in two patterns:
> 1. Under the assumption it provides good random numbers.
> This is the primary usage case by most developers.
> This is their expectation.
> 2. A 'seed' can be re-provided at a later time, allowing
> replay of a previous "random sequence", oh wait, I mean
> a deterministic sequence...
...
> I started my audit of the ports tree for rand() and random() use, as
> indicated by the linker. I found 1221 ports out of 8800 used them.
>
> Differentiating pattern 1 from pattern 2 involved looking at the
> seed being given to the subsystem. If the software tried to supply
> a "good seed", and had no framework for re-submitting a seed for
> reuse, then it was clear it wanted good random numbers. Those ports
> could be eliminated from consideration, since they indicated they
> wanted good random numbers.
>
> This left only 41 ports for consideration. Generally, these are doing
> reseeding for reproduceable effects during benchmarking. Further
> analysis may show some of these ports do not need determinism, but if
> there is any doubt they can be mindlessly modified as described below.
...


So we have a software ecosystem where programs that don't need
determinism are getting it because the portable APIs are deterministic.
For many programs, this may be of little consequence, but I doubt
anyone would take the bet that *none* of the programs using the
rand/random/rand48 APIs has a security or correctness bug from
this.

In some cases, the OpenBSD ports team have locally patched programs
to use our arc4random APIs, but pushing those changes upstream when
the larger community, particularly Linux, does not have those APIs
raises the dev and maintenance costs to those 3rd party developers.

Solving this for the larger ecosystem is one of the goals of the
posix_random API proposal, by providing a portable, non-deterministic
random API and letting programs express their intent. We believe
we've demonstrated that this should be something that the larger
community should be able to implement and spread widely.


The change in OpenBSD to rand/random/*rand48 is an attempt to protect
our community immediately in a way that we believe we can afford and
accept. Perhaps we'll find that this change actually does cause too
many problems to be sustained even in OpenBSD. We certainly are not
proposing that the standards for those functions be changed at this time.

(OpenBSD does has standards compliance as a goal, but security has
higher priority for us; if a standards requirement has turned out
to reduce security in practice, we'll look for pragmatic ways to
reduce the security cost. Note that even in this change we continue
to provide a way for programs that require deterministic behavior
to get it. The project's stance should surprise no one; other
projects refuse to comply with POSIX requirements because they
similarly are unwilling to accept with the costs of complying in performance
or other metrics.)
(0002497)
nmav (reporter)
2014-12-12 08:53

@safinaskar, this is for a POSIX API to access a CPRNG. It should simply assume that the OS will somehow initialize it. How or whether the OS will provide sufficient initialization is outside the scope of the API. If we want to cope with OSes that may not provide sufficient initialization under certain circumstances, then the best would be to modify the API to return error codes, instead of adding an initialization function. An initialization function complicates the API usage by libraries, and that API should be as simple as possible and target universal adoption.
(0002499)
steffen (reporter)
2014-12-16 13:20

Since this is a new posix_ interface and the future is even more parallism rather than less -- shouldn't it be considered to make this an object with a clear init (new, create) / use / destroy cycle rather than another intransparent single-instance function interface?
Maybe with a one-shot function call interface that uses a builtin object, for convenience sake.

Btw., after being hit and disclassified as a "lesser-secure application" by a very large free email+++++ provider (requiring users and myself to enter a blinking stylish web interface and agree in being "lesser secure") even though using TLSv1.2 transport i have to say that i'm currently a bit fretted.
There are programs which use the old obsolete and insecure random interfaces for purposes for which they are absolutely sufficient.

         /* In that case it's only used for boundaries and Message-Id:s so that
          * srand(3) should suffice */

Thank you.
(0002581)
steffen (reporter)
2015-03-13 14:59

I've looked at the current extended (arc4)random implementations of FreeBSD and NetBSD.
Whereas the former uses a single object protected by a global lock, the latter uses a multiple-level approach with thread-specific data and minherit(2) to ensure that the TSD PRNG state is zeroed on fork(2).

Repeating my argument that the future is more parallelism rather than less i think POSIX should ensure that users of the new POSIX interface can adapt that
to their needs, because:

. The former approach may be so undesirable that implementation of a project-internal PRNG is necessary.
. The latter, on the other hand, may cause resource wastage in a classical manager / worker thread model (and how easy it was if any other approach would result in undefined behaviour) with pools of many workers. I don't think this is a weak argument even in times where a lot of mobile phones have more power and resources than the notebook i'm using to write this message, because there are also many resource-preserving machines available, with more massive-parallel superlow-resource-consumption in sight. Finally, wasting resources for plain nothing is inherently bad design.

So i'll propose a slightly extended interface instead that uses objects. Changing the mentioned implementations (and many systems have identical ones for much longer than a decade) to adapt this is practically zero effort, it's just that they have to expose the internal interface.

Using objects users are free to decide in which threads they want random number generators, they can reuse objects at will (e.g., if a thread is taken from a pool to handle action A that requires random numbers to be generated it can be given access to PRNG B); the single global builtin object (accessible via NULL) can be used by all those which don't need anything special.

I'll post this next. Shall this object-based approach not find any friends please have a look at the post thereafter, which is a slightly redefined version of the current state of this issue.
(0002582)
steffen (reporter)
2015-03-13 15:00
edited on: 2015-03-13 15:02

SYNOPSIS
     #include <stdlib.h>

     int prand_create(prand_t *self);
     void prand_delete(prand_t *self);

     uint32_t
     prand_random(prand_t *self_or_null);

     uint32_t
     prand_random_uniform(prand_t *self_or_null, uint32_t bound);

     void
     prand_random_buf(prand_t *self_or_null, void *buf, size_t len);

     int
     prand_stir(prand_t *self_or_null);

     int
     prand_addrandom(prand_t *self_or_null, void const *buf, size_t len);

DESCRIPTION
     The prand family of functions provides a cryptographic pseudorandom
     number generator seeded from the system entropy pool.
     prand is designed to prevent an adversary from guessing outputs,
     unlike rand(3) and random(3).

     prand_create() and prand_delete() can be used to generate PRNG objects
     with completely isolated internal states. prand_delete() shall ensure
     that the memory used for the PRNG is zeroed. In order to use such
     objects from within multiple threads, proper synchronization methods
     must be used.

     The prand_random(), prand_random_uniform(), prand_random_buf(),
     prand_stir() and prand_addrandom() functions can either be used in
     conjunction with an isolated PRNG object that has been created by
     prand_create(), or be given a NULL prand_t argument, in which case
     a library global internal, multithread-safe PRNG object is used instead.

     prand_random() returns an integer in [0, 2^32) chosen independently with
     uniform distribution.

     prand_random_uniform() returns an integer in [0, bound) chosen indepen-
     dently with uniform distribution.

     prand_random_buf() stores len bytes into the memory pointed to by buf, each
     byte chosen independently from [0, 256) with uniform distribution.


     prand_stir() draws entropy from the operating system and incorporates
     it into the given objects PRNG state to influence future outputs.

     prand_addrandom() incorporates len bytes from the buffer buf into the
     given objects PRNG state to influence future outputs.

     It is not necessary for an application to call prand_stir() or
     prand_addrandom() before calling other prand functions with the library
     internal global PRNG. The first such call to any prand function will
     initialize the PRNG state unpredictably from the system entropy pool.
     Shall the system entropy pool fail to provide enough entropy, an
     undefined pseudorandom algorithm shall be used to initialize the PRNG
     state in order to serve the request. These initialization steps shall
     be performed on each access of the PRNG unless the system can satisfy
     the request and provide entropy. In order to ensure that the PRNG is
     seeded with entropy, prand_stir() or prand_addrandom() can be used
     beforehand random numbers are requested from the library internal global
     PRNG.

RETURN VALUE
     Upon successful completion, the prand_create(), prand_stir() and
     prand_addrandom() functions shall return zero.
     Otherwise an error value shall be returned to indicate the error.

ERRORS
     The prand_create(), prand_stir() and prand_addrandom() functions shall
     fail if:

     [EAGAIN] The system is currently incapable of providing entropy.

     In the case of prand_create() no resources shall be bound to the self
     object on error. Calling prand_delete() on such an object is undefined.


RATIONALE
     The prand functions provide the following security properties against
     three different classes of attackers, assuming enough entropy is
     provided by the operating system:

         · An attacker who has seen some outputs of any of the prand_random
             functions cannot predict past or future unseen outputs.

         · An attacker who has seen the library's PRNG state in memory can-
             not predict past outputs.

         · An attacker who has seen one process's PRNG state cannot predict
             past or future outputs in other processes, particularly its par-
             ent or siblings.

     One `output' means the result of any single request to an prand_random
     function, no matter how short it is.

     Implementations shall ensure that the entropy state of the library internal
     global PRNG is zeroed on fork.

(0002583)
steffen (reporter)
2015-03-13 15:01

SYNOPSIS
     #include <stdlib.h>

     uint32_t
     posix_random(void);

     uint32_t
     posix_random_uniform(uint32_t bound);

     void
     posix_random_buf(void *buf, size_t len);

     int
     posix_random_stir(void);

     int
     posix_random_addrandom(void const *buf, size_t len);


DESCRIPTION
     The posix_random family of functions provides a cryptographic pseudorandom
     number generator automatically seeded from the system entropy pool and
     safe to use from multiple threads.
     posix_random is designed to prevent an adversary from guessing outputs,
     unlike rand(3) and random(3).

     posix_random() returns an integer in [0, 2^32) chosen independently with
     uniform distribution.

     posix_random_uniform() returns an integer in [0, bound) chosen indepen-
     dently with uniform distribution.

     posix_random_buf() stores len bytes into the memory pointed to by buf, each
     byte chosen independently from [0, 256) with uniform distribution.

     posix_random_stir() draws entropy from the operating system and incorpo-
     rates it into the library's PRNG state to influence future outputs.
     
     posix_random_addrandom() incorporates len bytes from the buffer buf into
     the library's PRNG state to influence future outputs.

     It is not necessary for an application to call posix_random_stir() or
     posix_random_addrandom() before calling other posix_random functions. The
     first call to any posix_random function will initialize the PRNG state
     unpredictably from the system entropy pool. Shall the system entropy pool
     fail to provide enough entropy, an undefined pseudorandom algorithm shall
     be used to initialize the PRNG state in order to serve the request. These
     initialization steps shall be performed on each access of the PRNG unless
     the system can satisfy the request and provide entropy.
     In order to ensure that the PRNG is seeded with entropy,
     posix_random_stir() or posix_random_addrandom() can be used beforehand
     random numbers are requested from the PRNG.


RETURN VALUE
     Upon successful completion, the posix_random_stir() and
     posix_random_addrandom() functions shall return zero.
     Otherwise an error value shall be returned to indicate the error.

ERRORS
     The posix_random_stir() and posix_random_addrandom() functions shall fail
     if:

     [EAGAIN] The system is currently incapable of providing entropy.

RATIONALE
     The posix_random functions provide the following security properties
     against three different classes of attackers, assuming enough entropy is
     provided by the operating system:

         · An attacker who has seen some outputs of any of the posix_random
             functions cannot predict past or future unseen outputs.

         · An attacker who has seen the library's PRNG state in memory can-
             not predict past outputs.

         · An attacker who has seen one process's PRNG state cannot predict
             past or future outputs in other processes, particularly its par-
             ent or siblings.

     One `output' means the result of any single request to an posix_random
     function, no matter how short it is.

     Implementations shall ensure that entropy states are zeroed on fork.
(0002584)
steffen (reporter)
2015-03-13 15:03

P.S.: fixed some newline mangling in the object-based one.

The posted texts are edited versions of the NetBSD manual [1].

 [1] http://netbsd.gw.com/cgi-bin/man-cgi?arc4random++NetBSD-current [^]
(0002585)
tedu (reporter)
2015-03-14 01:00

Strong objection to complicating the interface.

Threading performance should, at best, be considered a quality of implementation issue.

The addition of a stir interface is not necessary. If it's optional, then it serves no purpose and should not be included.

     Shall the system entropy pool
     fail to provide enough entropy, an undefined pseudorandom algorithm shall
     be used to initialize the PRNG state in order to serve the request.

This is extremely dangerous, and renders the entire proposal useless. The standard should require that these functions work, not permit them to fail.
(0002586)
nmav (reporter)
2015-03-14 10:59

I'm also opposed to a stir interface. I don't see how that can be used at all. The basic question is when should this function be called? If it is after X amount of data depending on the internal PRNG, why isn't it be called internally by posix_random_buf? The same for prand_addrandom... The purpose of the interface is to be simple, if the user is responsible of seeding it... we duplicate the current mess in rng interfaces.

Other than that, I think we should agree on what is the purpose of the interface. Is it to provide a process with high quality randomness that can be used as a PRNG? Then the plain posix_* interface in #0002459 is sufficient.
(0002587)
nmav (reporter)
2015-03-14 11:03

Said that, I'd prefer an interface which provides error codes, such as:

     int posix_random(uint32_t *val);
     int posix_random_uniform(uint32_t upper_bound, uint32_t *val);
     int posix_random_buffer(void *buf, size_t nbytes);

Which will return 0 on success and -1 on failure with errno set appropriately. That way this interface will be able to propagate errors from getrandom() in Linux and getentropy() in BSDs. Without error codes what should that interface do in case getentropy fails?
(0002588)
steffen (reporter)
2015-03-14 12:18

> Shall the system entropy pool
> fail to provide enough entropy, an undefined pseudorandom algorithm shall
> be used to initialize the PRNG state in order to serve the request.
>
>This is extremely dangerous, and renders the entire proposal useless. The >standard should require that these functions work, not permit them to fail.

For one that surely depends on what the implementation chooses to do.
Then the possibility that this situation happens seems so low that some implementations call abort() when the system cannot serve enough entropy for seeding (including OpenBSD i think) -- calling abort() cannot -- imho -- be the right answer for a standardized interface, imagine what the Wall Street would say when their system aborts because some PRNG cannot become generated.

And then, most important to me, users of the interface can detect this situation and actively prevent usage of randoms created like that.
Note that in the object-based approach the constructor would simply not return an object at all in this case, so that users would have to retry in order to get a usable object, i.e., cannot produce any pseudo random numbers due to lack of the PRNG object.

>Said that, I'd prefer an interface which provides error codes, such as:

This i wanted to avoid by moving the possible error case to some constructor (alike) function so that there is a single point of (possible) failure only.
The latter can also be achieved by a posix_random_init() function.

>I'm also opposed to a stir interface. I don't see how that can be used at all. The basic question is when should this function be called?

It should be up to the implementation to internally decide if and when such things happen.
But giving users the opportunity to "reset" seems like a good thing to me; recalling my worker thread pool example it would seem natural to _stir() the PRNG state when a single job is done, so that the next usage of the PRNG in the next job is stirred, whatever this means, and if it means anything at all;
like this implementations could be enabled to use faster update paths, at least in the one or other case, or simply do nothing.

It should also be pointed out that the existing (to the extend i have looked) interfaces have these interfaces and that therefore applications may use them.
And in the end this is an interface for programmers which design libraries and applications -- software is used in medicine, in atomic plants, airplanes and spaceships, and pretty often it works out quite well.

>Threading performance should, at best, be considered a quality of implementation issue.

I haven't stated anything about threading performance, though of course user-controllable and/or detached PRNGs will impose lesser locking or even don't require synchronization at all, while at the same time allowing users to control resource usage.
And of course i'd assume that lesser random bytes per PRNG mean lesser exposed internal state, though this is an extremely unacademical statement.
(0002647)
nmav (reporter)
2015-05-05 08:42

>> Said that, I'd prefer an interface which provides error codes, such as:
> This i wanted to avoid by moving the possible error case to some constructor
> (alike) function so that there is a single point of (possible) failure only.
> The latter can also be achieved by a posix_random_init() function.

It is not always possible to move all checks to a constructor. Most PRNGs and DRBGs which will be used to implement the interface require periodic reseeding and during that an error may occur. So for me, an ideal interface would be the one in #0002587.
(0002651)
steffen (reporter)
2015-05-05 11:22

A pseudo-filesystem with open(2)/read(2)/close(2) requires special kernel support, which seems suboptimal. Using user-space objects comes near.
(0002683)
mirabilos (reporter)
2015-05-27 22:25

If there is an implementation which allows for failure, applications will continue to ignore it; there already exists arc4random, which is guaranteed to never fail *if* it exists, and if posix_getrandom (or however it's named) lacks this guarantee, applications WILL continue to ship their own copy of arc4random and use it.

Make it optional but do *not* permit failure.

As for blocking: duh. No. There are ways. Bake in some entropy into the kernel image (yes this breaks reproducible builds, no, I do not care), and additionally pass some via the bootloader (with OpenBSD boot(8) and GNU GRUB getting an extra bonus over things like LILO here because they can read files that can actually be updated once the system has booted).

In MirBSD, we actually bake the complete initialised arc4random structure into the kernel (I've got a pure-mksh implementation of it, to generate it), so calling arc4random(9) as well as arc4random(3) is safe at any time (though extremely early calls will get the same result, which is why we try to get extra entropy into the kernel as early as possible, will adopt OpenBSD's passing-from-the-bootloader idea (I had a similar one but could not get it working, I believe I can get working by myself the way OBSD does it now), and have config(8) update it in the binary kernel image on each call).

So there's absolutely no excuse for a random number request to fail, period.

(As for "embedded router flash" blabla: surely each flashed image need not be the same; people could change the program that actually *does* the flashing to replace a couple of bytes, after verifying the integrity of the image, but before burning it into EEPROM. If that's truly machine-specific and the device has an RTC, just add the time as early as possible on each boot as well.)
(0002684)
nmav (reporter)
2015-05-28 07:09

There is no _if_ there. Implementations which rely on the existing kernel interfaces can fail and they have to indicate it somehow.

Applications can always wrap around an interface that fails with a void function that calls abort if they fail. But there are no posix functions which call abort automatically when they fail.
(0002686)
mirabilos (reporter)
2015-05-29 22:18

@nmav: “Implementations which rely on the existing kernel interfaces” is two-faced.

We need not consider any existing implementations because we're designing a new interface.

New implementations for this merely must simply ensure they do not fail. If they have to rely on kernel interfaces that, unlike OpenBSD's, can fail, then they have to either do the necessary work in a constructor, call abort() plus kill(getpid(), 9) no matter how discouraged that otherwise be, or just not offer this interface.

That being said: why do you insist on talking about a kernel? I smell Linuxism here.
(0002687)
steffen (reporter)
2015-05-30 12:34

> We need not consider any existing implementations because we're designing a new interface.

The arc4random is older than the century, widely spread and in use.

> New implementations for this merely must simply ensure they do not fail. If they have to rely on kernel interfaces that, unlike OpenBSD's, can fail, then they have to either do the necessary work in a constructor

But since it is new in POSIX it seems wise to adjust this interface a little bit and allow for a testable point of failure.
(E.g., the OS i'm writing this on documents for _stir() that it "reads data from /dev/urandom", but still the function returns void.)
Most applications will not care about errors (they have been educated to do so for a long time), but for the others who want to ensure (re)seeding was indeed successful and "strong", new code could be written as, e.g.,

  while (posix_random_stir())
    sched_yield();

And i reiterate, also in respect to this example, that the interface is really asking to give users the option to define the lifetime of objects just as they desire, like pasted in id=2582. For implementors which yet provide arc4random it would most likely mean nothing but exposing the interface they use internally anyway. It is cheap, it scales just as desired, and it is easy to use.
(0002725)
nmav (reporter)
2015-06-19 08:16

mirabilos, a new interface which will be implemented in existing systems, not in a vacuum. In all existing systems the process of obtaining quality random numbers may fail. Asking applications to be killed because the system for some reason cannot provide random numbers on a specific point in time is absurd.

The points of possible failure of this function is at initialization, or during reseeding. At these points a high quality random generator may be queried to seed to PRNG. Even if we can hide the error at initialization by not allowing a process to start if the kernel doesn't provide random numbers, failure on reseeding (which is implicit on the proposed function) cannot be hidden. Thus the function would have to indicate failure.
(0002789)
nmav (reporter)
2015-08-14 08:26

After deliberation I align with the simplistic approach. It is much better for the API not to require any checks, so I prefer the API described in #0002459. It is possible for OSes to be modified to make random number gathering a process that cannot fail, and thus it makes sense to follow the simplest approach.
(0002830)
EdSchouten (updater)
2015-09-17 15:05

Quick question: is there an actual need for the posix_random() function itself? Isn't it as trivial to just call into posix_random_buffer() and pass in the location of the uint32_t where you want to store the value? We could consider axing posix_random() and renaming posix_random_buffer() to the former. That would look a bit cleaner.

Furthermore, maybe it would make more sense to let the _uniform() function operate on an uintmax_t, instead of limiting it to 32-bit arithmetic.

Proposed API:

void posix_random(void *buf, size_t nbytes);
uintmax_t posix_random_uniform(uintmax_t upper_bound);
(0002831)
nmav (reporter)
2015-09-17 15:36

For the use cases I'm considering this API for, I have no use for the original posix_random(), so the proposed API of EdSchouten, seems simpler to me.
(0002832)
EdSchouten (updater)
2015-09-17 21:42

An additional remark: posix_random_uniform(x) returns a random number in [0, x). I can imagine where this comes from: it is a drop-in replacement for doing 'val % x'. But what bothers me, is that it does introduce some corner cases.

What if posix_random_uniform() is invoked with an upper bound of zero? I suspect that the existing implementations crash, as they do a modulo/division by zero.

On the other end of the spectrum, there is no way you can make it return UINT32_MAX (or UINTMAX_MAX per my proposal). I agree that in that case you shouldn't be using this function in the first place, but that would only hold if the upper bound is a constant -- not a run-time variable.

In my opinion the function should be changed to return [0, x] instead of [0, x).
(0002833)
eblake (manager)
2015-09-17 22:06

posix_random_uniform(3) is NOT the same as posix_random()%3. As a thought experiment, consider a random interface that returns 2 bits of information (0, 1, 2, 3) with equal probability. If you take that result % 3 to get a random number in the range [0,3) (that is, 0, 1, or 2), you will be heavily biased: 0 will occur 50% of the time, while 1 and 2 occur only 25% of the time. Of course, the bias is less observable when taking [0,UINT32_MAX] % 3, but it is still there. A uniform range is designed for cases of non-power-of-two upper bounds, with an emphasis of NOT providing an unfair bias to the lower numbers of the range.

Pragmatically, you can achieve uniform results by discarding any result beyond the tail end of the range (or exact multiple of the range) - but at an expense of an unknown number of tries (no guarantee how many discards you will need to process before getting an in-range result). Also, if you are looking for a uniform result with fewer bits than the mantissa of a floating point number [such as 53 for IEEE double], then you can create a double between [1.0,2.0) with uniform exponent and random mantissa, and then scale that to your integer range for an unbiased result (and it is this half-open range between 1.0 and 2.0 that conveniently matches the specification of posix_random_uniform() being a half-open range). But when it gets larger than a mantissa, you have to be careful that composing a larger number by concatenating two smaller random numbers does not itself introduce bias.

There is no need for posix_random_uniform(UINT32_MAX) - just call posix_random() instead (since that is already uniform over uint32_t). [or uintmax_t, if we go that way]. But if you are worried about the interface, then simply specify posix_random_uniform(0) be a synonym for posix_random(), covering the range [0,UINT32_MAX], by merely documenting that every input x to posix_random_uniform() results in the range [0,x-1], and 0-1 is UINT32_MAX.
(0002834)
EdSchouten (updater)
2015-09-18 06:54

Yes, I am well aware that on the implementation side posix_random_uniform() is more complex than just computing the remainder, as it also eliminates modulo bias. I only wanted to convey that it can be used in cases where a programmer had to write 'arc4random() % n', but wanted to achieve that without any bias.

The trick you mentioned with floating point numbers is interesting. I seem to remember certain C libraries use this to implement drand48() and erand48(). They first construct a floating point number between [1.0, 2.0) and subtract 1.0 to place the result in the right domain.

My remark is completely unrelated to that matter. What bothers me is that the upper_bound may not always be a fixed value. It can be a runtime variable -- maybe even a value coming from unchecked sources. Letting the function crash the process in case it happens to be zero is a bad thing in my opinion.

Using [0, x] instead of [0, x-1] still feels a bit more natural and cleaner in my opinion, but making posix_random_uniform(0) return values across the entire range is a step in the right direction. I'd love to see that happen. Thanks!
(0002836)
steffen (reporter)
2015-09-18 10:14

I seem to remember certain C libraries use this

RANDOM.TEX v.1 from Donald Arseneau supports min/max clipping, too. It explicitly uses 32-bit signed, so you better check max not integer wrapping when using the same algorithm in C.
(0002837)
tedu (reporter)
2015-09-18 15:16

Should we petition the C standard to change x % 0 to evaluate to x too? What about div()? etc, etc.
(0002838)
EdSchouten (updater)
2015-09-18 16:10

We should petition to come up with a set of interfaces that are functional, easy to use and are not prone to common mistakes.

What makes you assume that integer division/modulo is any way related to generating a uniform random number with a range? Dividing an integer by zero makes no sense, of course, but why should we forbid computing a random number that spans the entire range of the integer type?

What I was proposing previously (letting the function generate a number between [0, x]) is actually available in Python under the name random.randint() and in C++ under the name std::uniform_int_distribution.
(0002840)
shware_systems (reporter)
2015-09-21 17:23

Re: 2837

If anything, from the theoretical standpoint, the evaluation of X % 0 should be 0, not X, as X / 0 is defined as exactly infinity with 0 left over, and matching sign (or swapped sign if negative 0 is supported).

As few, if any, virtual or hardware architectures provide support for distinct integer infinity representations, I'm more inclined that leaving the behavior as unspecified is appropriate, or a change to implementation-defined at most so the behavior used by each platform gets documented. I believe most hosted implementations are expected to execute raise(SIGFPE) for this circumstance, or a signal defined as representing integer overflow specifically.
(0002841)
dalias (reporter)
2015-09-21 17:33

There is no expectation for a signal on division by zero for hosted implementations. Many hardware architectures do not do this natively, and emulating it is an unjustified cost. C simply leaves the behavior of division by zero undefined; it does not even generate an unspecified or implementation-defined value or signal.

Aside from that, I think we would all benefit from refraining from "bikeshedding" the topic of posix_random. If anyone has important input on how the interfaces should behave to be usable or practical to use or to implement (things like reportable failure vs guarantee of non-failure) I don't want to discourage their discussion, but introducing new areas of disagreement like corner-case behavior of uniform random on [0,0) seems to me like it's just derailing focus on the real issues that will affect users and implementors. The whole topic of posix_random is certainly one that lends itself well to bikeshedding already (everyone thinks they know enough to have something to say) and I would hate to see it drag out forever with no conclusion because of this.
(0002971)
Nach (reporter)
2015-12-06 06:58
edited on: 2015-12-06 07:14

Some comments on the proposals:

If we want to help average developers do the right thing as much as possible, the following interface can be improved:
uint32_t posix_random_uniform(uint32_t upper_bound);

To:
uint32_t posix_random_uniform(uint32_t lower_bound, uint32_t upper_bound);

Where the number returned is between [lower_bound, upper_bound] inclusive. Further, the parameters should be allowed to be passed in either order (meaning if lower_bound > upper_bound, reverse them internally). This will give developers an interface to get any range they want, and cannot ever fail due to the parameters passed as every passed combination is valid and has an intuitive meaning (caller wants range between these two numbers).

I'd even suggest offering:
uint64_t posix_random_uniform64(uint64_t lower_bound, uint64_t upper_bound);

And possibly several other related functions (especially signed integers).




The functions being discussed till now are good for some low level situations and for typical cryptographic needs (keys, salts, nonces), however, they don't really cover higher level needs average developers have. For example, I often see custom initial password generation systems which provide users a temporary password or second factor password and similar. Generating such information needs to solely consist of printable ASCII characters or an even more limited character set. The average developer will not be able to figure out how to make use of the currently proposed set of functions to generate what they need without introducing biases. Therefore, they need the following:

int posix_random_whitelist(void *buffer, size_t buffer_length, const uint8_t *whitelist, size_t whitelist_length);
int posix_random_text(char *buffer, size_t buffer_length, const char *whitelist);

posix_random_whitelist() would fill buffer with buffer_length bytes from those specified in the whitelist in a uniform manner. (If the whitelist happens to have repeated bytes, then repeated values would be weighted)
posix_random_text() would fill buffer with buffer_length bytes like above function, and then NULL terminate it (ensure before calling that buffer has at least buffer_length+1 chars free).

These functions should return -1 with errno set to EINVAL if the whitelist is empty.

I'm not sure both of these proposed functions are needed, but at least one of them should be offered.



Some systems not only require a whitelist for passwords or similar things, but they even need a minimum amount of particular ranges present in the generated password. Without getting into whether such system should exist or not, if we want to allow users to generate secure passwords for such systems, an interface along the lines of the following is required:

struct pr_tr
{
  const char *whitelist;
  size_t required;
};

int posix_random_text_require(char *buffer, size_t buffer_length, struct pr_tr *ranges, size_t ranges_amount);

This interface is like the above function, but allows for multiple whitelist ranges, each with a specified amount of minimum characters. This should return -1 and errno set to EINVAL if ranges_amount is 0, all supplied ranges have an empty whitelist, there's an empty white list with a corresponding required greater than 0, or the sum of all required exceeds buffer_length.




Sometimes developers need a way to shuffle the contents of an array, or shuffle a series of objects in a uniform manner, the following function would fit the bill:

int posix_random_shuffle(void *data, size_t unit_size, size_t unit_amount);

Return -1 with errno set to EINVAL if unit_size is 0.

This function is simple to create as a wrapper around something carrying the posix_random_uniform() interface iff the integer size posix_random_uniform() works with is >= sizeof(size_t). As above, I once again stress the need for a 64-bit uniform function so such things can be written correctly. If the idea to supply a 64-bit version of posix_random_uniform() is rejected, then developers have no simple way to create posix_random_shuffle() without biases, and the library must offer it if it wants to support common usage scenarios in higher level applications.





It should be added to the proposal that all the functions in this proposal should ensure that process children (or anything children like) created via fork() or a similar system specific interface (such as clone() on Linux) ensure that the random state is either:

A) Not shared between parent and child or among multiple children. Or
B) Shared between parent and all children in such a way that a request for random data in one process advances the random "stream" shared by all these related processes so that random values are not repeated between the different processes.





Above, steffen mentioned adding a posix_random_addrandom() function. I want to highlight the importance of this. There is a real need today for some way to mix user supplied data into the C(P)RNG in order to influence the output of future random values. Unlike when OpenBSD first created its arc4random interface, virtual machine use has exploded, and the interfaces as exist today in OpenBSD and up for proposal may be obsolete.

Most hosted servers today are typically running on some kind of VM. VMs have the ability to roll back the state of the machine to a previously defined state, and some setups will automatically do so in response to various conditions that arise. The only way to defend against repeated values during VM state rollback is to ensure that every new series of random requests begins with a call to posix_random_addrandom() and is therefore specific to the random request in question.

For example, a server implementing TLS or HTTP Digest Authentication or a similar technology needs to send along server side random values for various algorithms. Anything that uses client side values as part of its generation of random values will be safe. But for any random need that is solely server side, if the server sent Harry a random value, then for whatever reason was rolled back (perhaps due to the influence of Harry), and then Bob connects to the server, the server will send Bob the same random value that Harry is aware of.

To ensure that Harry cannot gain access to random data for Bob's session, the RNG needs a way to be made aware of when it is serving Harry and when it is serving Bob and differentiate between them. If posix_random_addrandom() were to be called by the server passing it the IP and Mac addresses, and the user name of the logged in user, or anything of that nature, it will be able to ensure that no two different connections or users share the same "stream" of random values across VM state rollback.

However, while standardizing posix_random_addrandom() and giving developers such access would be one way to allow developers to ensure they "hedge" their random stream against VM related issues, it doesn't "force" developers to do the right thing. Perhaps every random function should have void *clientdata, size_t clientdata_length added to it. Although doing so without offering counterpart functions without these parameters would probably annoy developers where VM attacks are not relevant to them. Further, if developers fail to understand why mixing this data in is important, and what should be mixed in, they may just end up using the same constants everywhere which defeats the purpose of forcing users to make security-related decisions.

All in all, without allowing developers in userspace some way to differentiate the random "stream" between multiple users in their application or multiple external connections, ensures that the random interface proposed here is useless for properly secured software meant to be run on VMs. Unless of course developers build a whole new set of interfaces on top of it for their application, and never use this interface directly. In which case anything more than a function to provide a random buffer of a given size is overkill, and this entire proposal should be shortened to just posix_random_buffer().

(0002972)
steffen (reporter)
2015-12-07 12:10

|Above, steffen mentioned adding a posix_random_addrandom() function. I want

No.

  1. I just put the original interface up for discussion, the one
     that is in use and that i also mirrored when i implemented my
     own (C++) interface..

  2. ..which however encapsulated the interface into
     a self-contained (lockless C++ class) object.
     Though: there is no "however", since in fact the original
     implementation is designed like that, but on the inside.

The root of the problem being that whatever is not possible on the
lowest level can only be derivated with enormous effort and costs
on higher ones. In the worst case you have to go and reinvent the
complete wheel because of that.

E.g., you have STDIO and if you transform it into the
multithreaded world as-is then you have to ensure each and every
operation is locked, even if i personally only ever use
a manager/worker thread model and thus know for sure that each
STDIO object will never be used concurrently in multiple threads.

There are more examples from other areas like that, take format
encoders which have to produce a result in a single run and thus
have to be hooked on the back via callouts instead of being simple
contexts that can be iterated over via restart on the front
interface. Devilish.
So here there is a random generator, the original design of which
being from David Mazieres in 1996, 20 years, more ideas from Ilya
Mironov (for the original algorithm) in about 2004 (12 years).
The first version already introduced an object based and thus
scalable internal implementation (struct arc4_stream). And now
that parallelism is so much more common than twenty years ago it
is time to offer this option to end-user programmers.

That was my suggestion to this issue, that is my intent in
general: instead of offering some corked version of an interface,
one that may not be sufficient for some particular uses cases --
even though the internal interface is very well able to satisfy!
--, providing access to the full functionality at the same cost.
And the interface is still easy to use.

The reason for including _stir() and _addrandom() in this respect
is just natural, since that is what is used internally: _stir() is
implemented by means of calling _addrandom() with data that is
fetched from some undefined implementation-dependend entropy
source.
I.e., many operating systems use very fancy and costly ways to
generate random entropy _and_ to protect their pool with expensive
message digests etc. This is a science and i'd assume that only
a few dozen people on the world can dare to comment on that for
real, and even some of those are sometimes proven wrong, at times.

Reseeding user-level pseudo-random generators which we talk about
in this topic with such entropy at times via _stir() may be
useful to e.g. furtherly improve the outcome of the next random
generated, or simply to add "more random" into the pool of the
object's entropy.
The end-user programmer should have the option to define points in
time where such an update may be applied, e.g., because blocking
is acceptable at the very point in program execution. Maybe there
should even be a guarantee that blocking doesn't occur. Maybe it
should even be configurable at object generation time wether
auto-reseeding is applied or not. On a per-object level.

If the standard instead adds yet another corked closed-box
function, what sense does that make? How many of those did yet
exist? This interface, on the other hand, has the potential to
persist on the one hand, and to be usable for all thinkable
generic pseudo-random number tasks on the other.

And i simply don't accept consciously brought in irresponsible
conscious misunderstandings like the one of yours. Just like the
one that has been shamelessly published in RFC 6532 and reads "[.]
encoding schemes [.] introduce [.] significant opportunities for
processing errors"; twenty years after the referenced standard
that is used a billion times a day.
Thank you.
(0003294)
tedu (reporter)
2016-07-14 14:29

I'd like to formally withdraw this proposal. Introducing a new name for existing functions causes unnecessary confusion and complication. Please close.
(0003295)
dalias (reporter)
2016-07-14 15:12

Is withdrawing the proposal really necessary? Renaming the proposed interface back to arc4random or whatnot seems like an option if people prefer that, but I would just prefer for this whole bikeshed to end with us adopting _something_, whatever the name, that meets the requirements of having a library-safe (including never-fails property) way to obtain random numbers for cryptographic use.

- Issue History
Date Modified Username Field Change
2014-07-18 13:59 tedu New Issue
2014-07-18 13:59 tedu Name => Ted Unangst
2014-07-18 13:59 tedu Organization => OpenBSD
2014-07-18 13:59 tedu Section => posix_random
2014-07-18 13:59 tedu Page Number => 0
2014-07-18 13:59 tedu Line Number => 0
2014-07-18 14:04 tedu Note Added: 0002314
2014-07-18 17:35 mdempsky Note Added: 0002317
2014-07-18 17:51 mdempsky Note Added: 0002318
2014-07-18 18:22 tedu Note Added: 0002319
2014-07-21 02:59 dalias Note Added: 0002320
2014-07-21 04:07 mdempsky Note Added: 0002321
2014-07-21 04:41 dalias Note Added: 0002322
2014-08-21 15:51 eblake Note Added: 0002352
2014-08-21 15:52 eblake Note Edited: 0002352
2014-09-12 18:37 eblake Note Added: 0002385
2014-09-12 18:43 eblake Note Added: 0002386
2014-09-12 18:57 dalias Note Added: 0002387
2014-11-21 09:29 ajosey Note Added: 0002438
2014-11-22 01:28 emaste Issue Monitored: emaste
2014-12-01 20:36 tedu Note Added: 0002457
2014-12-01 20:41 dalias Note Added: 0002458
2014-12-01 20:47 tedu Note Added: 0002459
2014-12-02 00:13 eblake Note Added: 0002460
2014-12-02 00:22 eblake Note Added: 0002461
2014-12-02 00:27 eblake Relationship added related to 0000743
2014-12-02 00:35 mdempsky Note Added: 0002462
2014-12-02 02:28 tedu Note Added: 0002463
2014-12-02 03:04 dalias Note Added: 0002464
2014-12-02 03:22 tedu Note Added: 0002465
2014-12-02 13:04 nsz Note Added: 0002467
2014-12-04 00:10 tedu Note Added: 0002473
2014-12-10 08:45 nmav Note Added: 0002488
2014-12-10 15:28 eblake Note Added: 0002489
2014-12-10 18:23 safinaskar Note Added: 0002490
2014-12-10 19:28 nmav Note Added: 0002491
2014-12-10 20:16 tedu Note Added: 0002492
2014-12-10 22:33 safinaskar Note Added: 0002493
2014-12-10 22:58 philip-guenther Note Added: 0002494
2014-12-12 08:53 nmav Note Added: 0002497
2014-12-16 13:20 steffen Note Added: 0002499
2014-12-18 04:03 philip-guenther Issue Monitored: philip-guenther
2015-03-13 14:59 steffen Note Added: 0002581
2015-03-13 15:00 steffen Note Added: 0002582
2015-03-13 15:01 steffen Note Added: 0002583
2015-03-13 15:02 steffen Note Edited: 0002582
2015-03-13 15:03 steffen Note Added: 0002584
2015-03-14 01:00 tedu Note Added: 0002585
2015-03-14 10:59 nmav Note Added: 0002586
2015-03-14 11:03 nmav Note Added: 0002587
2015-03-14 12:18 steffen Note Added: 0002588
2015-05-05 08:42 nmav Note Added: 0002647
2015-05-05 11:22 steffen Note Added: 0002651
2015-05-27 22:25 mirabilos Note Added: 0002683
2015-05-28 07:09 nmav Note Added: 0002684
2015-05-29 22:18 mirabilos Note Added: 0002686
2015-05-30 09:45 safinaskar Issue Monitored: safinaskar
2015-05-30 09:45 safinaskar Issue End Monitor: safinaskar
2015-05-30 12:34 steffen Note Added: 0002687
2015-06-19 08:16 nmav Note Added: 0002725
2015-08-14 08:26 nmav Note Added: 0002789
2015-09-17 15:05 EdSchouten Note Added: 0002830
2015-09-17 15:36 nmav Note Added: 0002831
2015-09-17 21:42 EdSchouten Note Added: 0002832
2015-09-17 22:06 eblake Note Added: 0002833
2015-09-18 06:54 EdSchouten Note Added: 0002834
2015-09-18 10:14 steffen Note Added: 0002836
2015-09-18 11:30 safinaskar Issue Monitored: safinaskar
2015-09-18 11:31 safinaskar Issue End Monitor: safinaskar
2015-09-18 15:16 tedu Note Added: 0002837
2015-09-18 16:10 EdSchouten Note Added: 0002838
2015-09-21 17:23 shware_systems Note Added: 0002840
2015-09-21 17:33 dalias Note Added: 0002841
2015-12-06 06:58 Nach Note Added: 0002971
2015-12-06 07:06 Nach Note Edited: 0002971
2015-12-06 07:14 Nach Note Edited: 0002971
2015-12-06 07:16 Nach Issue Monitored: Nach
2015-12-07 12:10 steffen Note Added: 0002972
2016-07-14 14:29 tedu Note Added: 0003294
2016-07-14 15:12 dalias Note Added: 0003295
2016-07-14 15:13 Don Cragun Interp Status => ---
2016-07-14 15:13 Don Cragun Status New => Closed
2016-07-14 15:13 Don Cragun Resolution Open => Withdrawn
2017-03-31 07:17 Don Cragun Relationship added related to 0001134


Mantis 1.1.6[^]
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker