# Introduction

xorshift* generators are very fast, high-quality PRNGs (pseudorandom number generators) obtained by scrambling the output of a Marsaglia xorshift generator with a 64-bit invertible multiplier (as suggested by Marsaglia in his paper). They are an excellent choice for all non-cryptographic applications, as they are incredibly fast, have long periods and their output passes strong statistical test suites.

# ﻿A PRNG Shootout

This page contains a shootout of a few recent PRNGs (pseudorandom number generators) that are quite widely used. The purpose is that of providing a consistent, reproducible assessment of two properties of the generators: speed and quality.

## ﻿Speed

The speed reported in this page is the time required to emit 64 random bits. If a generator is 32-bit in nature, we glue two consecutive outputs. This is, of course, the easiest measure. Note that we do not report results using GPUs or SSE instructions: for that to be meaningful, we should have implementations for all generators (which will happen, ultimately). The tests were performed on an Intel® Core™ i7-4770 CPU @3.40GHz (Haswell).

## ﻿Quality

This is probably the more elusive property of a PRNG. Here quality is measured using the powerful BigCrush suite of tests. BigCrush is part of TestU01, a monumental framework for testing PRNGs developed by Pierre L'Ecuyer and Richard Simard (“TestU01: A C library for empirical testing of random number generators”, ACM Trans. Math. Softw. 33(4), Article 22, 2007).

We run BigCrush starting from 100 equispaced points of the state space of the generator and collect failures—tests in which the p-value statistics is outside the interval [0.001..0.999]. A failure is systematic if it happens at all points.

Note that TestU01 is a 32-bit test suite. Thus, two 32-bit integer values are passed to the test suite for each generated 64-bit value. Floating point numbers are generated instead by dividing the unsigned output of the generator by 264. Since this implies a bias towards the high bits (which is anyway a known characteristic of TestU01), we run the test suite also on the reverse generator and add up the resulting failures.

More detail about the whole process can be found in this paper.

PRNG Period Failures (standard) Failures (reverse) Overall Systematic ns/64 bit
xorshift1024* 21024 − 129 22 511.36
MurmurHash3 ≤264 − 128 37 653.15
xorshift4096* 24096 − 133 34 671.36
xorgens4096 24096 − 142 40 821.68
xorshift64* 264 − 1230 133 363MatrixRank1.60
MT19937-64 219937 − 1 258 258 516LinearComp4.09
WELL1024a 21024 − 1 441 441 882MatrixRank, LinearComp5.18
WELL19937a 219937 − 1 235 233 468LinearComp8.01
xorshift64 (best) 264 − 1762 750 1512BirthdaySpacings, MatrixRank, LinearComp1.64
java.util.Random 248 − 1 4078 9486 13564Almost all tests2.60

I'll be happy to add to the list any PRNG that can improve significantly on the ones already listed. You can also install TestU01, download my code and start from there to do your own tests.

# ﻿Remarks

## A long period does not imply high quality

This is a common misconception. The generator x++ has period $$2^k$$, for any $$k\geq0$$, provided that x is represented using $$k$$ bits: nonetheless, it is a horrible generator.

It is however important that the period is long enough. A first heuristic rule of thumb is that if you need to use $$t$$ values, you need a generator with period at least $$t^2$$. Moreover, if you run $$p$$ independent computations starting at random seeds, the sequences used by each computation should not overlap. We can stay on the safe side and require that the period is long enough so that the probability that $$p$$ sequences of $$t^2$$ elements starting at random positions overlap is very low.

Now, the probability that $$p$$ points in an interval of integers of length $$L$$ are at least at distance $$d$$ is $\left( 1 - \frac{d(p-1)}L\right)^p \approx \left(e^{-d(p-1)}\right)^{p/L} \approx 1 - \frac{dp(p-1)}L,$ assuming that $$L$$ is large and $$d,p\ll L$$. If your generator has period $$2^{512}$$ and you run on $$2^{100}$$ cores (you will never have them) a computation using $$2^{100}$$ pseudorandom numbers (you'll never have the time) the probability of overlap would be less than $$2^{-112}$$.

In other words: any generator with a period beyond, say, $$2^{1024}$$ (just to stay again on the safe side) has a period that is sufficiently long for every imaginable application. Unless there are other motivations (e.g., provably increased quality), a generator with a larger period is only a waste of memory (as it needs a large state space), of cache lines, and of precious high-entropy random bits for seeding (unless you're using small seeds, but then it's not clear why you would want a very long period in the first place—the computation above is valid only if you seed all bits of the state space with independent, uniformly distributed random bits).

## How can a xorshift64* generator be slower than a xorshift1024* generator?

Dependencies. The three xor/shifts of a xorshift64* generator must be executed sequentially, as each one is dependent on the result of the previous one. In a xorshift1024* generator two of the xor/shifts are completely independent and can be parallelized internally by the CPU. I also suspect that the larger state space makes it possible for the CPU to perform more aggressively speculative execution (indeed, a xorshift128* generator is as fast as a xorshift64* generator).

## Why just sizes 64, 1024 and 4096?

64 is the smallest possible size if you use 64-bit shifts, and it makes it easy to have a generator embedded in a Java object without creating an additional object. Moreover, for practical reasons people often uses 64-bit seeds, and a xorshift64* generator can then be used to produce the actual seed of a generator with a larger state space.

4096 is the largest currently possible state space, so it's nice to have it here even if it is not really useful.

1024 seems the most reasonable choice for a general-purpose generator: it is large enough, but not uselessly large. One could try 128 or 256 bits of state space to have a more compact generator without the MatrixRank failure of xorshift64*.

## What about the generator used in C, say by gcc?

Applying the same tests from the C API is impossible: the generator is not standardized, rand() returns a weird number of bits (usually 31) and there's not way to manipulate directly the state of the generator. Traditionally, C standard libraries used a linear congruential generator similar to java.util.Random. Apparently, recent versions of glibc use a trinomial-based linear-feedback shift register or even the SIMD version of MT19937.