From: Andrei Gaponenko <andr@triumf.ca>
Date: Tue, 23 Oct 2001 21:20:48 -0700 (PDT)
To: <E614software@relay.phys.ualberta.ca>
Subject: random number generators



Below is some information about random number generators and
a suggestion how to set initial seeds for parallel MC production.


****************************************************************

The random number generator used by GEANT 3.21 is a copy of CERNLIB's
RANECU, http://wwwinfo.cern.ch/asdoc/shortwrupsdir/v114/top.html

Some facts about the generator:

     1) NOT ALL initial seed pair define a good random number
	sequence.

     2) There are 215 known seed pairs which define "good" sequences
	10^9 numbers apart from each other. E.g. a sequence started
	with one of the seed pairs after 10^9 numbers WILL overlap
	with another sequence.  So in practice we are limited by 10^9
	numbers per job despite of the fact that the actual period of
	the generator is about 10^18,

     3) Quality of the random sequences: "probably corresponds to a
	luxury level between 1 and 2, [of the RANLUX generator] but
	this is based only on empirical testing and true quality may
	be lower."
	(http://wwwinfo.cern.ch/asdoc/shortwrupsdir/v115/top.html)
	Note that the "level 1" fails spectral test, and "level 2" is 
	theoretically still defective. 


*** SEEDS FOR NOW ***

Because of 1) we shouldn't use time or any other "random" numbers as
seeds to start a job. I think the best and the simplest solution at
the moment is to use the pre-defined RANECU sequences. To do it one
has to set the RNDM card in e614.ffcards as 

    RNDM  sequence_number 0

where the sequence_number is between 1 and 215 inclusive. This doesn't
require any modifications to GEANT, but each batch jobs should have
it's own ffcards file. The files can be easily generated from a
"master" file with a simple script. 


*** CURRENT LIMIT ON A NUMBER OF EVENTS PER JOB ***

1.4*10^4 to about 5*10^4 random numbers per event are used by TWIST GEANT
depending on ffcards settings. This is the range I've seen in a few
runs with different settings; actual range may be wider. 

These numbers are not due to the initialization phase. In fact, the
number of GRNDM calls grows even faster than linearly with the number of
events:

	events	 GRNDM calls
	
	10	 121840
	100	 1393553
	1000	 14408401

We see that a RANECU sequence may be exhausted after 10^9/5*10^4 =
2*10^4 to 6*10^4 events. So, there is no sense to generate more than
60,000 events per job if the next RANECU sequence is being used by
another job. All 200 pre-defined sequences give us only 4 to 12
millions of events. This may be enough for some, but definitely not
for all studies.

I don't think an MC result obtained with a "re-cycled" random number
generator should be believed, despite of the argument that the same
numbers may be used by the program "in other places" after the
sequence is exhausted. Once I accidentally used two random number
generators giving the same sequence in two different places of my
program, and the results were really weird. It seemed this couldn't be
a reason, but removing the second generator fixed the problem in the
case.


*** BETTER GENERATORS FOR THE FUTURE  ***

In the future we have to replace RANECU generator with a better
one. This can be done in the GEANT3 framework by providing GRNDM and
GRNDMQ in user code. Or, GEANT4 has more powerful generators readily
available. 

The user-defined GRNDM can call, for example, RANLUX to do the job. 

http://wwwinfo.cern.ch/asdoc/shortwrupsdir/v115/top.html

RANLUX has a period of 10^185 and "every possible value of INT gives
rise to a valid, independent sequence which will not overlap any
sequence initialized with any other value of INT."  This makes it a
very convenient choice for parallel production MC.  But it's 4 to more
than 6 times slower than RANECU depending on the "luxury level", see
below. 

Another good candidate is the "Mersenne Twister" generator

http://www.math.keio.ac.jp/~matumoto/emt.html

It has the incredible 2^19937-1 period; it's very fast.  Besides these
and other merits it has a suitable name :-) But I haven't found any
information about getting independent sequences from it. I've sent an
e-mail to one of the authors, but still have no reply. Obviously we
can provide "independent" sequences by ourselves in the same manner as
they were made for RANECU: by generating a single long sequence and
splitting it into "independent sub-sequences" of a length suitable to
us. But it seems to require a considerable effort.

May be we can put in RANLUX as a quick fix, and go to the Mersenne
Twister (because of it's speed) some time later.


*** TIMING CONSIDERATIONS ***

(Test at rach: Pentium III, 800MHz)

GRNDM/RANECU: 131 seconds / 10^9 numbers

RANLUX at 3:  520 seconds / 10^9 numbers

RANLUX at 4:  831 seconds / 10^9 numbers

MT:	      103 seconds / 10^9 numbers


****************************************************************


random number generators / Andrei Gaponenko

Created for the The Center for Subatomic Research E614 Project Projects Page.
Created by The CoCoBoard.