"""
Standard checks of randomization. These are mainly used in the unit test suite to sanity check
randomization utilities in this package.
"""

import logging
from typing import Union, List

from numpy import ndarray, median, sqrt

# set logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# define public functions (ignored by jupyter notebooks)
__all__ = [
    "runs_test"
]

####################################################################################################


def runs_test(arr: Union[ndarray, List]) -> float:
    """
    runs_test

    Runs tests are a very simple method of sanity checking a set of random numbers. A run is
    defined as a series of increasing values or a series of decreasing values. The number of
    increasing, or decreasing, values is the length of the run.

    In a random data set, the probability that the (I+1)th value is larger or smaller than the Ith
    value follows a binomial distribution, which forms the basis of the runs test.

    This implementation dichotomizes the data about its median: values at or above the median are
    coded as positive, values below it as negative, and a run is a series of consecutive positive
    (or negative) values.

    Null Hypothesis: The sequence was produced in a random manner.

    Frequentist test statistics can be viewed as thresholds on signal-to-noise ratios (see
    equation below). For this test, the signal is the difference between the actual number of
    runs (R) and the expected number of runs (R_bar) given the sample size; the noise is the
    standard deviation of the number of runs (sigma_R).

        test statistic = Z = signal / noise = (R - R_bar) / sigma_R

    Parameters
    ----------
    arr: ndarray, list
        A 1d array or list of values to evaluate for "randomness"

    Returns
    -------
    float
        The Z test statistic; values far from zero are evidence against the null hypothesis.

    Examples
    --------
    >>> import numpy as np
    >>> random_arr = np.random.normal(0, 10, 1000)
    >>> z_statistic = runs_test(random_arr)

    References
    ----------
    Bradley
        * Distribution-Free Statistical Tests (1968), Chapter 12
    NIST
        * Engineering Statistics Handbook 1.3.5.13

    """
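    runs, n1, n2 = 0, 0, 0
    arr_median = median(arr)

    # Checking for start of new run
    for i in range(len(arr)):
        # no. of runs: the first value always opens a run; after that, a new run starts whenever
        # the sequence crosses the median. The i == 0 guard keeps arr[i - 1] from wrapping around
        # to the last element.
        if i == 0 or (arr[i] >= arr_median > arr[i - 1]) or (arr[i] < arr_median <= arr[i - 1]):
            runs += 1
        # no. of positive values (at or above the median)
        if arr[i] >= arr_median:
            n1 += 1
        # no. of negative values (below the median)
        else:
            n2 += 1

    # expected number of runs and its standard deviation under the null hypothesis
    runs_exp = ((2 * n1 * n2) / (n1 + n2)) + 1
    stan_dev = sqrt((2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (((n1 + n2) ** 2) * (n1 + n2 - 1)))
    return (runs - runs_exp) / stan_dev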
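

# Worked example of the arithmetic above (an illustrative sketch, not part of the public API).
# `_runs_moments` is a hypothetical helper name introduced only for this example; it repeats the
# expected-runs and standard-deviation formulas used in `runs_test` so they can be checked by
# hand. For a sequence coded as + + - - + - there are n1 = 3 positives, n2 = 3 negatives and
# R = 4 runs, so R_bar = (2 * 3 * 3) / 6 + 1 = 4 and Z = (4 - 4) / sigma_R = 0.
def _runs_moments(n1: int, n2: int) -> tuple:
    """Return (R_bar, sigma_R) for n1 positive and n2 negative values."""
    n = n1 + n2
    # expected number of runs under the null hypothesis
    runs_exp = (2 * n1 * n2) / n + 1
    # variance of the number of runs under the null hypothesis
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n ** 2 * (n - 1))
    return runs_exp, variance ** 0.5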