lind.design.randomization package

Submodules

lind.design.randomization.md5 module

md5: Hashing can be used to generate reproducible pseudo-randomization. This is useful when the user does not want to store a fixed seed in order to make test randomization replicable.

Examples

>>> import numpy as np
>>> from lind.design.randomization.md5 import md5shuffle
>>> # numpy randomization with fixed seed
>>> random_state = np.random.RandomState(42)
>>> random_state.choice(["Lauren", "Sam", "Ben"], size=1)
>>> # md5 random sample (no seed required)
>>> md5shuffle(["Lauren", "Sam", "Ben"])[0]
lind.design.randomization.md5.md5shuffle(arr: numpy.ndarray, salt: str = None) → numpy.ndarray

Shuffles the input array pseudo-randomly but deterministically using MD5 hashing.

Parameters
  • arr (list, ndarray) – The array of values that you want shuffled

  • salt (str) – A string to append to sample ids to avoid collisions across experiments testing on the same population. If None, then no salt is applied.

Returns

the input array in a shuffled order

Return type

ndarray

Examples

>>> md5shuffle(
...     arr=list(range(1000)),
...     salt="whale"
... )
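
The idea behind a hash-based shuffle can be sketched as follows. This is a minimal illustration, not lind's actual implementation: the function name md5_shuffle_sketch is hypothetical, and hashing each element's string form and then sorting by digest is an assumption about one reasonable way to get a deterministic ordering.

```python
import hashlib

import numpy as np


def md5_shuffle_sketch(arr, salt=None):
    """Hypothetical sketch of a deterministic, hash-ordered shuffle.

    Assumption: each element is hashed via its string form, with the salt
    appended, and the elements are then ordered by their hex digests.
    """
    values = np.asarray(arr)
    suffix = "" if salt is None else str(salt)
    # One MD5 hex digest per element; identical inputs always give the
    # same digests, so no seed needs to be stored.
    digests = [
        hashlib.md5((str(v) + suffix).encode("utf-8")).hexdigest()
        for v in values
    ]
    # Sort element positions by digest (Python's sort is stable).
    order = sorted(range(len(digests)), key=digests.__getitem__)
    return values[np.array(order, dtype=int)]


# The same input and salt give the same order on every call.
first = md5_shuffle_sketch(["Lauren", "Sam", "Ben"], salt="whale")
second = md5_shuffle_sketch(["Lauren", "Sam", "Ben"], salt="whale")
print(first, second)
```

Under these assumptions, changing the salt changes every digest, so two experiments on the same population would generally receive unrelated orderings.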
lind.design.randomization.md5.draw_percentile(arr: Union[List, numpy.ndarray], lb: float = 0.25, ub: float = 0.75, salt: str = None) → numpy.ndarray

Draw array values that fall within a given percentile range of the hash space.

Parameters
  • arr (list, ndarray) – An array of objects that you want to sample from

  • lb (float, optional) – The lower bound of the percentile; must be between 0 and 1

  • ub (float, optional) – The upper bound of the percentile; must be between 0 and 1; must be greater than lb

  • salt (str) – A string to append to sample ids to avoid collisions across experiments testing on the same population. If None, then no salt is applied.

Returns

an array of values from arr that fall within the specified percentile of the hash space

Return type

ndarray

Examples

>>> draw_percentile(list(range(1000)), lb=0.25, ub=0.75)  # sample 50% of inputs
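
Percentile-based sampling over the hash space can be sketched in a similar way. Again this is only an illustration under stated assumptions: draw_percentile_sketch is a hypothetical name, and the mapping of each MD5 digest to a position in [0, 1) by dividing by 2**128 is an assumption. The library may define the percentile differently (for example by rank among the hashed inputs).

```python
import hashlib

import numpy as np


def draw_percentile_sketch(arr, lb=0.25, ub=0.75, salt=None):
    """Hypothetical sketch: keep elements whose hash position is in [lb, ub)."""
    if not 0.0 <= lb < ub <= 1.0:
        raise ValueError("bounds must satisfy 0 <= lb < ub <= 1")
    values = np.asarray(arr)
    suffix = "" if salt is None else str(salt)
    # An MD5 digest is a 128-bit integer; dividing by 2**128 maps it to a
    # position in the unit interval (an assumption of this sketch).
    positions = np.array(
        [
            int(hashlib.md5((str(v) + suffix).encode("utf-8")).hexdigest(), 16)
            / 2**128
            for v in values
        ],
        dtype=float,
    )
    mask = (positions >= lb) & (positions < ub)
    return values[mask]


# Keeps roughly (not exactly) half of the inputs, and always the same ones.
sample = draw_percentile_sketch(list(range(1000)), lb=0.25, ub=0.75)
print(len(sample))
```

With this approach, non-overlapping ranges such as [0, 0.5) and [0.5, 1) yield disjoint groups, which is one way to split a population into treatment segments without storing the assignment.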

Module contents

randomization / pseudo-randomization utilities for treatment assignment and segmentation