"""
factorial: This module contains tools for designing factorial experiments. It covers full
factorial experiments (where every combination of treatments is explored) and partial factorial
experiments (where only a fraction of combinations are explored). Partial factorial experiments are
sometimes referred to as fractional factorial experiments.

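As a rough illustration of the difference between the two (a sketch that uses only the standard
library rather than this module's functions; the factor labels A, B, and C and the choice of
fraction are assumptions made for the example), a 2 level, 3 factor full factorial has
2**3 = 8 runs, while a half fraction keeps 4 of them:

```python
from itertools import product

# Full factorial: every combination of three two-level factors (A, B, C).
full_design = list(product([-1, 1], repeat=3))

# One possible half fraction (an assumption for illustration): keep only the
# runs where the product A*B*C equals +1.
half_fraction = [run for run in full_design if run[0] * run[1] * run[2] == 1]

print(len(full_design))    # 8
print(len(half_fraction))  # 4
```
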
The factorial designs here are meant to yield balanced and orthogonal designs. An experimental
design is orthogonal if the effects of any factor (e.g. factor A) balance out (sum to zero) across
the effects of the other factors (e.g. factors B and C). In other words, if A is orthogonal to B
and C, then the measurement of factors B and C will not be biased by the effect size of A. A
balanced design assumes equal sample sizes across all cohorts / test cells.

One quick check of orthogonality for a 2 level design is to take the sum of the columns of the
design. They should all sum to 0. See below:
>>> design_partial_factorial(k=6, res=4).sum(axis=0)

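The same kind of check can be sketched by hand on a small design. The example below is only an
illustration and does not call this module: it assumes a hand-built 2**(3-1) design whose third
column is generated as the product of the first two, then checks both the column sums and the
pairwise dot products between columns.

```python
from itertools import product

import numpy as np

# Build a 2**(3-1) half fraction by hand: a full factorial in A and B,
# with the third column generated as the interaction C = A*B.
base = np.array(list(product([-1, 1], repeat=2)))
design = np.column_stack([base, base[:, 0] * base[:, 1]])

# Each level appears equally often in every column, so every column sums to zero.
column_sums = design.sum(axis=0)

# Orthogonality: distinct columns have a zero dot product, so
# design.T @ design is diagonal (here, 4 times the identity matrix).
gram = design.T @ design

print(column_sums)  # [0 0 0]
print(gram)
```

A zero column sum alone does not rule out two columns being identical, which is why the sketch
also inspects the off-diagonal entries of `gram`.
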
If possible, all combinations (rows) in these designs should be run in a random order, or in
parallel using proper randomization of cohort assignment.

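One minimal way to randomize the run order is sketched below. It uses only the standard library;
the design, the variable names, and the fixed seed are assumptions made so the example is
self-contained and reproducible, not part of this module.

```python
import random
from itertools import product

# Hypothetical two-level, three-factor design (one tuple per run).
runs = list(product([-1, 1], repeat=3))

# Sample every run exactly once in random order. The original list is left
# untouched so the standard design order is still available for reference.
# The fixed seed only makes this sketch reproducible.
rng = random.Random(0)
run_order = rng.sample(runs, k=len(runs))

for position, run in enumerate(run_order, start=1):
    print(position, run)
```
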
Recommended import style:
>>> from lind.design import factorial

"""

import logging
from typing import Union, List, Optional

from itertools import product, combinations
from fractions import Fraction

from numpy import full, arange, vectorize, ndarray, array_str, asarray
from scipy.special import binom

from pandas import DataFrame, read_csv
from patsy import dmatrix  # pylint: disable=no-name-in-module

from lind._utilities import _check_int_input
from lind import _sfap

# set logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# define public functions (ignored by jupyter notebooks)
__all__ = [
    'design_full_factorial',
    'design_partial_factorial',
    'fetch_partial_factorial_design'
]

####################################################################################################


def _array_to_string(arr_like: Union[List, ndarray]) -> str:
    """Utility for converting an array of factors into an experiment design string"""
    return array_str(asarray(arr_like)).replace("[", "").replace("]", "")


def _k_combo(k: int, res: int) -> int:
    """The number of combinations of k factors given a specific resolution"""
    return binom(
        full(k - res + 1, k),
        arange(res - 1, k, 1)
    ).sum() + k


_k_combo_vec = vectorize(_k_combo, excluded=['res'],
                         doc="The number of combinations of k factors given a specific resolution")


####################################################################################################


def design_full_factorial(factors: List[List],
                          factor_names: Optional[List[str]] = None) -> DataFrame:
    """
    design_full_factorial

    This function helps create a full factorial experiment design. Given how easy it is to design a
    full factorial experiment once the factors and levels have been specified, this is more of a
    convenience function.

    Parameters
    ----------
    factors : List[List]
        a list of lists representing factors and levels
    factor_names : List[str], optional
        a list of names for the factors in the first argument. Must share the order of the first
        argument. Defaults to x0, x1, ... when not provided.

    Returns
    -------
    pd.DataFrame
        A dataframe with one row per combination of factor levels and one column per factor

    Examples
    --------
    >>> # create full factorial design for a 2 level 3 factor experiment
    >>> design_df = design_full_factorial(factors=[[-1, 1], [-1, 1], [-1, 1]],
    ...     factor_names=["factor_one", "factor_two", "factor_three"])

    """

    assert factor_names is None or len(factor_names) == len(factors), \
        "The length of factor_names must match the length of factors."
    factor_names = factor_names if factor_names is not None else \
        ["x{}".format(i) for i in range(len(factors))]
    return DataFrame(data=list(product(*factors)), columns=factor_names)


def design_partial_factorial(k: int, res: int) -> DataFrame:
    """
    design_partial_factorial

    This function helps design 2 level partial factorial experiments. These experiments are often
    described using the syntax l**(k-p) where l represents the number of levels of each factor, k
    represents the total number of factors considered, and p sets the size of the fraction: the
    design has l**(k-p) runs, i.e. 1/l**p of the l**k runs in the full factorial design.

    This function assumes that l=2. Users are not asked to set p; instead, the user sets a minimum
    desired resolution for their experiment. Resolution describes the kind of aliasing incurred by
    scaling down from a full to a partial factorial design. Higher resolutions have less potential
    aliasing (confounding).

    Resolution number is determined through the defining relation of the partial factorial design.
    For the 6 factor design 2**(6-p) with factors ABCDEF, example defining relations (I) are shown
    below. The resolution cannot exceed the number of factors in the experiment. So a 6 factor
    experiment can be at most a resolution 6 (otherwise it would be a full factorial experiment).

    * Res I: I = A
    * Res II: I = AB
    * Res III: I = ABC
    * Res IV: I = ABCD
    * Res V: I = ABCDE
    * Res VI: I = ABCDEF

    In practice, resolution III, IV, and V designs are the most commonly used.

    * Res I: Cannot distinguish between levels within main effects (not useful).
    * Res II: Main effects may be aliased with other main effects (not useful).
    * Res III: Main effects may be aliased with two-way interactions.
    * Res IV: Two-way interactions may be aliased with each other.
    * Res V: Two-way interactions may be aliased with three-way interactions.
    * Res VI: Three-way interactions may be aliased with each other.

    Parameters
    ----------
    k : int
        the total number of factors considered in the experiment
    res : int
        the desired minimum resolution of the experiment

    Returns
    -------
    pd.DataFrame
        A dataframe with the partial factorial design

    Examples
    --------
    >>> # create partial factorial design for a 2 level 4 factor resolution III experiment
    >>> design_df = design_partial_factorial(k=4, res=3)

    """

    _check_int_input(k, "k")
    _check_int_input(res, "res")
    assert res <= k, "Resolution must be smaller than or equal to the number of factors."

    # Assume l=2 and use k specified by user to solve for p in design
    n = arange(res - 1, k, 1)
    k_minus_p = k - 1 if res == k else n[~(_k_combo_vec(n, res) < k)][0]

    logger.info("Partial Factorial Design: l=2, k={}, p={}".format(k, k - k_minus_p))
    logger.info("Ratio to Full Factorial Design: {}".format(Fraction(2**k_minus_p / 2**k)))

    # identify the main effects and interactions for the design

    main_factors = arange(k_minus_p)
    # collapse the double spaces that array_str uses for padding, then join factors with ":"
    clean = lambda x: x.replace("  ", " ").strip(" ").replace(" ", ":")
    interactions = [clean(_array_to_string(main_factors))] if res == k else \
        [
            clean(_array_to_string(c))
            for r in range(res - 1, k_minus_p)
            for c in combinations(main_factors, r)
        ][:k - k_minus_p]

    # combine main effects and interactions into a single design string (format inspired by patsy)
    factors = " ".join([_array_to_string(main_factors)] + interactions)
    logger.info("Design string: {}".format(factors))

    main_factors = [i for i in factors.split(" ") if i and ":" not in i]
    two_level_full_factorial = [[-1, 1] for _ in main_factors]
    full_factorial_design = design_full_factorial(two_level_full_factorial)

    interactions = [
        ["x" + i for i in j.split(":")]
        for j in [i for i in factors.split(" ") if i and ":" in i]
    ]

    design = "+".join(full_factorial_design.columns.tolist() + [":".join(i) for i in interactions])
    partial_factorial_design = dmatrix(design, full_factorial_design, return_type='dataframe').drop(
        columns=["Intercept"], axis=1)

    partial_factorial_design.columns = \
        ["x{}".format(i) for i in range(partial_factorial_design.shape[1])]

    return partial_factorial_design


####################################################################################################


def fetch_partial_factorial_design(design_name: str = "toc") -> DataFrame:
    """
    fetch_partial_factorial_design

    The function design_partial_factorial auto-generates partial factorial designs using an
    algorithm. We validate that algorithm in our unit tests by comparing against known designs
    from popular experimental design textbooks. Those who want to use the designs from these
    books rather than the auto-generated designs should use this function.

    There are multiple ways to generate certain designs given a fixed k and p
    (using formula l**(k-p)). Both fetch_partial_factorial_design and design_partial_factorial
    deterministically return designs, but there are typically other ways to formulate these designs
    if the user would like to work it out on their own.

    Parameters
    ----------
    design_name : str
        the name of the design to fetch; to see available designs input `toc`

    Returns
    -------
    pd.DataFrame
        experiment design or toc of available designs

    Examples
    --------
    >>> table_of_contents_of_designs = fetch_partial_factorial_design("toc")
    >>> design = fetch_partial_factorial_design("2**3-1")

    References
    ----------
    NIST
        * Section 5.3.3.4.7 of the Engineering Statistics Handbook
    Box, Hunter, & Hunter
        * Statistics for Experimenters
    Taguchi
        * System of Experimental Design, Vol. 2

    Notes
    -----
    * 2**3-1 is equivalent to a Taguchi L4 design
    * 2**15-11 is equivalent to a Taguchi L16 design
    * 2**31-26 is equivalent to a Taguchi L32 design

    """

    assert isinstance(design_name, str), "Input design_name must be a string."
    design_name = design_name.lower().strip() + ".csv"
    if _sfap is None:
        raise Exception("Missing dependency lind-static-resources")
    try:
        return read_csv(_sfap + "/factorial/" + design_name, index_col=0)
    except FileNotFoundError as exception:
        logger.error(exception)
        # chain the original error so the missing file path stays visible in the traceback
        raise ValueError("Please input a valid design. `{}` not found. "
                         "See docstring for help.".format(design_name[:-4])) from exception