"""
factorial: This module contains tools for designing factorial experiments. It covers full factorial
experiments (where every combination of treatments is explored) and partial factorial experiments
(where only a fraction of combinations are explored). Partial factorial experiments are sometimes
referred to as fractional factorial experiments.

The factorial designs here are meant to yield balanced and orthogonal designs. An experimental
design is orthogonal if the effects of any factor (e.g. factor A) balance out (sum to zero) across
the effects of the other factors (e.g. factors B and C). In other words, if A is orthogonal to B
and C, then the measurement of factors B and C will not be biased by the effect size of A. A
balanced design assumes equal sample sizes across all cohorts / test cells.

One quick check of orthogonality for a 2 level design is to take the sum of the columns of the
design. They should all sum to 0. See below:
>>> design_partial_factorial(k=6, res=4).sum(axis=0)

If possible, all combinations (rows) in these designs should be run in a random order, or in
parallel using proper randomization of cohort assignment.

Recommended import style:
>>> from lind.design import factorial

"""
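# Illustrative sketch (not part of the lind API): the column-sum check described in the module
# docstring can be paired with a stricter check that every pair of columns has a zero dot product.
# The example builds its own 2**3 design with itertools and numpy, so it runs without lind
# installed; all variable names are assumptions made for illustration.

```python
from itertools import combinations, product

import numpy as np

# Full 2**3 design coded as -1/+1 (rows are runs, columns are factors A, B, C).
design = np.array(list(product([-1, 1], repeat=3)))

# Balance check: every column should sum to zero.
column_sums = design.sum(axis=0)

# Pairwise check: the dot product of any two distinct columns should be zero.
pairwise_dots = [
    int(design[:, i] @ design[:, j])
    for i, j in combinations(range(design.shape[1]), 2)
]

print(column_sums)
print(pairwise_dots)
```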

import logging
from typing import Union, List, Optional

from itertools import product, combinations
from fractions import Fraction

from numpy import full, arange, vectorize, ndarray, array_str, asarray
from scipy.special import binom

from pandas import DataFrame, read_csv
from patsy import dmatrix  # pylint: disable=no-name-in-module

from lind._utilities import _check_int_input
from lind import _sfap

# set logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# define public functions (ignored by jupyter notebooks)
__all__ = [
    'design_full_factorial',
    'design_partial_factorial',
    'fetch_partial_factorial_design'
]

####################################################################################################


def _array_to_string(arr_like: Union[List, ndarray]) -> str:
    """Utility for converting an array of factors into an experiment design string"""
    return array_str(asarray(arr_like)).replace("[", "").replace("]", "")


def _k_combo(k: int, res: int) -> int:
    """The number of combinations of k factors given a specific resolution"""
    return binom(
        full(k - res + 1, k),
        arange(res - 1, k, 1)
    ).sum() + k


_k_combo_vec = vectorize(_k_combo, excluded=['res'],
                         doc="The number of combinations of k factors given a specific resolution")
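# Illustrative sketch (not part of the lind API): one reading of _k_combo is that it counts how many
# design columns a set of base factors can supply, namely the base factors themselves plus every
# interaction of order res - 1 up to one less than the number of base factors. The hypothetical
# helpers below restate that count with math.comb and pick the smallest base that can hold k
# factors; they are assumptions for illustration, not the module's implementation.

```python
from math import comb


def k_combo_sketch(base_factors: int, res: int) -> int:
    """Base factors plus interaction columns of order res-1 .. base_factors-1."""
    return base_factors + sum(
        comb(base_factors, r) for r in range(res - 1, base_factors)
    )


def smallest_base_sketch(k: int, res: int) -> int:
    """Smallest number of base factors whose columns can hold k factors."""
    if res == k:
        return k - 1
    return next(n for n in range(res - 1, k) if k_combo_sketch(n, res) >= k)


# A 6 factor, resolution IV request needs 4 base factors, i.e. a 2**(6-2) design.
print(smallest_base_sketch(k=6, res=4))
```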


####################################################################################################


def design_full_factorial(factors: List[List],
                          factor_names: Optional[List[str]] = None) -> DataFrame:
    """
    design_full_factorial

    This function helps create a full factorial experiment design. Given how easy it is to design a
    full factorial experiment once the factors and levels have been specified, this is more of a
    convenience function.

    Parameters
    ----------
    factors : List[List]
        a list of lists representing factors and levels
    factor_names : List[str], optional
        a list of names for the factors in the first argument. Must share the order of the first
        argument.

    Returns
    -------
    pd.DataFrame
        A dataframe with one row per combination of factor levels

    Examples
    --------
    >>> # create full factorial design for a 2 level 3 factor experiment
    >>> design_df = design_full_factorial(factors=[[-1, 1], [-1, 1], [-1, 1]],
    ...     factor_names=["factor_one", "factor_two", "factor_three"])

    """

    assert factor_names is None or len(factor_names) == len(factors), \
        "The length of factor_names must match the length of factors."
    factor_names = factor_names if factor_names is not None else \
        ["x{}".format(i) for i in range(len(factors))]
    return DataFrame(data=list(product(*factors)), columns=factor_names)
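
# Illustrative sketch (not part of the lind API): a full factorial design is the Cartesian product
# of the factor levels, so the number of runs is the product of the level counts. The factor values
# below are made up for illustration and the example uses only the standard library.

```python
from itertools import product
from math import prod

# Three hypothetical factors with 2, 3, and 2 levels respectively.
factors = [["red", "blue"], [10, 20, 30], [-1, 1]]

# Every combination of levels, with the last factor varying fastest.
runs = list(product(*factors))

# The run count equals the product of the number of levels per factor.
expected_runs = prod(len(levels) for levels in factors)

print(len(runs), expected_runs)
```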


def design_partial_factorial(k: int, res: int) -> DataFrame:
    """
    design_partial_factorial

    This function helps design 2 level partial factorial experiments. These experiments are often
    described using the syntax l**(k-p) where l represents the number of levels of each factor, k
    represents the total number of factors considered, and p determines the size of the fraction
    relative to the full factorial design (the partial design has 1 / l**p as many runs).

    This function assumes that l=2. Users are not asked to set p. Instead, the user sets a minimum
    desired resolution for their experiment. Resolution describes the kind of aliasing incurred by
    scaling down from a full to a partial factorial design. Higher resolutions have less potential
    aliasing (confounding).

    Resolution number is determined through the defining relation of the partial factorial design.
    For the 6 factor design 2**(6-p) with factors ABCDEF, example defining relations (I) are shown
    below. The resolution cannot exceed the number of factors in the experiment. So a 6 factor
    experiment can be at most a resolution 6 (otherwise it would be a full factorial experiment).

    * Res I: I = A
    * Res II: I = AB
    * Res III: I = ABC
    * Res IV: I = ABCD
    * Res V: I = ABCDE
    * Res VI: I = ABCDEF

    In practice, resolution III, IV, and V designs are used most often.

    * Res I: Cannot distinguish between levels within main effects (not useful).
    * Res II: Main effects may be aliased with other main effects (not useful).
    * Res III: Main effects may be aliased with two-way interactions.
    * Res IV: Two-way interactions may be aliased with each other.
    * Res V: Two-way interactions may be aliased with three-way interactions.
    * Res VI: Three-way interactions may be aliased with each other.

    Parameters
    ----------
    k : int
        the total number of factors considered in the experiment
    res : int
        the desired minimum resolution of the experiment

    Returns
    -------
    pd.DataFrame
        A dataframe with the partial factorial design

    Examples
    --------
    >>> # create partial factorial design for a 2 level 4 factor resolution III experiment
    >>> design_df = design_partial_factorial(k=4, res=3)

    """

    _check_int_input(k, "k")
    _check_int_input(res, "res")
    assert res <= k, "Resolution must be smaller than or equal to the number of factors."

    # Assume l=2 and use k specified by user to solve for p in design
    n = arange(res - 1, k, 1)
    k_minus_p = k - 1 if res == k else n[~(_k_combo_vec(n, res) < k)][0]

    logger.info("Partial Factorial Design: l=2, k={}, p={}".format(k, k - k_minus_p))
    logger.info("Ratio to Full Factorial Design: {}".format(Fraction(2**k_minus_p / 2**k)))

    # identify the main effects and interactions for the design

    main_factors = arange(k_minus_p)
    # collapse the double spaces array_str can emit, then join factor indices with ":"
    clean = lambda x: x.replace("  ", " ").strip(" ").replace(" ", ":")
    interactions = [clean(_array_to_string(main_factors))] if res == k else \
        [
            clean(_array_to_string(c))
            for r in range(res - 1, k_minus_p)
            for c in combinations(main_factors, r)
        ][:k - k_minus_p]

    # combine main effects and interactions into a single design string (format inspired by patsy)
    factors = " ".join([_array_to_string(main_factors)] + interactions)
    logger.info("Design string: {}".format(factors))

    # build the full factorial design for the main (base) factors only
    main_factors = [i for i in factors.split(" ") if i and ":" not in i]
    two_level_full_factorial = [[-1, 1] for _ in main_factors]
    full_factorial_design = design_full_factorial(two_level_full_factorial)

    # translate each interaction term into the column names used by the full factorial design
    interactions = [
        ["x" + i for i in j.split(":")]
        for j in [i for i in factors.split(" ") if i and ":" in i]
    ]

    # let patsy compute the interaction columns, then drop the intercept it adds
    design = "+".join(full_factorial_design.columns.tolist() + [":".join(i) for i in interactions])
    partial_factorial_design = dmatrix(design, full_factorial_design, return_type='dataframe').drop(
        columns=["Intercept"])

    partial_factorial_design.columns = \
        ["x{}".format(i) for i in range(partial_factorial_design.shape[1])]

    return partial_factorial_design
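
# Illustrative sketch (not part of the lind API): the smallest useful case of the idea above is a
# 2**(3-1) half fraction, where a full factorial in two base factors A and B is extended with a
# third column C = A*B. That choice gives the defining relation I = ABC (resolution III). The code
# below is a hand-built example with numpy under those assumptions, not the function's own output.

```python
from itertools import product

import numpy as np

# Full factorial in the two base factors A and B (4 runs, coded -1/+1).
base = np.array(list(product([-1, 1], repeat=2)))
a, b = base[:, 0], base[:, 1]

# Generate the third factor from the two-way interaction: C = A*B.
c = a * b
design = np.column_stack([a, b, c])

# The defining word ABC is constant (+1) in every run, so I = ABC.
defining_word = a * b * c

print(design)
print(defining_word)
```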


####################################################################################################


def fetch_partial_factorial_design(design_name: str = "toc") -> DataFrame:
    """
    fetch_partial_factorial_design

    The function design_partial_factorial auto generates partial factorial designs using an
    algorithm. We validate that algorithm in our unit tests by comparing against known designs
    from popular experimental design textbooks. For those who want to use the designs from
    these books rather than the auto-generated designs, please use this function.

    There are multiple ways to generate certain designs given a fixed k and p
    (using formula l**(k-p)). Both fetch_partial_factorial_design and design_partial_factorial
    deterministically return designs, but there are typically other ways to formulate these designs
    if the user would like to work it out on their own.

    Parameters
    ----------
    design_name : str
        the name of the design to fetch; to see available designs input `toc`

    Returns
    -------
    pd.DataFrame
        experiment design or toc of available designs

    Examples
    --------
    >>> table_of_contents_of_designs = fetch_partial_factorial_design("toc")
    >>> design = fetch_partial_factorial_design("2**3-1")

    References
    ----------
    NIST
        * Section 5.3.3.4.7 of the Engineering Statistics Handbook
    Box, Hunter, & Hunter
        * Statistics for Experimenters
    Taguchi
        * System of Experimental Design, Vol. 2

    Notes
    -----
    * 2**3-1 is equivalent to a Taguchi L4 design
    * 2**15-11 is equivalent to a Taguchi L16 design
    * 2**31-26 is equivalent to a Taguchi L32 design

    """

    assert isinstance(design_name, str), "Input design_name must be a string."
    design_name = design_name.lower().strip() + ".csv"
    if _sfap is None:
        raise Exception("Missing dependency lind-static-resources")
    try:
        return read_csv(_sfap + "/factorial/" + design_name, index_col=0)
    except FileNotFoundError as exception:
        logger.error(exception)
        raise ValueError("Please input a valid design. `{}` not found. "
                         "See docstring for help.".format(design_name[:-4])) from exception