Statistical Design of Experiments: An introductory case study for polymer composites manufacturing applications

Statistical design of experiments (DoE) aims to develop a near-optimal experimental design while minimising the number of experiments required. This approach is especially valuable when multiple variables need to be investigated. DoE is a powerful methodology for a wide range of applications, from the efficient design of manufacturing processes to the accurate evaluation of global optima in numerical studies. The contribution of this paper is to provide a general introduction to statistical design of experiments for a non-expert audience, with the aim of broadening exposure in the applied mechanics community. We focus on response surface methodology (RSM) designs: Taguchi Design, Central Composite Design, Box-Behnken Design and D-optimal Design. These RSM designs are compared in the context of a case study from the field of polymer composites. The results demonstrate that an exact D-optimal design is generally a good design when compared to the global D-optimal design: it requires fewer experiments while retaining acceptable efficiency measures for all three response surface models considered. This paper illustrates the benefits of DoE, demonstrates the importance of evaluating different designs, and provides an approach for choosing the design best suited to the problem of interest.


Introduction
Experimental and numerical work are key parts of any research or industrial project. The engineers and scientists conducting these investigations frequently use some statistical analysis to analyse and report their results. However, not all these investigators have been trained in statistical methods, resulting in a lack of knowledge of how to properly design an experiment or apply the correct statistical tools when there are multiple interacting variables [1]. As a result, investigators often use the one-factor-at-a-time approach which is commonly taught during their training or practised in industry [1].
In an experimental design we need to identify the factors, levels and response variables. A factor is the input variable of interest. If we consider the example of melt compounding of polymer composites, as in the study by Ujianto et al. [2], the factors might include temperature and screw speed. The levels are the number of different variations we consider for each factor, for example, with three levels we would consider a low, medium and high temperature. A response variable is the output variable of interest, such as material strength.
In a one-factor-at-a-time approach, the experimenter varies one factor while keeping the other factors of interest fixed. This is illustrated in figure 1(a) for a design space with two factors. Such an approach requires a large number of experimental runs, but does not give insight into the interactions between the variables or identify the optimum conditions [3].
To mitigate these challenges we consider an approach from the field of statistics, statistical design of experiments (DoE). DoE is the collective name for statistical optimisation techniques which are commonly employed to plan an experimental program with a sufficient number of experiments in order to make sound scientific conclusions from the data [3,4]. It achieves this by optimising the design process, screening for the most influential factors and determining their interactions while establishing and maintaining quality control [3]. The goal of a DoE is therefore to provide a near optimal and efficient design which can be performed with the minimum number of experimental runs.
Bessa et al. [5] demonstrated the power of DoE for machine learning in non-linear material simulation. They developed a numerical database of constitutive response curves for microstructural representative volume elements, using microstructure descriptors, material properties and boundary conditions as input factors for the design. Using a DoE approach ensured that they were efficiently sampling the full non-linear design space to generate training data for their machine learning algorithm, avoiding the incomplete information problem of a one-factor-at-a-time method.
DoEs can be grouped into two design classes: factorial designs and response surface method (RSM) designs.
Factorial designs consider all possible combinations of the factors and levels being investigated. As a result, the number of experimental runs required increases rapidly as the number of factors increases. A full factorial design has n^k experiments, where n is the number of levels and k is the number of factors. For certain applications, a fractional factorial design can be used to reduce the number of experimental runs. Factorial designs are generally used when screening the factors of interest. An example of factor screening using a full factorial design is shown in figure 1(b) for two factors and two levels.
Figure 1. Graphical illustration of the (a) one-factor-at-a-time approach, (b) factorial designs for efficient screening and (c) response surface designs for optimisation. The contour colour gradient represents the response variable (yellow = high value and blue = low value). [Redrawn from [6] under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)]
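As a concrete illustration of how quickly the n^k run count grows, a full factorial run list can be enumerated in a few lines of Python. This is a minimal sketch; the factor levels below are illustrative coded values, not taken from any particular study.

```python
# Sketch: enumerating a full factorial design.
from itertools import product

def full_factorial(levels_per_factor):
    """Return all level combinations: n^k runs for k factors with n levels each."""
    return list(product(*levels_per_factor))

# Two factors, two levels each (coded -1/+1), as in figure 1(b): 2^2 = 4 runs.
runs = full_factorial([[-1, 1], [-1, 1]])
print(len(runs))  # 4
```

With three factors at three levels the same function already produces 27 runs, which motivates the more economical designs discussed below.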
RSM designs rely on making an initial assumption about the shape of the response surface. The response surface is the response variable plotted as a function of the factors in the design space. An experiment can be efficiently designed to accurately capture the features of the assumed response surface. RSM is employed after factor screening to optimise the product or system. An example of an RSM design is illustrated in figure 1(c) showing a more focused design region compared to the factorial design. Notice that this is the first method which captures an estimate of the true optimum.
The aim of this study is to showcase and discuss several RSM designs. We will explain the design methods and define the criteria which will be used to compare them. We will do this comparison in the context of a case study from the field of polymer composites. The contribution of this paper is in demonstrating how this statistical technique can improve the quality and reliability of experimental or numerical investigations.

Theory and Background of Response Surface Methods
In this section we first define the three simplest response surface models. Four classic examples of RSM designs are then introduced and explained: Taguchi Design, Central Composite Design (CCD), Box-Behnken Design (BBD) and D-optimal Design. The criteria for evaluating the efficiency of these methods are defined.

Response Surface Models
RSM relies on the assumption of a mathematical relationship between the response variable and the factors. In this section we present the three simplest assumptions: a linear model, a two-order interaction model and a quadratic model. These are illustrated in figure 2 for a model with two factors. In each of these models y represents the response variable, x_i the factors, and β_i and β_ij the regression coefficients. The purpose of an experiment is to characterise the response surface by solving for the p regression coefficients.
A linear model (also called a linear regression model) for k factors has p = k + 1 model parameters and takes the following form:

y = β_0 + β_1 x_1 + β_2 x_2 + … + β_k x_k .    (1)

The two-order interaction model (also called an interaction model) for k factors has p = 1 + (k^2 + k)/2 model parameters:

y = β_0 + Σ_i β_i x_i + Σ_{i<j} β_ij x_i x_j .    (2)

A quadratic model for k factors has p = 1 + (k^2 + 3k)/2 model parameters:

y = β_0 + Σ_i β_i x_i + Σ_{i<j} β_ij x_i x_j + Σ_i β_ii x_i^2 .    (3)
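The parameter counts above set the minimum number of experimental runs each model requires. They can be checked with a minimal Python sketch of the counting formulas (nothing here is specific to any software package):

```python
# Sketch: number of regression coefficients p for each response surface model,
# where k is the number of factors.
def p_linear(k):       # intercept + k main effects
    return k + 1

def p_interaction(k):  # adds k(k-1)/2 two-factor interaction terms
    return 1 + (k**2 + k) // 2

def p_quadratic(k):    # adds k pure quadratic terms
    return 1 + (k**2 + 3 * k) // 2

for k in (2, 3):
    print(k, p_linear(k), p_interaction(k), p_quadratic(k))
# k = 3 gives p = 4, 7 and 10, the values used in the case study below.
```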

Explanation of Different RSM Design Methods
An overview of four commonly used RSM design methods is given in this section. They are illustrated in figure 3, for three factors, in which case the design space is represented by a cube.

Taguchi Design
The Taguchi design originated as a class of factorial design known as robust parameter design [3,7]. A Taguchi design tests all combinations of factors and levels, similar to a factorial design, but uses pairs of combinations to reduce the number of experiments required [4]. RSM is applied to the Taguchi design to improve the efficiency of the design by mathematically including both the controllable and noise (or uncontrollable) factors in a single response function to account for uncertainty [3]. Thus the effect of two or more levels of the factors on the response variables can be studied [4,8]. The two levels of factors can be seen in figure  3(a) with the inner factorial points denoting the controllable factors and the outer factorial points the noise factors. A disadvantage of Taguchi designs is that they do not account for interaction effects between controllable and noise factors [7].

Central Composite Design and Box-Behnken Design
A CCD is a five level factorial design consisting of factorial points at the corners of the design space, axial points along each factor axis and a center point in the middle of the design space. A CCD is illustrated in figure 3(b). The CCD is based on a two level factorial design, 2^k, where k is the number of factors. By adding axial and center points to the design, the CCD requires fewer experimental runs than a full factorial design while still producing similar results [4]. The axial points are determined by their distance from the design center, α, which provides a measure of rotatability. The rotatability of a CCD indicates the model's ability to ensure that the variance of the predicted response remains constant and stable when the design is rotated around the center [3]. This rotatability measure depends on the assumed shape of the response surface. For this study, we consider α = 1, which assumes a cuboidal design region and results in a face-centered CCD; the axial points then lie at the centers of the cube faces, as shown in figure 3(b). Center points are added to the design to estimate experimental error. They also provide an indication of curvature, that is, they detect quadratic effects [7]. A CCD is generally considered to be efficient when fitting a second order model [3,4,8].
An alternative to the CCD is the BBD, which is considered to be more economical as it requires fewer experimental runs for the same number of factors [3]. A BBD is a three level design formed by combining two level factorial designs with an incomplete block design [4,7,8]. An incomplete block design considers all combinations for a certain number of factors while keeping the remaining factors at their central levels [3,4,8]. The BBD is illustrated in figure 3(c). In comparison with the CCD, the design points of a BBD lie at the midpoints of the edges of the design space; this is similar to where the axial points of a spherical CCD would lie if the design were rotated with radius α = √k.
As with the CCD, center points can also be added to the BBD to estimate experimental error.
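The CCD and BBD point sets described above are easy to generate explicitly. The following Python sketch constructs both in coded units (±1 and 0); it is an illustrative construction, not the output of any particular DoE package:

```python
# Sketch: generating face-centered CCD and BBD design points in coded units.
from itertools import product

def face_centered_ccd(k=3):
    """Face-centered CCD (alpha = 1): 2^k factorial points, 2k axial points, 1 center."""
    factorial = list(product([-1, 1], repeat=k))
    axial = []
    for i in range(k):
        for s in (-1, 1):
            pt = [0] * k
            pt[i] = s          # axial point at the center of a cube face
            axial.append(tuple(pt))
    return factorial + axial + [(0,) * k]

def box_behnken(k=3):
    """BBD: +/-1 combinations for each pair of factors, remaining factors at 0, plus center."""
    pts = []
    for i in range(k):
        for j in range(i + 1, k):
            for si, sj in product([-1, 1], repeat=2):
                pt = [0] * k
                pt[i], pt[j] = si, sj   # midpoint of a cube edge
                pts.append(tuple(pt))
    return pts + [(0,) * k]

print(len(face_centered_ccd()))  # 15 runs for k = 3 (8 + 6 + 1)
print(len(box_behnken()))        # 13 runs for k = 3 (12 edge midpoints + 1 center)
```

Adding the customary extra center points to the 13-run BBD recovers the 15-run design with three center points used in the case study below.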

D-optimal Design as an example of Optimal Designs
Optimal designs are very flexible and are able to handle different user constraints and more complex experimental design problems, such as constrained design regions, many factors or levels, and limits on the total number of experimental runs [3,7]. Optimal designs are numerically generated designs which aim to minimise a specific optimality criterion to select a near optimal design [3,7]. Each optimality criterion minimises a mathematical measure focused on either the estimation of the regression coefficients or the prediction of the response variable in the design region [7]. The computational algorithm first constructs a full factorial design to create a set of candidate points. An initial design is then selected at random from the candidate points and design weights (w_i = 1/p) are assigned to each design point. The design is then evaluated based on the response surface model and the specified optimality criterion, which is optimised using a suitable nonlinear optimisation algorithm [3,7]. The choice of optimality criterion determines the type of optimal design. For this study, we consider a D-optimal design as D-optimality is the most widely used optimality criterion [3,7]. The interested reader is referred to Montgomery Chapter 11 [3] or Myers et al. Chapter 9 [7] for the definitions of the other optimality criteria.
A D-optimal design aims to find a good estimation of the regression coefficients subject to the constraints on the design region and experimental runs [7]. It does this by maximising the determinant |X′X|, where X is the design matrix and X′ its transpose. X is populated with all the combinations of the factor levels (i.e., -1, 0, 1 for a three level design) for each factor, with columns expanded for all the terms in the response surface model. A D-optimal design is illustrated in figure 3(d). The constrained design region is represented by a gray triangle in the top right corner, indicating that no design points can be selected above the triangle. As with the CCD and BBD, center points can be included to estimate experimental error. For an optimal design, additional design choices can also be made, including specifying the number of replicates and lack of fit points. Replications (purple 'x' in figure 3(d)) of certain design points can be added if there is a strong interest in a specific combination of factors and levels [3]. On the other hand, if there is a need to test the prediction ability of the chosen response surface model, lack of fit points (green squares in figure 3(d)) are included in the design space [3].
Optimal designs can be classified as either continuous (i.e. approximate) or exact (i.e. discrete) designs [9]. The classification refers to the optimisation approach used when generating the optimal design. For a continuous design, the aim is to find the minimum number of distinct design points with their assigned design weights (0 ≤ w_i ≤ 1) over the design region [9]. Exact designs specify the number of experimental runs (N) at the start of the optimisation process [9]. An exact design can be related to a continuous one through the design weights w_i = r_i/N, where r_i is the number of replications of design point i; multiplying the weights by the total number of experiments N recovers the replication counts [9].
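A common way to compute an exact D-optimal design is a point-exchange search over a candidate set. The sketch below is a generic greedy exchange for a quadratic model in three factors; it is an illustrative algorithm under our own simplifications, not the procedure used by Design Expert or AlgDesign:

```python
# Sketch: greedy point-exchange search for an exact D-optimal design.
import numpy as np
from itertools import product

def quadratic_model_row(x):
    """Expand a design point into quadratic-model columns: 1, x_i, x_i*x_j, x_i^2."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    row = [1.0] + list(x)
    row += [x[i] * x[j] for i in range(k) for j in range(i + 1, k)]
    row += list(x**2)
    return np.array(row)

def d_optimal_exchange(candidates, n_runs, n_iter=50, seed=0):
    """Swap design points with candidates whenever the swap increases det(X'X).
    Replicated candidate points are allowed, as in exact designs."""
    rng = np.random.default_rng(seed)
    X_cand = np.array([quadratic_model_row(c) for c in candidates])

    def det_of(rows):
        X = X_cand[rows]
        return np.linalg.det(X.T @ X)

    idx = list(rng.choice(len(candidates), size=n_runs, replace=False))
    best = det_of(idx)
    for _ in range(n_iter):
        improved = False
        for pos in range(n_runs):
            for c in range(len(candidates)):
                trial = idx[:pos] + [c] + idx[pos + 1:]
                d = det_of(trial)
                if d > best + 1e-12:
                    idx, best = trial, d
                    improved = True
        if not improved:
            break
    return [candidates[i] for i in idx], best

# Candidate set: 3^3 full factorial grid in coded units, as described above.
cands = list(product([-1, 0, 1], repeat=3))
design, d_value = d_optimal_exchange(cands, n_runs=14)
print(len(design))  # 14
```

Starting from a random N-run subset of the 3^3 candidate grid, points are swapped whenever a swap increases |X′X|, which is exactly the D-optimality criterion described above.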
Global D-optimal designs are continuous designs, using the full design space, based on the D-optimality criterion. For a given set of factors, levels and response surface, the global D-optimal design is uniquely defined, and captures the full design response. In this study, each of the experimental designs will be compared to the global D-optimal design for the relevant response surface model. We will compare both the required number of experiments and the efficiency (Sect. 2.3) to the global D-optimal design.

Design Evaluation
We measure the performance of a design by evaluating its efficiency. The efficiency is an indication of whether a design will be able to produce reliable results under a variety of circumstances [7]. There are two widely used measures of design efficiency: D-efficiency and G-efficiency [3,7,9].
D-efficiency is a measure of the precision of estimation of the regression coefficients, β_ij [7]. We can understand D-efficiency by considering figure 4(a), which illustrates D-efficiency in two dimensions. In this simple example the area of the ellipse represents the confidence region of the regression coefficients. D-efficiency is achieved when the area of this ellipse is minimised [3,7], which is equivalent to maximising the determinant |X′X|, where X is the design matrix containing the factors x_i, with columns expanded for all terms in the response surface model.
When we want to compare two different designs we need to evaluate the relative D-efficiency, which compares the D-efficiency of the current design to that of the global D-optimal design (denoted by the subscript gopt) through the ratio of their determinants [3,9]:

D_eff = 100 ( |X′WX| / |X′_gopt W_gopt X_gopt| )^(1/p) ,    (4)

where p is the number of regression coefficients and W is a diagonal matrix containing the design weights attached to each design point. The inverse of the determinant |X′WX| provides insight into the variance of the regression coefficients.
The G-efficiency of a design is a measure of the accuracy of the prediction of the response variable in the design region [3,7]. The G-efficiency is defined as [3,9]:

G_eff = 100 p / max_x SPV(x) ,  with  SPV(x) = N x′(X′X)⁻¹ x ,    (5)

where p is the number of regression coefficients and x is a vector containing the factor values (expanded for all terms in the response surface model) at which the response variable prediction is made. SPV, the scaled prediction variance, is the variance of the predicted response at a point in the design region scaled by the number of experimental runs N [7]. A simple visualisation of G-efficiency for a linear model with three factors is shown in figure 4(b). Two designs are shown, represented by the red dash-dotted line and the blue dashed line. The y-axis shows the SPV for a given fraction of the design space (FDS), shown on the x-axis. The linear model with three factors has four regression coefficients, i.e. p = 4; a design whose maximum SPV equals p has a G-efficiency of 100 %, represented by the black line. The red design has max(SPV) = 6.5, giving a G-efficiency of 61.5 %, while the blue design has max(SPV) = 5, resulting in a G-efficiency of 80 %. This means that the red design would require more replications at the specified design points to achieve a prediction variance similar to that of the blue design; it therefore has a lower G-efficiency. To improve the G-efficiency of a design we want to minimise the variance of the response variable prediction, i.e. change the design such that at an FDS of 1 the SPV is minimised [3,7].
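Both efficiency measures are straightforward to compute once the design matrix is assembled. The following Python sketch evaluates the relative D-efficiency and G-efficiency of a 2^(3-1) fractional factorial under a linear model; it is our own minimal implementation, assuming equal design weights (W = I/N):

```python
# Sketch: computing relative D-efficiency and G-efficiency for a small design.
import numpy as np
from itertools import product

def linear_row(x):
    """Linear-model columns: intercept plus the factor values."""
    return np.array([1.0] + list(x))

def model_matrix(points, expand):
    return np.array([expand(p) for p in points])

def relative_d_efficiency(X, X_opt):
    """Relative D-efficiency with equal weights: (|X'X|/N^p / |Xopt'Xopt|/Nopt^p)^(1/p) * 100."""
    p = X.shape[1]
    m = np.linalg.det(X.T @ X / len(X))
    m_opt = np.linalg.det(X_opt.T @ X_opt / len(X_opt))
    return 100.0 * (m / m_opt) ** (1.0 / p)

def g_efficiency(X, region_points, expand):
    """G-efficiency = 100 p / max SPV, with SPV(x) = N x'(X'X)^(-1) x over the region."""
    N, p = X.shape
    M_inv = np.linalg.inv(X.T @ X)
    spv = [N * expand(x) @ M_inv @ expand(x) for x in region_points]
    return 100.0 * p / max(spv)

# 2^(3-1) fractional factorial in coded units (half of the 2^3 corners).
frac = [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
X = model_matrix(frac, linear_row)
grid = list(product(np.linspace(-1, 1, 5), repeat=3))  # grid over the design region
print(relative_d_efficiency(X, X))   # 100.0 by construction
print(g_efficiency(X, grid, linear_row))
```

Because this fractional factorial is the global D-optimal design for the linear model, both measures evaluate to 100 %, consistent with the results reported later.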

Case Study
In this section we briefly explain the case study. The response surface models are then defined and the global D-optimal and RSM designs generated. Finally, we compare and evaluate the designs in terms of the relative D-efficiency and the G-efficiency, as well as the number of experiments required.
For this study, we used Design Expert [10] to generate the different RSM designs. To develop the global D-optimal designs and to calculate the D-efficiency and G-efficiency we used R [11] with the AlgDesign Package [12].

Explanation of Case Study
A case study is used to illustrate the differences between the experimental designs. Ujianto et al. [2] aimed to find the optimum compounding conditions for an HDPE/Cloisite 93A/HDPE-g-MA (93/2/5 wt%) polymer composite using an internal mixer. They investigated the internal mixer temperature, rotor speed and mixing time. They chose the lower and upper boundaries for each factor such that the polymer melts without causing material degradation. This paper has been chosen as the case study as it is well suited for RSM designs, allowing us to illustrate the use of DoE in the design of polymer composite manufacturing methods. The paper considers only quantitative (i.e., numerically measurable) factors, which significantly simplifies the design approach.
The three factors and the two levels (lower and upper boundaries) considered are summarised in table 1. The authors used a BBD with three center points, requiring 15 experimental runs. For the response variable analysis they applied a quadratic response surface model.

Response Surface Models and Experimental Designs
The Taguchi, CCD and BBD designs do not depend on the chosen response surface model; the user is not required to specify a response surface model to generate the design. The global D-optimal and D-optimal designs require the user to define the response surface model and therefore change depending on the selected model. The global D-optimal design requires 4, 8 and 21 experimental runs for the linear, two-order interaction and quadratic models, respectively.
From equation (1), a linear model for three factors has the following form:

y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 ,    (6)

where x_1, x_2 and x_3 represent the temperature, speed and mixing time, respectively. The relative D-efficiency compares the generated designs to a global D-optimal design, as shown in equation (4). The linear model in equation (6) has four unknown regression coefficients to estimate. Therefore, the global D-optimal design is specified as a 2^(3-1) fractional factorial design with equal weights assigned to the four design points (w_i = 1/4 for an exact design). A fractional factorial design is very efficient as it covers the whole design region and provides precise estimates of the regression coefficients.
The two-order interaction model for three factors has the following form based on equation (2):

y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_12 x_1 x_2 + β_13 x_1 x_3 + β_23 x_2 x_3 ,    (7)

where the β_ij terms represent the interactions between the different factors. This model has seven unknown regression coefficients. The global D-optimal design is therefore a 2^3 full factorial design with equal weights assigned to the eight design points, i.e., w_i = 1/8.
A quadratic model for three factors has the following form based on equation (3):

y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_12 x_1 x_2 + β_13 x_1 x_3 + β_23 x_2 x_3 + β_11 x_1^2 + β_22 x_2^2 + β_33 x_3^2 ,    (8)

where the x_i^2 terms represent the quadratic effects. The quadratic model has 10 unknown parameters to estimate. For the global D-optimal design we need to consider three levels to capture the quadratic effects. The global D-optimal design is a fraction of the 3^k full factorial design. However, the design is made up of three parts with different design weights assigned to each: the center point has a weight of 0.066, the 2^3 factorial points have a total weight of 0.424, and the remaining points have a total weight of 0.510 (see Atkinson et al. [9], Chapter 11, table 11.6).
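The three-part weighting can be assembled into the weighted information matrix M = X′WX to check that all 10 quadratic regression coefficients are estimable. In this sketch we distribute each part's total weight uniformly over its points, which is our simplifying assumption; Atkinson et al. [9] give the exact allocation:

```python
# Sketch: weighted information matrix for the global D-optimal quadratic design.
import numpy as np
from itertools import product

def quad_row(x):
    """Quadratic-model columns for a design point: 1, x_i, x_i*x_j, x_i^2."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    return np.array([1.0] + list(x)
                    + [x[i] * x[j] for i in range(k) for j in range(i + 1, k)]
                    + list(x**2))

points = list(product([-1, 0, 1], repeat=3))            # 3^3 candidate grid
center = [(0, 0, 0)]
factorial = [p for p in points if 0 not in p]           # 2^3 corner points
rest = [p for p in points if p != (0, 0, 0) and 0 in p] # 18 remaining points

# Part weights from the text; uniform split within each part is an assumption.
parts = [(center, 0.066), (factorial, 0.424), (rest, 0.510)]
M = np.zeros((10, 10))
for pts, total_w in parts:
    for pt in pts:
        r = quad_row(pt)
        M += (total_w / len(pts)) * np.outer(r, r)

print(np.linalg.matrix_rank(M))  # 10: all quadratic coefficients are estimable
```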

Results and Discussion
The relative D-efficiency and G-efficiency for the different designs, including the global D-optimal design, are shown in figure 5, plotted against the number of experiments for all three response surface models. In this case study we considered efficiencies ≥ 90 % to indicate good performance. This threshold is shown with a horizontal line. The vertical line shows the number of experimental runs required by the global D-optimal design. The gray area represents the region for a good design, where we have high efficiency with fewer experiments than the global D-optimal design. To differentiate between designs at the same location, a slight horizontal perturbation has been applied to these data points.
For a linear model the global D-optimal design requires 4 experiments. Both the Taguchi and D-optimal designs have a relative D-efficiency and G-efficiency of 100 % and also require only 4 experiments. These designs are therefore the most efficient for estimating the linear regression coefficients and predicting the response variable with minimum variance; they are equivalent to the global D-optimal design with exactly the same number of experiments. The BBD has a 100 % G-efficiency, but a less than acceptable relative D-efficiency, and requires three times the number of experiments of the global D-optimal, Taguchi and D-optimal designs. The BBD is a less than ideal design to consider, but a viable option if the objective of the study is to predict the response variable. The CCD generally provides the worst efficiencies and the highest number of experiments; it is therefore not a viable option when only linear effects are considered.
For the two-order interaction model the global D-optimal design requires 8 experiments. All of the designs are sub-optimal in terms of their efficiencies. However, the D-optimal designs are a viable option as they still have relatively high D-efficiencies (in the 80 % range) while requiring fewer experiments. The BBD has a 100 % G-efficiency, but a very low relative D-efficiency; as with the linear model, it is an option if response variable prediction is the focus. The CCD is not an option to consider for an interaction model as it requires the most experiments and has the worst efficiencies.
To investigate the quadratic effects, the global D-optimal design requires 21 experiments. Some CCD and D-optimal designs require fewer experiments while retaining acceptable efficiency measures.
Adding center points (CP) generally reduces the relative D-efficiency and G-efficiency for all designs and response surface models, while increasing the number of experimental runs by three. However, there is value in adding center points as they provide stability in the prediction variance and a measurement of experimental error [7]. When adding replication (Rep) and lack of fit (LOF) points to the D-optimal design, the decrease in both relative D-efficiency and G-efficiency remains within the acceptable range, with only an additional five experimental runs needed. For the CCD we note an improvement in relative D-efficiency and G-efficiency when adding factorial (Fac) points, but at the cost of a larger number of experimental runs.
The case study design, a BBD with three center points, generally provided the worst relative D-efficiency for all three response surface models. This indicates that it is not able to estimate the regression coefficients precisely. On the other hand, it generally has an acceptable G-efficiency (in the 80 % range). In other words, it is able to predict the response variable with low variance in the design region. The aim of Ujianto et al. [2] was to analyse the effects of the compounding conditions. They would, therefore, have benefited from a design which is able to provide a better estimation of the regression coefficients, such as D-optimal.

Conclusion
In this study we have provided an overview of statistical design of experiments (DoE), comparing a range of different response surface method (RSM) designs (Taguchi Design, Central Composite Design (CCD), Box-Behnken Design (BBD) and D-optimal Design) for linear, two-order interaction and quadratic response surface models. We have explained the concepts of relative D-efficiency and G-efficiency, and evaluated the RSM designs using these measures for a case study on the manufacturing of polymer composites.
The D-optimal design was found to be more efficient in estimating the regression coefficients, while minimising the response variable prediction variance across the design region. It is considered to be a good design option for all three response surface models. When only considering main or linear effects of the factors, a Taguchi design is also a good choice. For quadratic effects, CCD is a viable option although it does require more experimental runs than a D-optimal design. Additional points (i.e. center, lack of fit and replication) can be considered, although these negatively impact the designs' relative D-efficiency and G-efficiency. It is up to the investigator to decide how many experimental runs are feasible and whether estimation or prediction is more important.
This paper makes a contribution to the computational and applied mechanics community by clearly showing the benefits of considering an experimental design approach, reducing the number of experimental runs while obtaining statistically relevant results. It has also shown the importance of evaluating different designs to determine which one is best suited for the problem of interest.