# Binary Logistic Regression in 7 Steps in Excel

from excelmasterseries

The purpose of this example of binary logistic regression is to create an equation that will calculate the probability that a production machine is currently producing output that conforms to desired specifications based upon the age of the machine in months and the average number of shifts that the machine has operated during each week of its lifetime.

Data was collected on 20 similar machines as follows:

1) Whether the machine produces output that meets specifications at least 99 percent of the time.(1 = Machine Meets Spec – It Does Produce Conforming Output at least 99 Percent of the Time, 0 = Machine Does Not Meets Spec – It Does Not Produce Conforming Output at least 99 Percent of the Time)

2) The Machine’s Age in Months

3) The Average Number of Shifts That the Machine Has Operated Each Week During Its Lifetime.

(Click On Image To See a Larger Version)

## Logistic Regression Steps in Excel

### Logistic Regression Step 1 – Sort the Data

The purpose of sorting the data is to make data patterns more evident. Using Excel data sorting tool, perform the primary sort on the dependent variable. In this case, the dependent variable is the response variable indicating whether the prospect made a purchase. Perform subordinate sorts (secondary, tertiary, etc.) on the remaining variables.

The following data was sorted initially according to the response variable (Y). The secondary sort was done according to Machine Age and the tertiary sort was done according to Average Number of Shifts of Operation Per Week. The results are as follows:

(Click On Image To See a Larger Version)

Patterns are evident from the data sort. Machines that did not produce conforming output tended to the older machines and/or machines that operate during a higher average number of shifts per week.

### Logistic Regression Step 2 – Calculate a Logit For Each Data Record

Given the following inputs, X1, X2, …, Xk, the Logit equals the following:

Logit = L = b0 + b1X1 + b2X2 + …+ bkXk

If the explanatory variables are Age and Average Number of Shifts, the Logit, L, is as follows:

Logit = L = b0 + b1*Age + b2*(Average Number of Weekly Shifts)

The Excel Solver will ultimately optimize the variables b0, b1, and b2in order to create an equation that will accurately predict the probability of a machine producing conforming output given the machines age and average number of operating shifts per week.

The Decision Variables are the variables that the Solver adjusts during the optimization process. The Decision Variables b0, b1, and b2 are arbitrarily set to 0.1 before the Solver is run. It is a good idea to initially set the Solver decision variables so that the resulting Logit is well below 20 for each record. Logits that exceed 20 cause extreme values to occur in later steps of logistic regression. The Solver decision variables b0, b1, and b2 have been arbitrarily set to the value of 0.1 to initial produce reasonably small Logits as shown next.

A unique Logit is created for each of the 20 data records based on the initial settings of the Decision Variables as follows:

(Click On Image To See a Larger Version)

### Logistic Regression Step 3 – Calculate eL For Each Data Record

The number e is the base of the natural logarithm. It is approximately equal to 2.71828163 and is the limit of (1 + 1/n)n as n approaches infinity. eL must be calculated for each data record. This step will be shown in the image in the next step, Step 4.

### Logistic Regression Step 4 – Calculate P(X) For Each Data Record

P(X) is the probability of event X occurring. Event X occurs when a machine produces conforming output. P(X) is the probability of a machine producing conforming output.

P(X) = eL / (1 + eL)

L = Logit = b0 + b1*X1 + b2*X2 + …+ bk*Xk

Calculating eL and P(X) for each of the data records is done as follows:

(Click On Image To See a Larger Version)

eL can also be calculated in Excel as exp(L).

### Logistic Regression Step 5 – Calculate LL, the Log-Likelihood Function

The conditional probability Pr(Yi=yi|X1i,X2i,…Xki) is the probability that predicted dependent variable yi equals the actual observed value Yi given the values of the independent variables inputs X1i,X2i,…Xki.

The conditional probability Pr(Yi=yi|X1i,X2i,…Xki) will be abbreviated Pr(Y=y|X) from here forward for convenience.

The conditional probability Pr(Y=y|X) is calculated by the following formula:

Pr(Y=y|X) = P(X)Y * [1-P(X)](1-Y)

Taking the natural log of both sides yields the following:

ln [ Pr(Y=y|X) ] = y*ln [ P(X) ] * (1-y)*ln[ [1-P(X)] ]

The Log-Likelihood Function, LL, is the sum of the ln [ Pr(Y=y|X) ] terms for all data records as per the following formula:

LL = ∑ Yi *P(Xi) + (1 – Yi)(1-P(Xi))

Calculating LL is done as follows:

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

### Logistic Regression Step 6 – Use the Excel Solver to Calculate MLL, the Maximum Log-Likelihood Function

The objective of Logistic Regression is find the coefficients of the Logit (b0 , b1,, b2 + …+ bk) that maximize LL, the Log-Likelihood Function in cell H30, to produce MLL, the Maximum Log-Likelihood Function.

The functionality of the Excel Solver is fairly straightforward: the Excel Solver adjusts the numeric values in specific cells in order to maximize or minimize the value in a single other cell.

The cell that the Solver is attempting to maximize or minimize is called the Solver Objective. This is LL in cell H30.

The cells whose values the Solver adjusts are called the Decision Variables. The Solver Decision Variables are therefore in cells C2, C3, and C4. These contain b0 , b1,, b2 + …+ bk, the coefficients of the Logit. These cells will be adjusted to maximize LL, which is in cell H30.

The Excel Solver is an add-in that in included with most Excel packages. The Solver most be manually activated by the user before it can be utilized for the first time. Different versions of Excel require different method of activation for the Solver. The best advice is to search Microsoft’s documentation online to locate instructions for activating the add-ins that are included with your version of Excel. YouTube videos are often another convenient source for step-by-step instructions for activating Solver in your version of Excel. Once activated, the Solver is normally found in the Data tab of versions of Excel from 2007 onward that use the ribbon navigation. Excel 2003 provides a link to the Solver in the drop-down menu under Tools.

These Decision Variables and Objective are entered into the Solver dialogue box as follows:

(Click On Image To See a Larger Version)

Make sure not to check the checkbox next to Make Unconstrained Variables Non-Negative.

Excel Solver’s GRG Nonlinear Solving Method

The GRG Nonlinear solving method should be selected if any of the equations involving Decision variables or Constraints is nonlinear and smooth (uninterrupted, continuous, i.e., having no breaks). GRG stands for Generalized Reduced Gradient and is a long-time, proven, reliable method for solving nonlinear problems.

The equations on the path to the calculation of the Objective (maximizing LK) involve the calculation of eL, P(X), and Pr(Y=y|X). Each of these three equations is nonlinear and smooth. An equation is “smooth” if that equation and the derivative of that equation have no breaks (are continuous). The GRG Nonlinear solving method should therefore be selected.

One way to determine whether an equation or function is non-smooth (the graph has a sharp point indicating that the derivative is discontinuous) or discontinuous (the equation’s graph abruptly changes values at certain points – the graph is disconnected at these points) is to graph the equation over its expected range of values.

The Solver Should Be Run Through Several Trials To Ensure an Optimal Solution

When the Solver runs the GRG algorithm, it picks a starting point for its calculations. Each time the Solver GRG algorithm is run, it picks a slightly different starting point. This is why different answers will often appear after each run of the GRG Nonlinear solving method. The Solver should be re-run several times until the Objective (LK) is not maximized further. This should produce the best locally optimal values of the Decision Variables (b0, b1, b2, …, bk).

The GRG Nonlinear solving method is guaranteed to produce locally optimal solutions but not globally optimal solutions. The GRG nonlinear solving method will produce a Globally Optimal solution if all functions in the path to the Objective and all Constraints are convex. If any of the functions or Constraints is non-convex, the GRG Nonlinear solving method may find only Locally Optimal Solutions.

A function is convex if it has only one peak either up or down. A convex function can always be solved to a Globally Optimal solution. A function is non-convex if it has more than one peak or is discontinuous. Non-convex solutions can often be solved only to Locally Optimal solutions.

A Globally Optimal solution is the best possible solution that meets all Constraints. A Globally Optimal solution might be comparable to Mount Everest since Mount Everest is the highest of all mountains.

A Locally Optimal solution is the best nearby solution that meets all Constraints. It may not be the best overall solution, but it is the best nearby solution. A Locally Optimal solution might be comparable to Mount McKinley, which is the highest mountain in North America not the highest of all mountains.

The function eL with L = b0 + b1*X1 + b2*X2 + …+ bk*Xk can be non-convex because inputs X1 , X2 ,…, Xk can be nonlinear. The GRG Nonlinear solving method is therefore only guaranteed to find a Locally Optimal Solution.

How to Increase the Chance That the Solver Will Find a Globally Optimal Solution

There are three ways to increase the chance that the Solver will arrive at a Globally Optimal solution:

The first is to run the Solver multiple times using different sets of values for the Decision Variables. This option allows you to select initial sets of Decision Variables based on your understanding of the overall problem and is often the best way to arrive at the most desirable solution.

The second was is to select “Use Multistart.” This runs the GRG Solver for a number of times and randomly selects a different set of initial values for the Decision Variables during each run. The Solver then presents the best of all of the Locally Optimal solutions that it has found.

The third way is to set constraints in the Solver dialogue box that will force the Solver to try a new set of values. Constraints are limitations manually placed on the Decision Variables. Constraints can be useful if the Decision variables should be limited to a specific range of values. A Globally Optimal solution will not likely be found by applying constraints but a more realistic solution can be obtained by limiting Decision Variables to likely values.

### Interpreting Excel Solver Results

Running the Solver produces the following results for this problem:

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

MLL, the Maximum Log-Likelihood was calculated to be -6.654560484 when the constants were adjusted as Solver Decision Variables to the values of:

b0 = 12.48285608

b1 = -0.117031374

b2 = -1.469140055

### Logistic Regression Step 7 – Test the Solver Output By Running Scenarios

Validate the output by running several scenarios through the Solver results. Each scenario will employ a different variation of input variables X1, X2, .. , Xk to produce outputs that should be consistent with the initial data set.

The sort of the initial data showed a pattern that nonconforming product was more likely on older machines and/or machines that were run more often.

The following three scenarios were run as follows:

Scenario 1

Machine Age = 40 months

Average Number of Weekly Shifts = 7

P(X) = Probability of Conforming Output = 8 percent

(Click On Image To See a Larger Version)

Scenario 2

Machine Age = 40 months

Average Number of Weekly Shifts = 4

P(X) = Probability of Conforming Output = 87 percent

(Click On Image To See a Larger Version)

Scenario 3

Machine Age = 12 months

Average Number of Weekly Shifts = 7

P(X) = Probability of Conforming Output = 69 percent

(Click On Image To See a Larger Version)

The outcomes of these three scenarios are consistent with the patterns apparent in the initial sorted data set below that nonconforming product was more likely to be produced by older machines and/or machines that were run more often:

(Click On Image To See a Larger Version)