Forward Selection to Find Predictive Variables with Python Code

Fakhredin Khorasani
Sep 20, 2021

Recently, I had a product discovery task: finding levers for increasing revenue. To begin, I decided to identify the predictive features among all the possible ones and to write the algorithm in Python. I came across the step-wise regression method, which comes in two flavors in regression analysis: backward elimination and forward selection.

In statistics, step-wise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. [Wikipedia]

I chose forward selection, and I want to show you how it works. Before explaining the algorithm, I need to describe the R2 and adjusted R2 metrics.

R-Squared

R-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.
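In formula form, with SS_res the residual sum of squares and SS_tot the total sum of squares:

R2 = 1 - SS_res / SS_tot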

Adjusted R-Squared

Adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases when the new term improves the model more than would be expected by chance.
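Concretely, for n observations and p predictors:

Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)

A new term that adds little explanatory power increases p without reducing SS_res enough, so the adjusted R2 falls.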

Algorithm

In forward selection, in the first step we add the features one by one, fit a regression for each, and calculate the adjusted R2; we keep the feature with the maximum adjusted R2. In each following step we add the remaining features one by one to the candidate set, forming new feature sets, and compare the metric of every new set against the previous set, and so on until no addition improves it.
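The core operation in every step is fitting a model and reading off its adjusted R2. Here is a minimal sketch of that building block, assuming a pandas DataFrame df (the names df, target, and features are illustrative); statsmodels exposes the adjusted R2 directly on a fitted OLS result:

import statsmodels.api as sm

def adjusted_r2(df, target, features):
    # Fit OLS with an intercept and return the adjusted R-squared.
    X = sm.add_constant(df[features].astype(float))
    return sm.OLS(df[target], X).fit().rsquared_adj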

To simplify the explanation, let me walk through an example (a code sketch of the full loop follows it). Suppose we have 3 features:

y ~ x1 + x2 + x3

Step 1:

y ~ x1 : A-R2 = 0.20

y ~ x2 : A-R2 = 0.30

y ~ x3 : A-R2 = 0.22

{x2} 0.30 > {x3} 0.22 > {x1} 0.20

candidate set: {x2}

Step 2:

y ~ x2 + x1 : A-R2 = 0.28

y ~ x2 + x3 : A-R2 = 0.32

{x2, x3} 0.32 > {x2} 0.30 > {x2, x1} 0.28

candidate set: {x2, x3}

Step 3:

y ~ x2 + x3 + x1 : A-R2 = 0.31

{x2, x3} 0.32 > {x2, x3, x1} 0.31

final candidate set: {x2, x3}
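Putting it together, here is a minimal sketch of the whole greedy loop, reusing the adjusted_r2 helper defined above. This is my own illustration of the procedure, not the exact code from the repository:

def forward_selection(df, target, candidates):
    selected = []                # features kept so far
    best = float("-inf")         # adjusted R2 of the current model
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        scores = [(adjusted_r2(df, target, selected + [f]), f) for f in remaining]
        step_best, feature = max(scores)
        if step_best <= best:    # no extension beats the current model
            break
        selected.append(feature)
        remaining.remove(feature)
        best = step_best
        print(f"added {feature}: A-R2 = {best:.3f}")
    return selected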

I use Duncan’s Occupational Prestige Data from CRAN in Python.
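One convenient way to load it in Python (an assumption on my part; the original may fetch the file differently) is through statsmodels, which can download R datasets such as Duncan from the carData package:

import statsmodels.api as sm

# One row per occupation: type, income, education, prestige
duncan = sm.datasets.get_rdataset("Duncan", "carData").data
print(duncan.head())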

Then I transform the categorical column (type) into 3 dummy variables (bc, prof & wc):
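A sketch of that transformation with pandas, assuming the duncan frame loaded above (the dummies are cast to int so OLS can consume them):

import pandas as pd

dummies = pd.get_dummies(duncan["type"]).astype(int)  # columns: bc, prof, wc
df = pd.concat([duncan.drop(columns="type"), dummies], axis=1)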

And then I run the loop over the steps and inspect the result of each step.
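A hypothetical driver over the resulting columns, with prestige as the target, would look like this; the print inside forward_selection reports each step's winner:

candidates = ["income", "education", "bc", "prof", "wc"]
selected = forward_selection(df, "prestige", candidates)
print("final candidates:", selected)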

This showed that adding 'prof' to the model did not produce a better, more predictive model, so it was eliminated.

Here is the full code on my GitHub:

References:

https://en.wikipedia.org/wiki/Stepwise_regression

https://en.wikipedia.org/wiki/Coefficient_of_determination

https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp
