Translated from: http://news.csdn.net/article_preview.html?preview=1&reload=1&arcid=2825492html
[Editor's note] Regression analysis is an important tool for modeling and analyzing data. This article explains what regression analysis is and why it is useful, summarizes the seven most commonly used regression techniques you should master (linear regression, logistic regression, polynomial regression, stepwise regression, ridge regression, lasso regression, and ElasticNet regression) together with their key elements, and closes with the key factors for choosing the right regression model.
Regression analysis is a predictive modeling technique that investigates the relationship between a dependent (target) variable and independent (predictor) variables. It is commonly used for forecasting, time-series modeling, and finding causal relationships between variables. For example, the relationship between rash driving and the number of road accidents caused by a driver is best studied through regression.
Regression analysis is an important tool for modeling and analyzing data. Here, we fit a curve or line to the data points in such a way that the distances between the data points and the curve or line are minimized. I will explain this in more detail in the coming sections.
As mentioned above, regression analysis estimates the relationship between two or more variables. Let's understand this with a simple example:
Suppose you want to estimate a company's sales growth under current economic conditions. You have the company's recent data, which indicates that sales growth is around 2.5 times the growth of the economy. Using this insight, we can predict the company's future sales based on current and past information.
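As a rough illustration of this idea, here is a minimal sketch (the numbers are made up for illustration; numpy is assumed) that fits a straight line to hypothetical economic-growth and sales-growth figures and uses it to predict future sales growth:

```python
import numpy as np

# Hypothetical historical data (illustrative only):
# economic growth (%) and the company's sales growth (%)
econ_growth = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
sales_growth = np.array([2.4, 3.8, 5.1, 6.2, 7.6])   # roughly 2.5x the economy

# Fit a straight line: sales_growth ~ a + b * econ_growth
b, a = np.polyfit(econ_growth, sales_growth, 1)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")     # slope should come out close to 2.5

# Predict sales growth if the economy is expected to grow 2.2% next year
print("predicted sales growth:", a + b * 2.2)
```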
There are multiple benefits of using regression analysis. They are as follows:
Regression analysis also allows us to compare the effects of variables measured on different scales, such as the effect of price changes versus the number of promotional activities. These benefits help market researchers, data analysts, and data scientists to eliminate and evaluate the best set of variables for building predictive models.
There are various kinds of regression techniques available for making predictions. These techniques are mostly driven by three metrics: the number of independent variables, the type of dependent variable, and the shape of the regression line. We will discuss them in detail in the following sections.
For the creative ones among you, you can even cook up a new regression model if you feel the need to use a combination of the parameters above that nobody has used before. But before you do that, let us understand the most commonly used regression methods:
Linear regression is one of the most widely known modeling techniques and is usually among the first techniques people pick up when learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line).
It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term. This equation can be used to predict the value of the target variable from the given predictor variable(s).
The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. The question now is: how do we obtain the best-fit line?
How do we obtain the best-fit line (the values of a and b)?
This task can be accomplished easily with the least squares method, the most common method for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before being added up, positive and negative values do not cancel out.
We can evaluate model performance using the R-square metric. To learn more about these metrics, you can read: Model Performance Metrics Part 1, Part 2.
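A minimal sketch of the idea (scikit-learn assumed; the data is synthetic and for illustration only) that fits an ordinary least squares line and reports R-square:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: one predictor X and a continuous target y
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 + 2.5 * X[:, 0] + rng.normal(0, 1.0, size=50)   # Y = a + b*X + e

model = LinearRegression().fit(X, y)                     # least squares fit
print("intercept a:", model.intercept_)
print("slope b:", model.coef_[0])
print("R-square:", model.score(X, y))                    # goodness of fit on the data
```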
Key points:
Logistic regression is used to find the probability of event = Success and event = Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the value of Y ranges from 0 to 1, and it can be represented by the following equation:
odds = p / (1-p) = probability of event occurrence / probability of event not occurring
ln(odds) = ln(p / (1-p))
logit(p) = ln(p / (1-p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
Above, p is the probability of having the characteristic of interest. A question you should ask here is: why do we use the logarithm in the equation?
Since we are working with a binomial distribution (for the dependent variable), we need to choose the link function best suited to this distribution, and that is the logit function. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values, rather than to minimize the sum of squared errors (as in ordinary regression).
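A minimal sketch of fitting such a model (scikit-learn assumed; the binary data is synthetic and generated with a logistic link purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary outcome: the probability of "success" rises with x
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
p = 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))    # logistic link: p = 1/(1+exp(-(b0+b1*x)))
y = rng.binomial(1, p)                           # 0/1 outcomes

clf = LogisticRegression().fit(X, y)             # fitted by maximum likelihood
print("b0 (intercept):", clf.intercept_[0])
print("b1 (slope):", clf.coef_[0, 0])
print("P(success | x = 1.0):", clf.predict_proba([[1.0]])[0, 1])
```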
Key points:
A regression equation is a polynomial regression equation if the power of an independent variable is greater than 1, as in the equation below:
y=a+b*x^2
In this regression technique, the best-fit line is not a straight line but rather a curve that fits the data points.
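An illustrative sketch (numpy assumed; the data is synthetic) of fitting a curved relationship of this kind with a degree-2 polynomial fit:

```python
import numpy as np

# Synthetic data following a curved relationship y = a + b*x^2 (plus noise)
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 40)
y = 1.0 + 0.8 * x**2 + rng.normal(0, 0.3, size=x.size)

# Fit a degree-2 polynomial; polyfit returns coefficients from the highest power down
coeffs = np.polyfit(x, y, deg=2)
print("fitted coefficients (x^2, x, constant):", coeffs)

# Predictions follow the fitted curve, not a straight line
x_new = np.array([0.5, 1.5, 2.5])
print("predictions:", np.polyval(coeffs, x_new))
```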
Key points:
Stepwise regression is used when we deal with multiple independent variables. In this technique, the selection of independent variables is done through an automatic process that involves no human intervention.
This is achieved by observing statistical values such as R-square, t-stats, and the AIC metric to identify significant variables. Stepwise regression fits the model by adding or dropping covariates one at a time based on a specified criterion. Some of the most commonly used stepwise regression methods are listed below:
The aim of this modeling technique is to maximize prediction power with the minimum number of predictor variables. It is one of the methods for handling higher-dimensional data sets.
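A minimal sketch of one common variant, forward selection driven by AIC (statsmodels and pandas assumed; the data and column names are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data frame with several candidate predictors and a target y
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2 + 3 * df["x1"] - 1.5 * df["x3"] + rng.normal(size=100)

def forward_select(df, y):
    """Greedily add the predictor that most improves (lowers) the AIC."""
    selected, remaining = [], list(df.columns)
    best_aic = sm.OLS(y, np.ones(len(y))).fit().aic        # intercept-only model
    while remaining:
        scores = {c: sm.OLS(y, sm.add_constant(df[selected + [c]])).fit().aic
                  for c in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_aic:                   # no improvement: stop
            break
        best_aic = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected

print("selected predictors:", forward_select(df, y))        # likely ['x1', 'x3'] here
```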
Ridge regression is a technique used when the data suffers from multicollinearity (the independent variables are highly correlated). With multicollinearity, even though the least squares (OLS) estimates are unbiased, their variances are large, which pushes the estimates far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
We saw the equation for linear regression above. Remember? It can be written as:
y = a + b*x
This equation also has an error term. The complete equation becomes:
y = a + b*x + e (error term) [the error term is the value needed to correct for the prediction error between the observed and predicted values]
=> y = a + b1*x1 + b2*x2 + ... + e, for multiple independent variables.
In a linear equation, the prediction error can be decomposed into two subcomponents: one due to bias and one due to variance. Prediction error can arise from either of these components or from both. Here we will discuss the error caused by variance.
Ridge regression solves the multicollinearity problem through a shrinkage parameter λ (lambda). Look at the formula below:
Σ(y - ŷ)^2 + λ * Σβ^2
In this formula there are two components. The first is the least squares term, and the other is λ times the sum of β^2 (beta squared), where β is the coefficient. The second term is added to the least squares term in order to shrink the parameters so that they have very low variance.
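A minimal sketch of ridge regression (scikit-learn assumed; its alpha parameter plays the role of λ, and the data is synthetic and deliberately collinear):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two highly correlated predictors (multicollinearity) plus noise
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)         # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                  # alpha is the shrinkage parameter λ

print("OLS coefficients:  ", ols.coef_)             # unstable, can be wildly large
print("Ridge coefficients:", ridge.coef_)           # shrunk toward smaller, stabler values
```

Increasing alpha shrinks the coefficients more aggressively, trading a little bias for lower variance.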
Key points:
Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing variability and improving the accuracy of linear regression models. Look at the formula below:
Σ(y - ŷ)^2 + λ * Σ|β|
Lasso regression differs from ridge regression in that the penalty function uses absolute values instead of squares. This penalizes (or, equivalently, constrains the sum of the absolute values of) the estimates, which causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero. This results in variable selection out of the given n variables.
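A minimal sketch (scikit-learn assumed; synthetic data) showing how lasso drives some coefficients exactly to zero and thereby selects variables:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ten candidate predictors, but only two of them actually matter
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)                  # alpha is the L1 penalty strength
print("coefficients:", np.round(lasso.coef_, 2))    # most of the other entries end up exactly 0.0
print("selected predictors:", np.flatnonzero(lasso.coef_))
```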
Key points:
· If a group of predictors is highly correlated, lasso picks only one of them and shrinks the others to zero.
ElasticNet is a hybrid of the lasso and ridge regression techniques. It is trained with both L1 and L2 regularization. Elastic-net is useful when there are multiple features that are correlated with one another: lasso is likely to pick one of them at random, while elastic-net is likely to pick both.
A practical advantage of trading off between lasso and ridge is that it allows elastic-net to inherit some of ridge's stability under rotation.
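A minimal sketch (scikit-learn assumed; the l1_ratio parameter controls the mix of L1 and L2 penalties, and the data is synthetic with deliberately correlated columns):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Two groups of strongly correlated predictors
rng = np.random.default_rng(6)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 0] + 0.01 * rng.normal(size=200),
                     base[:, 1], base[:, 1] + 0.01 * rng.normal(size=200)])
y = 3 * base[:, 0] - 2 * base[:, 1] + rng.normal(scale=0.5, size=200)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # blend of L1 (lasso) and L2 (ridge)
print("coefficients:", np.round(enet.coef_, 2))         # correlated columns tend to share weight
```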
Key points:
Beyond these seven most commonly used regression techniques, you can also look at other models such as Bayesian, Ecological, and Robust regression.
Life is usually simple when you know only one or two techniques. One training institute I know of tells its students: if the outcome is continuous, apply linear regression; if it is binary, use logistic regression! However, the more options we have at our disposal, the more difficult it becomes to choose the right one. A similar thing happens with regression models.
Among the many types of regression models, it is important to choose the technique best suited to the type of independent and dependent variables, the dimensionality of the data, and other essential characteristics of the data. Below are the key factors to consider when selecting the right regression model:
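One widely used way to compare candidate regression models on the same data is cross-validation; here is a minimal sketch (scikit-learn assumed, synthetic data for illustration) that scores linear, ridge, and lasso models side by side:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import cross_val_score

# Synthetic data for illustration; in practice use your own X and y
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 8))
y = X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=150)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")   # 5-fold cross-validation
    print(f"{name:>6}: mean R-square = {scores.mean():.3f}")
```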
By now, I hope you have an overview of regression. These regression techniques should be applied with the conditions of the data in mind. One of the best tricks for finding out which technique to use is to check the family of the variables, i.e. discrete or continuous.
In this article, I discussed 7 types of regression and some key facts associated with each technique. If you are new to this industry, I'd advise you to learn these techniques and later implement them in your models.
From: http://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/