Regression Sum of Squares
In this chapter, we will cover the following topics: the components of variation (ESS, TSS, and RSS) and the calculation of R-squared.
Introduction
In the realm of regression analysis, understanding the distribution of variation within the dependent variable is fundamental to assessing the effectiveness of a model. The R-squared statistic serves as a pivotal measure in this context, quantifying the proportion of variance in the dependent variable that is explained by the independent variables. In this chapter, we delve into the methodology of calculating the regression R-squared using the three components of the decomposition of variation in the dependent variable: the explained sum of squares (ESS), the total sum of squares (TSS), and the residual sum of squares (RSS). By understanding the role each component plays, you will be equipped to gauge the predictive power and goodness of fit of a multiple regression model.
Components of Variation: ESS, TSS, and RSS
The decomposition of the variation in the dependent variable, also known as the sum of squares partition, plays a pivotal role in understanding the influence of independent variables on the outcome. Let’s explore the three components:
1) Explained Sum of Squares (ESS): ESS represents the variability in the dependent variable that is accounted for by the regression model. It quantifies the improvement in prediction achieved by using the independent variables in the model. The ESS is calculated as the sum of the squared differences between the predicted values and the mean of the dependent variable. ESS is calculated using the formula given below.
$$ESS=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$$
where $\hat{y}_i$ is the predicted value for the $i$-th observation and $\bar{y}$ is the mean of the dependent variable.
2) Total Sum of Squares (TSS): TSS measures the total variability in the dependent variable without considering any model. It represents the dispersion of the actual data points around the mean of the dependent variable. TSS is calculated using the formula given below.
$$TSS=\sum_{i=1}^{n}(y_i-\bar{y})^2$$ where $y_i$ is the observed value and $\bar{y}$ is the mean of the dependent variable.
3) Residual Sum of Squares (RSS): RSS captures the variability in the dependent variable that remains unexplained by the regression model. It represents the sum of squared differences between the observed values and the predicted values (residuals). RSS is calculated using the formula given below.
$$RSS=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$ where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value for the $i$-th observation. A short numerical sketch of all three quantities follows this list.
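To make these formulas concrete, here is a minimal sketch in Python with NumPy. The data points are made-up values chosen purely for illustration, and a simple least-squares line is fitted so that the decomposition $TSS = ESS + RSS$ holds.

```python
import numpy as np

# Hypothetical data (made-up numbers for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 15.0, 18.0, 20.0])

# Fit a simple least-squares line; with an intercept, TSS = ESS + RSS holds.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
y_bar = y.mean()

ess = np.sum((y_hat - y_bar) ** 2)  # explained sum of squares
tss = np.sum((y - y_bar) ** 2)      # total sum of squares
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares

print(f"ESS = {ess:.3f}, TSS = {tss:.3f}, RSS = {rss:.3f}")
print(f"ESS + RSS = {ess + rss:.3f}  (equals TSS for OLS with an intercept)")
```

With these numbers the sketch prints ESS = 67.6, TSS = 68.0, and RSS = 0.4, so ESS + RSS reproduces TSS as expected.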
Calculation of R-squared
The $R^2$ statistic, also known as the coefficient of determination, is defined as the ratio of ESS to TSS. It quantifies the proportion of the total variation in the dependent variable that is explained by the independent variables in the model. $R^2$ is calculated as:
$$R^2 = \frac{ESS}{TSS}$$
For ordinary least squares with an intercept, the variation decomposes as $TSS = ESS + RSS$, so $R^2$ can equivalently be written as $R^2 = 1 - \frac{RSS}{TSS}$.
Example: Consider a multiple regression model aimed at predicting the sales revenue of a retail store from factors such as advertising expenditure and location. Suppose the regression analysis yields ESS = 8000, TSS = 12000, and RSS = 4000 (note that ESS + RSS = TSS, as expected). Then $R^2 = \frac{8000}{12000} \approx 0.67$, meaning the model explains about 67% of the variation in sales revenue.
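Below is a sketch of how an example like this could be reproduced end to end in Python with NumPy. The store data (advertising spend, a 0/1 location indicator, and sales revenue) are hypothetical values invented for illustration, not the ESS/TSS/RSS figures quoted above; the model is fitted by ordinary least squares via np.linalg.lstsq.

```python
import numpy as np

# Hypothetical store data (made-up numbers): advertising spend, a 0/1
# location indicator, and sales revenue.
advertising = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
location = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
sales = np.array([12.0, 18.0, 20.0, 27.0, 29.0, 35.0])

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(advertising), advertising, location])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
sales_hat = X @ beta

ess = np.sum((sales_hat - sales.mean()) ** 2)  # explained sum of squares
tss = np.sum((sales - sales.mean()) ** 2)      # total sum of squares
rss = np.sum((sales - sales_hat) ** 2)         # residual sum of squares

print(f"R^2 = ESS/TSS = {ess / tss:.4f}")
print(f"R^2 = 1 - RSS/TSS = {1 - rss / tss:.4f}")  # same value: fit has an intercept
```

Because the fit includes an intercept column, both expressions for $R^2$ print the same value.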
Conclusion
The calculation of the regression R-squared from ESS, TSS, and RSS serves as a fundamental tool for assessing the efficacy of a multiple regression model. By understanding how these components contribute to the overall variation in the dependent variable, you gain insight into the predictive power of your model. The R-squared statistic provides a clear indicator of how well the model fits the data, aiding informed decision-making and model refinement.