Multiple Regression Analysis
Multiple regression analysis is used to predict the value of a dependent variable using two or more independent variables. It is an extension of simple linear regression, which uses a single predictor to predict the value of the dependent variable. An example of a simple linear regression model is Y = b0 + b1X, where Y is the predicted variable and X is the independent variable. The variable estimated by the model is usually unknown, while the values of the independent variables are given.
Regression analysis estimates an outcome variable as a function of several independent variables. For example, the yield of a wheat farmer in a given year is influenced by the level of rainfall, the fertility of the land, the quality of seedlings, the amount of fertilizer used, temperatures and many other factors such as the prevalence of diseases in the period. Multiple regression analysis has an advantage over simple linear regression in that it enables the study of the influence of a change in one independent variable on the dependent variable while the other independent variables are held constant. Multiple regression is also used to calculate the overall fit of the whole model, and it makes it possible to analyse the contribution of each individual predictor to the variance explained (Cohen, 2003).
In general, the multiple regression equation of a dependent variable Y on independent variables X1, X2, X3…Xk is:
Y = B0 + B1X1 + B2X2 + B3X3 + … + BkXk
Y is the variable predicted by the equation. B0 is the intercept of the equation, while B1, B2, B3…Bk are the slopes, just as in the simple linear regression model. X1, X2, X3…Xk are the predictor variables of the dependent variable Y. Each of B1, B2, B3…Bk gives the change in Y due to a unit change in the corresponding predictor variable, holding the other predictors constant. For example, when X1, X2, X3…Xk are all equal to zero, Y is equal to B0, the intercept of the regression equation. On the other hand, holding X2, X3…Xk constant, a unit increase in X1 changes Y (the dependent variable) by the value of B1.
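The interpretation of the intercept and slopes can be checked with a short sketch. The function name predict and the coefficient values below are hypothetical, chosen only for illustration; they do not come from any data set discussed here.

```python
def predict(coefs, xs):
    """Evaluate Y = B0 + B1*X1 + ... + Bk*Xk.

    coefs is [B0, B1, ..., Bk]; xs is [X1, ..., Xk].
    """
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one intercept plus one slope per predictor")
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))


# Hypothetical coefficients (assumption, for illustration only): B0, B1, B2
coefs = [10.0, 2.0, -0.5]

at_zero = predict(coefs, [0.0, 0.0])   # all predictors zero: Y equals B0
base = predict(coefs, [3.0, 4.0])
bumped = predict(coefs, [4.0, 4.0])    # X1 up by one unit, X2 held constant
effect_of_x1 = bumped - base           # the change in Y equals B1
```

With every predictor set to zero the prediction is the intercept, and raising X1 by one unit while X2 stays fixed changes the prediction by exactly B1.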
The use of multiple regression rests on several assumptions. First, the dependent variable Y should be measured on a continuous scale. In addition, the model should have two or more independent variables, the observations should be independent of one another, the error variance should be constant (homoscedasticity), the independent variables should not be highly correlated with each other (no multicollinearity), the residual error terms should be normally distributed, and a linear relationship should exist between Y and the X variables.
A final key assumption in regression analysis is that the explanatory variables are quantitative in nature. Examples of quantitative variables include the quantity demanded of a good, money supply, hours spent reading, age, interest rates and many more. However, when using real-world data, some factors cannot be measured on a numerical scale yet still influence the dependent variable Y. Examples of such qualitative variables include marital status, political membership, nationality and sex.
These qualitative factors need to be included in the multiple regression analysis because they influence the dependent variable just as the other variables do. They are brought into the analysis as dummy variables through a process of observation, recording and coding. Dummy variables take account of qualitative characteristics in a regression model. A dummy variable can take only two values, 1 or 0.
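The coding step can be sketched as follows. The helper dummy_code and the sample records are hypothetical and serve only to show how a two-category characteristic becomes a 0/1 column.

```python
def dummy_code(values, coded_as_one):
    """Map a two-category variable to 0/1: 1 where value == coded_as_one."""
    if len(set(values)) > 2:
        raise ValueError("a single dummy variable handles only two categories")
    return [1 if v == coded_as_one else 0 for v in values]


# Made-up records (assumption, for illustration only)
sex = ["female", "male", "male", "female"]
female_dummy = dummy_code(sex, "female")   # [1, 0, 0, 1]
```

A characteristic with more than two categories would need several such columns, which this sketch deliberately rejects.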
The data used for the multiple regression example are taken from an online reading resource. The example contains data on employees’ work score, sex, years of experience and salary increase. A group of female employees has placed a claim with a law firm that salary increases at their company are discriminatory towards women. To investigate this claim, a random sample of employee data is extracted from the company’s records.
The salary increase is measured in extra dollars per month, while the employee’s work score is measured by a quality of work index ranging from 0 to 100. A first look at the data shows that male employees receive higher salary increases than their female counterparts. However, the higher salary increases may be due to other factors such as job performance or experience. It is also necessary to investigate whether salary increases are tied solely to job performance, and therefore whether the performance evaluations are themselves discriminatory.
In the model, gender is a dummy variable that takes the value 1 if the employee is female and 0 if the employee is male. The regression model also includes an interaction term. An interaction term captures the way the effect of one independent variable on the dependent variable depends on the value of another independent variable. In this example, the effect of work performance on the salary increase may depend on the employee’s sex. To test this claim, an interaction variable is formed by multiplying the employee’s gender dummy by his or her work index.
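Constructing the interaction variable is a simple element-wise product. The sketch below uses made-up records rather than the example data, and the names with_interaction, gender and work_index are hypothetical.

```python
def with_interaction(gender_dummy, work_index):
    """Return one (X, Z, W) row per employee, where W = X * Z."""
    if len(gender_dummy) != len(work_index):
        raise ValueError("both columns must have one value per employee")
    return [(x, z, x * z) for x, z in zip(gender_dummy, work_index)]


# Made-up records (assumption): X = 1 for female, 0 for male; Z on a 0-100 scale
gender = [1, 0, 1, 0]
work_index = [70, 85, 90, 60]
rows = with_interaction(gender, work_index)
```

For male employees the interaction column is always zero, so it can only shift the slope on the work index for female employees.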
The interaction term is retained in the regression model if it is significant. It is dropped if it turns out to be insignificant, that is, if it does not contribute much to the explanatory power of the regression model. In the example, dropping the interaction term would mean that job performance ratings are rewarded at the same rate for male and female employees. On the other hand, if the effect of job performance ratings differs between the two groups, the interaction term will be useful in explaining variations in pay increases.
The first hypothesis tested by the model is that female employees receive smaller raises than their male counterparts. The second hypothesis is that an increase in the job performance rating of a female employee results in a smaller pay raise than the same increase for a male employee of the company. The complete OLS estimate of the model is Y = 59.94 - 29.71X + 4.84Z - 4.05W, where X is the gender dummy, Z is the job performance index and W is the interaction term (W = X × Z). The calculated coefficient of determination (R2) of the whole model is 0.941 or 94.1%. For male employees, substituting X = 0 and W = 0 gives the equation Y = 59.94 + 4.84Z. A male employee can therefore expect a salary increase of $59.94 when his job performance index is zero, plus $4.84 for every unit increase in job performance.
For female employees, substituting X = 1 and W = Z gives the equation Y = 30.23 + 0.79Z. A female employee will receive a salary increase of $30.23 when her job performance index is zero. For every unit increase in job performance, she gets a further pay increase of $0.79. The results therefore point to discrimination in the pay increase practice of the company. For example, women get about $30 less ($29.71) than men when the job performance index is zero. The return to job performance also differs: each additional point on the index is worth $4.84 to a male employee but only $0.79 to a female employee.
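The two group equations follow from the estimated model by simple arithmetic, which the sketch below reproduces. It assumes, as the male and female equations imply, that X is the gender dummy (1 for female), Z is the job performance index and W = X × Z; the constant and function names are hypothetical.

```python
# Coefficients of the estimated model Y = 59.94 - 29.71X + 4.84Z - 4.05W
B0, B_GENDER, B_PERF, B_INTERACTION = 59.94, -29.71, 4.84, -4.05


def group_equation(x):
    """Return (intercept, slope on Z) for gender dummy x, using W = x * Z."""
    return B0 + B_GENDER * x, B_PERF + B_INTERACTION * x


male_intercept, male_slope = group_equation(0)
female_intercept, female_slope = group_equation(1)

# Round to cents, as the coefficients are reported to two decimals
female_equation = (round(female_intercept, 2), round(female_slope, 2))
```

Rounded to two decimals, the female equation has intercept 30.23 and slope 0.79, while the male equation keeps the intercept 59.94 and slope 4.84.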
The next step is to test the significance of including a variable in the model. This is done by testing whether including the variable significantly increases the explained sum of squares. The test compares the coefficient of determination of the whole model with the R2 of a model that omits the variables of interest. The model without the variables of interest is referred to as the reduced model. To test the statistical significance of each individual variable, the calculated t value is compared with the critical t value. The calculated t value is obtained by dividing each regression coefficient by its standard error.
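The t test for a single coefficient can be written compactly. The notation is an assumption introduced here: the hat marks an estimated coefficient, SE its standard error, and the rejection rule shown is for a two-sided test with the critical value taken at the chosen significance level.

```latex
t_j = \frac{\hat{B}_j}{\mathrm{SE}(\hat{B}_j)},
\qquad
\text{reject } H_0\colon B_j = 0 \ \text{ if } \ |t_j| > t_{\text{critical}}
```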
The R2 for the whole model is 0.941 or 94.1%. This means that 94.1% of the variation in salary increases is explained by the job performance and gender of the employee. The remaining 5.9% is due to other factors and to issues such as sampling error. The F-statistic is used to analyse the significance of a variable to the whole model. It takes into account the R2 of the complete model and of the reduced model. The F-statistic can also be calculated from the residual sums of squares and the degrees of freedom of the two models.
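A minimal sketch of the F-statistic computed from the two R2 values is given below. It uses the standard formula F = ((R2 full - R2 reduced) / q) / ((1 - R2 full) / (n - k - 1)), where q is the number of variables dropped, k the number of predictors in the complete model and n the number of observations. The function name partial_f and the numbers in the usage lines are hypothetical: the example does not report the reduced-model R2 or the sample size, so toy values are used instead.

```python
def partial_f(r2_full, r2_reduced, n_obs, k_full, q_dropped):
    """F-statistic for dropping q_dropped predictors from the complete model."""
    df_resid = n_obs - k_full - 1          # residual df of the complete model
    if df_resid <= 0 or q_dropped <= 0:
        raise ValueError("degrees of freedom must be positive")
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return ((r2_full - r2_reduced) / q_dropped) / ((1 - r2_full) / df_resid)


# Toy values (assumption, not taken from the example data)
f_value = partial_f(r2_full=0.75, r2_reduced=0.50, n_obs=14, k_full=3, q_dropped=1)
```

The resulting value is then compared with the critical F value for q and n - k - 1 degrees of freedom at the chosen level of significance.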
The critical F value is 17.81 at the 1% level of significance. Comparing it with the calculated F-statistic shows that the interaction term in the model is significant, because the calculated F is greater than the critical F (Kleinbaum, 2008). The interaction term therefore adds significantly to the explanatory power of the model. The conclusion is that the evaluation of job performance is discriminatory towards women, so the law firm can use this evidence to build a case against the company. However, evidence to validate the first claim is not available.
In conclusion, the multiple regression model is useful for analysing the influence of a change in one of the independent variables on the dependent variable. It allows analysts to control for multiple factors that simultaneously affect the variable being studied. In the example model it is used to determine whether there is gender discrimination in the salary increases of employees.