Logistic Regression Analysis in SPSS
Regression is a statistical tool used to identify relationships between variables and to make predictions from historical data. Several types of regression exist, including simple linear regression, multiple linear regression, multivariate linear regression, and logistic regression. The linear techniques are typically estimated with the ordinary least squares method, whereas logistic regression is estimated by maximum likelihood. Logistic regression also differs from the other techniques in that it is used when the outcome variable is categorical. Simply put, binary logistic regression is used when the response variable is dichotomous, while the predictor variables may be continuous or categorical (Pallant, 2016). Like other regression techniques, logistic regression has assumptions that must be met before the test is used. This paper discusses logistic regression by focusing on its purpose, advantages, disadvantages, and requirements, and provides an example of how the test could be used.
The Purpose of Logistic Regression
As mentioned, logistic regression is used when the outcome variable is dichotomous. It is important to define a dichotomous variable because the definition clarifies the upcoming discussion. A dichotomous variable is a categorical variable that takes on only two possible values, whereas a categorical variable in general may have more than two categories. Examples of dichotomous variables include pass or fail, male or female, and black or white. Therefore, logistic regression could be used to test the factors that cause a machine to pass or fail a quality control test. It could also be used to test whether factors such as gender predict whether a person smokes. In short, logistic regression is used to identify significant relationships between a categorical dependent variable and continuous or categorical independent variables.
Advantages of Logistic Regression
Logistic regression has several advantages over statistical tools such as linear discriminant analysis. For instance, as opposed to discriminant analysis, logistic regression does not require the explanatory variables to be normally distributed, so it works appropriately even when the assumption of normality of the independent variables is not met. In addition, logistic regression provides the probability of an outcome. Its output includes odds ratios, reported as Exp(B), which describe how the odds of the outcome change with each predictor, and the fitted model can be used to compute the predicted probability of the outcome (Pallant, 2016). Logistic regression also has advantages over Fisher's exact test and the chi-square test of independence. Specifically, unlike those tests, logistic regression quantifies the strength of a relationship between variables.
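The link between odds, odds ratios, and probabilities mentioned above can be illustrated with a short Python sketch. The baseline probability and the coefficient used here are hypothetical values chosen only for illustration; they do not come from any output in this paper.

```python
import math

def odds_from_probability(p):
    """Odds in favour of the outcome: p / (1 - p)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)

def odds_ratio(b):
    """Exp(B): multiplicative change in the odds for a one-unit
    increase in the predictor."""
    return math.exp(b)

# Hypothetical values, for illustration only.
baseline_p = 0.20   # probability of the outcome before the change
b = 0.5             # assumed logistic regression coefficient

new_odds = odds_from_probability(baseline_p) * odds_ratio(b)
new_p = probability_from_odds(new_odds)
print(f"odds ratio = {odds_ratio(b):.3f}, new probability = {new_p:.3f}")
```

The sketch shows that an odds ratio multiplies the odds, not the probability, so the probability must be recovered from the odds in a separate step.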
Disadvantages of Logistic Regression
Logistic regression also has disadvantages. For instance, Verma (2013) argues that multicollinearity can cause problems when working with logistic regression. Simply put, working with highly correlated predictor variables results in an unreliable model with unstable coefficient estimates, so conclusions about which relationships are significant may be misleading. Nonetheless, collinearity can be detected before running the test by examining the correlations among the predictors using tests of association. Another disadvantage of logistic regression arises from Simpson's paradox. Simpson's paradox occurs when adding a variable to a model reverses the direction of the relationship between the initial variables. However, as Chen, Bengtsson, and Ho (2009) reveal, Simpson's paradox can be addressed using established methodologies when conducting research.
Requirements of Logistic Regression
Four main assumptions must be met before running a logistic regression. To begin with, the assumption of linearity must be met. Linear regression techniques require a linear relationship between the outcome variable and the predictor variables. The same assumption is required in logistic regression, but under a different condition. Specifically, logistic regression requires that the predictor variables have a linear relationship with the logit of the outcome variable. This is because logistic regression models the natural logarithm of the odds of the outcome, as shown in the equation logit(p) = ln(p / (1 - p)) = b0 + b1x1 + ... + bkxk, where p is the probability of the outcome and x1 to xk are the predictor variables (Pallant, 2016). In short, researchers must ensure that the predictor variables have a linear relationship with the logit of the outcome variable before using logistic regression.
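The linearity requirement can be made concrete with a minimal Python sketch. The intercept, slope, and predictor values below are hypothetical and were picked only so that the arithmetic is easy to follow.

```python
import math

def logit(p):
    """Natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and slope, for illustration only.
b0, b1 = -4.0, 0.1
xs = [30, 40, 50, 60]   # equally spaced predictor values

probs = [inverse_logit(b0 + b1 * x) for x in xs]
logits = [logit(p) for p in probs]

# Change between consecutive predictor values on each scale.
logit_steps = [logits[i + 1] - logits[i] for i in range(len(logits) - 1)]
prob_steps = [probs[i + 1] - probs[i] for i in range(len(probs) - 1)]

print("steps on the logit scale:      ", [round(s, 3) for s in logit_steps])
print("steps on the probability scale:", [round(s, 3) for s in prob_steps])
```

Equal steps in the predictor give equal steps on the logit scale but unequal steps on the probability scale, which is why the assumption is stated in terms of the logit.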
Another requirement for logistic regression is that the outcome variable be dichotomous. This applies when modeling data using binomial logistic regression, whereas Poisson regression could be used if the outcome variable is count data. The predictor variables should be continuous or categorical for logistic regression to work appropriately. Further, the predictor variables should not be highly correlated with one another because such correlation results in multicollinearity. Finally, the independence of errors must be ascertained before running a logistic regression, meaning that the observations should be independent of one another. In addition, the categories of the outcome variable should be mutually exclusive and exhaustive, so that each case falls into exactly one category. For instance, when the outcome is sexual preference, binary logistic regression is appropriate when the data contain only heterosexual and homosexual respondents, but not when they include bisexual respondents, who do not fall exclusively into either group.
Example of Logistic Regression in SPSS
To demonstrate an example, data was obtained from the Journal of Statistics Education Data Archive. Specifically, the data on lobster survival by size in a tethering experiment was used. The data describes the survival of lobsters in the marine environment based on their carapace length. The predictor variable was carapace length while the outcome variable was survival. The outcome variable was dummy coded in SPSS, where zero represented death and one represented survival. The data was first examined using a correlation test. To run the test, the researcher clicked Analyze, then Correlate, and selected Bivariate. The predictor and response variables were entered in the Variables box and the Pearson correlation coefficient checkbox was selected. The output from the correlational analysis is depicted in table 1 below.
|Carpace Length in mm||Survive|
|Carpace Length in mm||Pearson Correlation||1||.514**|
|**. Correlation is significant at the 0.01 level (2-tailed).|
Table 1: Correlation output.
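The Pearson coefficient reported in table 1 is the ordinary product-moment correlation, which can also be computed when one of the two variables is coded 0/1. A minimal Python sketch follows. The eight observations are hypothetical and are not the lobster data, so the result will not match the value in table 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observations (NOT the lobster data set).
lengths = [27, 30, 33, 36, 39, 42, 45, 48]   # carapace length in mm
survived = [0, 0, 1, 0, 1, 1, 0, 1]          # 0 = died, 1 = survived

r = pearson_r(lengths, survived)
print(f"r = {r:.3f}")
```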
From the results in table 1 above, the Pearson correlation coefficient between carapace length and survival is 0.514, which is significant at the 0.01 level. This indicates a moderate positive association between carapace length and survival. Because collinearity refers to correlation among predictor variables and the model has a single predictor, multicollinearity is not a concern in this example. The researcher then proceeded with logistic regression by clicking Analyze, Regression, then Binary Logistic. The outcome variable (survive) was entered in the Dependent box while the predictor variable (carapace length) was entered in the Covariates box. The method used was ‘Enter’ because the test had a single predictor variable. The 95% confidence interval for Exp(B) was selected in the Options dialog. The outputs from logistic regression are depicted in tables 2 through 6 below. These tables were chosen because they support the interpretation of the model.
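For readers who want to see what a binary logistic procedure does behind the dialog boxes, the following Python sketch fits a one-predictor model by maximum likelihood using Newton-Raphson iterations. It is an illustration of the general technique under stated assumptions, not SPSS's actual implementation. The function name `fit_logistic` and the eight observations are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, tol=1e-8, max_iter=25):
    """Fit ln(p / (1 - p)) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1) on the original scale of x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    xc = [x - mean_x for x in xs]   # centring improves numerical stability
    a, b = 0.0, 0.0                 # intercept (centred scale) and slope
    for _ in range(max_iter):
        ps = [sigmoid(a + b * x) for x in xc]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xc, ys, ps))
        # Information matrix: negative Hessian of the log-likelihood.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xc))
        h11 = sum(w * x * x for w, x in zip(ws, xc))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information * step = score).
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a - b * mean_x, b        # convert the intercept back

# Hypothetical observations (NOT the lobster data set).
lengths = [27, 30, 33, 36, 39, 42, 45, 48]   # carapace length in mm
survived = [0, 0, 1, 0, 1, 1, 0, 1]          # 0 = died, 1 = survived

b0, b1 = fit_logistic(lengths, survived)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, Exp(B) = {math.exp(b1):.3f}")
```

The loop stops once the parameter estimates change by less than a small tolerance, which is the same kind of stopping rule described in the footnote to the model summary output.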
|Case Processing Summary|
|Selected Cases||Included in Analysis||159||100.0|
|a. If weight is in effect, see classification table for the total number of cases.|
Table 2: Case-Processing Summary
From table 2 above, it is evident that the data set had 159 observations and no missing values. Table 3 below presents the omnibus test, which indicates whether the model containing carapace length predicts survival better than a model with no predictors. From the table, the significance value is reported as 0.000 (that is, p < 0.001), which is less than 0.05 (Verma, 2013). This implies that carapace length is a significant predictor of lobster survival.
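The omnibus test is a likelihood-ratio chi-square test: the statistic is the drop in -2 log likelihood when the predictor is added to the constant-only model. A minimal Python sketch for the one-predictor case (one degree of freedom) is given below. The two -2 log likelihood values are hypothetical placeholders, not values from the lobster output.

```python
import math

def likelihood_ratio_chi_square(null_deviance, model_deviance):
    """Drop in -2 log likelihood from the constant-only model to the fitted model."""
    return null_deviance - model_deviance

def p_value_one_df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is a squared standard normal variable, so the
    tail area can be written with the complementary error function.
    """
    return math.erfc(math.sqrt(chi_square / 2.0))

# Hypothetical -2 log likelihood values, for illustration only.
null_deviance = 220.0    # constant-only model
model_deviance = 180.0   # model with one predictor

chi_square = likelihood_ratio_chi_square(null_deviance, model_deviance)
p_value = p_value_one_df(chi_square)
print(f"chi-square = {chi_square:.1f}, p = {p_value:.2e}")
```

A p-value this small would be displayed by SPSS as .000, which is why such a result is better reported as p < 0.001.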
|Omnibus Tests of Model Coefficients|
Table 3: The Omnibus test
Table 4 below indicates the proportion of variation in the outcome that is accounted for by the overall model. From the table, it could be deduced that the model accounts for a reasonable share of the variation in lobster survival (Verma, 2013). The Nagelkerke R Square statistic is 0.345, which suggests that approximately 34.5 percent of the variation in lobster survival is explained by carapace length. Because this is a pseudo R-square, it should be read as an approximate rather than an exact measure of explained variance.
|Step||-2 Log likelihood||Cox & Snell R Square||Nagelkerke R Square|
|a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.|
Table 4: Model summary
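Both statistics in the model summary can be derived from the -2 log likelihood values of the constant-only model and the fitted model. The Python sketch below uses the standard formulas. The sample size of 159 comes from table 2, while the two -2 log likelihood values are hypothetical placeholders, so the printed results are not those of table 4.

```python
import math

def cox_snell_r2(null_deviance, model_deviance, n):
    """Cox & Snell R Square from the -2 log likelihoods and the sample size."""
    return 1.0 - math.exp(-(null_deviance - model_deviance) / n)

def nagelkerke_r2(null_deviance, model_deviance, n):
    """Nagelkerke R Square: Cox & Snell rescaled so its maximum is 1."""
    max_cox_snell = 1.0 - math.exp(-null_deviance / n)
    return cox_snell_r2(null_deviance, model_deviance, n) / max_cox_snell

n = 159                  # sample size reported in table 2
null_deviance = 220.0    # hypothetical -2LL of the constant-only model
model_deviance = 180.0   # hypothetical -2LL of the fitted model

cs = cox_snell_r2(null_deviance, model_deviance, n)
nk = nagelkerke_r2(null_deviance, model_deviance, n)
print(f"Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
```

Nagelkerke's statistic is always at least as large as Cox & Snell's because the latter cannot reach 1 even for a perfect model.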
Table 5 below shows how well the model predicts actual outcomes. The model correctly classifies 71.3 percent of deaths and 70.9 percent of survivals, which implies the model is about 71 percent correct overall. Finally, table 6 reports the level of significance for the coefficients from the test. From the table, it is evident that both carapace length and the intercept are significant because their significance values are reported as 0.000 (p < 0.001). Therefore, the survival of lobsters in the environment under experiment could be modeled using the equation:
ln(p / (1 - p)) = b0 + b1 × (carapace length in mm)
where p is the probability of survival, and b0 and b1 are the constant and the carapace length coefficient reported in column B of table 6.
|Step 1||Survive||Not survive||57||23||71.3|
|a. The cut value is .500|
Table 5: Classification table
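The percentages in the classification table are simple ratios of correctly classified cases to observed cases. The Python sketch below reproduces them. The counts for the lobsters that did not survive (57 and 23) are taken from table 5. The counts for the survivors (23 and 56) do not appear in the surviving rows of the table and are inferred here from the 159 cases in table 2 and the 70.9 percent reported in the text, so they should be treated as an assumption.

```python
def percent_correct(correct, total):
    """Percentage of cases in a group that the model classified correctly."""
    return 100.0 * correct / total

# Observed deaths: counts taken from table 5.
died_predicted_died = 57
died_predicted_survived = 23

# Observed survivals: ASSUMED counts, inferred from n = 159 and 70.9 percent.
survived_predicted_died = 23
survived_predicted_survived = 56

n_died = died_predicted_died + died_predicted_survived
n_survived = survived_predicted_died + survived_predicted_survived

pct_died = percent_correct(died_predicted_died, n_died)
pct_survived = percent_correct(survived_predicted_survived, n_survived)
pct_overall = percent_correct(
    died_predicted_died + survived_predicted_survived, n_died + n_survived
)
print(f"deaths: {pct_died:.2f}%  survivals: {pct_survived:.2f}%  "
      f"overall: {pct_overall:.2f}%")
```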
|Variables in the Equation|
|B||S.E.||Wald||df||Sig.||Exp(B)||95% C.I. for EXP(B)|
|a. Variable(s) entered on step 1: CarpaceLengthinmm.|
Table 6: Variables in the equation
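The columns of table 6 are related to one another by simple formulas: Wald is the squared ratio of B to its standard error, Exp(B) is the exponential of B, and the 95% confidence interval for Exp(B) is obtained by exponentiating B plus or minus 1.96 standard errors. The Python sketch below applies these formulas and then uses the fitted equation to compute a predicted probability. The coefficient values are hypothetical placeholders, not the estimates from the lobster output.

```python
import math

Z_95 = 1.959963984540054   # standard normal quantile for a 95% interval

def wald_statistic(b, se):
    """Wald chi-square for one coefficient: (B / S.E.) squared."""
    return (b / se) ** 2

def exp_b_interval(b, se, z=Z_95):
    """Return (lower, Exp(B), upper) for the odds ratio."""
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)

def predicted_probability(b0, b1, x):
    """Probability of the outcome from ln(p / (1 - p)) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical estimates, for illustration only.
b0 = -7.9    # constant
b1 = 0.2     # coefficient for carapace length
se_b1 = 0.04

wald = wald_statistic(b1, se_b1)
lower, odds_ratio, upper = exp_b_interval(b1, se_b1)
p_40mm = predicted_probability(b0, b1, 40)
print(f"Wald = {wald:.2f}, Exp(B) = {odds_ratio:.3f} "
      f"[{lower:.3f}, {upper:.3f}], p(survive | 40 mm) = {p_40mm:.3f}")
```

An interval for Exp(B) that excludes 1 corresponds to a coefficient that is significant at the 0.05 level.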
In conclusion, this paper has discussed logistic regression by focusing on its purpose, advantages, disadvantages, and requirements, and has provided an example of how the test could be used. It is clear that logistic regression is used when the outcome variable is dichotomous, while the predictor variables could be continuous or categorical. It is also clear that linearity of the logit, independence of errors, absence of multicollinearity, and a dichotomous outcome variable are the main requirements when using logistic regression. Further, the advantages of logistic regression enable the technique to work better than linear discriminant analysis, Fisher’s exact test, and the chi-square test in the situations described. Finally, the lobster survival example demonstrates how to run and interpret logistic regression in SPSS.
Chen, A., Bengtsson, T., Ho, K. T. 2009. A Regression Paradox for Linear Models: Sufficient Conditions and Relation to Simpson’s Paradox. The American Statistician, 63(3): 218-225.
Pallant, J. 2016. SPSS survival manual: A step-by-step guide to data analysis using IBM SPSS. Crows Nest: Allen and Unwin.
Verma, J. P. 2013. Data analysis in management with SPSS software. New Delhi: Springer.