Introduction to Logistic Regression

Hands On Application: Advanced

Next, run a logistic regression model in SPSS with the bass.sav data. Use YES as the dependent variable and include three independent variables:
Model 1: YES = f(COST, CATCH, INCOME)
Here are some advanced exercises:
  1. Conduct hypothesis tests for groups of coefficients. Run another model adding a "block" of demographic variables: EMPLOYED, EDUCATIO, MARRIED, SEX, and AGE (in the Logistic Regression box, click on "Next" then choose the demographic "covariates"). Is the block of variables statistically significant (look for the "block chi-square" statistic in the output)?

  2. Conduct tests for structural breaks in the data. Do North and South Carolinians behave similarly? Run three versions of Model 1: NC, SC, and pooled (in the Logistic Regression box, click on "select", choose NC as your "selection variable", set NC=1 as the "rule", and run the logit model; then do the same for NC=0). What is the value of the likelihood ratio test statistic?

  3. Is multicollinearity a problem? Compare Model 1 with two variants: (1) Model 1 with EMPLOYED, and (2) Model 1 with EMPLOYED and without INCOME. What are the effects on the statistical significance of INCOME? What is the correlation between EMPLOYED and INCOME?

  4. Conduct more tests for the appropriate model specification. Is there a superior functional form for Model 1? In the SPSS data window, select COST and "transform" and "compute" COST into a new variable: LNCOST=ln(COST). Select INCOME and "transform" and "compute" INCOME into a new variable: INCOMESQ=INCOME*INCOME. Run the alternative functional forms:
    MODEL 2: YES = f(LNCOST, CATCH, INCOME)

    MODEL 3: YES = f(COST, CATCH, INCOME, INCOMESQ)
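
The likelihood ratio statistics in exercises 1 and 2 are both differences in the "-2 Log likelihood" (-2LL) values that SPSS reports. The arithmetic can be sketched in Python as below, assuming you copy the -2LL values from the SPSS output by hand. The function names are hypothetical and the numbers are made up for illustration; they are not results from bass.sav. The degrees of freedom are 5 for the demographic block (five added variables) and 4 for the structural break (Model 1 has a constant plus three slopes, estimated once more when the sample is split in two).

```python
# 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def block_chi_square(neg2ll_without_block, neg2ll_with_block):
    """LR statistic for a block of variables: the drop in -2LL when the block is added."""
    return neg2ll_without_block - neg2ll_with_block


def structural_break_lr(neg2ll_pooled, neg2ll_subsamples):
    """LR statistic for a structural break: pooled -2LL minus the sum of the subsample -2LLs."""
    return neg2ll_pooled - sum(neg2ll_subsamples)


# Made-up -2LL values, for illustration only.
lr_block = block_chi_square(520.0, 508.5)
block_df = 5  # EMPLOYED, EDUCATIO, MARRIED, SEX, AGE

lr_break = structural_break_lr(520.0, [250.0, 262.0])  # pooled vs. NC and SC
break_df = 4  # constant, COST, CATCH, INCOME

for name, stat, df in [("block", lr_block, block_df), ("break", lr_break, break_df)]:
    verdict = "reject" if stat > CHI2_CRIT_05[df] else "fail to reject"
    print(f"{name}: LR = {stat:.2f}, df = {df}, {verdict} the null at the 5% level")
```

With these made-up numbers the block would be jointly significant and the pooled model would not be rejected; your own conclusions depend on the -2LL values in your output.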

Finally, if you need to be convinced that the logistic regression model is superior to the linear probability model, here are some things to check:
  1. Test for normality of the dependent variable (choose the "skewness" option when you calculate "descriptive statistics"; if the t-stat on skewness is greater than 2, then the variable is probably non-normal ...).

  2. Test for heteroskedasticity with the Park test.

  3. Check predicted probabilities from the linear probability (LP) model to determine if they fall outside of the [0, 1] range (save the "unstandardized" predicted value when you run a "regression", "linear" in SPSS).
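
Checks 2 and 3 can also be sketched outside SPSS. The Python example below is a minimal illustration, not the course's procedure: it fits a linear probability model with a single regressor by ordinary least squares, lists the predicted values outside [0, 1], and runs a Park test by regressing ln(residual squared) on ln(COST). The data are made up and the helper `simple_ols` is hypothetical. A t-statistic on the Park slope larger than about 2 in absolute value would suggest heteroskedasticity.

```python
import math


def simple_ols(x, y):
    """One-regressor OLS; returns intercept, slope, t-stat of the slope, and residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    t_slope = slope / math.sqrt(sigma2 / sxx)
    return intercept, slope, t_slope, residuals


# Made-up data for illustration only (not taken from bass.sav).
cost = [5, 10, 20, 40, 80, 120, 160, 200]
yes = [1, 1, 1, 1, 0, 1, 0, 0]

# Linear probability model: YES regressed on COST.
intercept, slope, _, residuals = simple_ols(cost, yes)
predicted = [intercept + slope * c for c in cost]

# Check 3: predicted "probabilities" that fall outside [0, 1].
out_of_range = [p for p in predicted if p < 0 or p > 1]
print("out-of-range predictions:", [round(p, 3) for p in out_of_range])

# Check 2: Park test, regress ln(e^2) on ln(COST) and inspect the slope's t-stat.
ln_e2 = [math.log(e ** 2) for e in residuals]
ln_cost = [math.log(c) for c in cost]
_, park_slope, park_t, _ = simple_ols(ln_cost, ln_e2)
print("Park test slope:", round(park_slope, 3), "t-stat:", round(park_t, 3))
```

In this toy sample the fitted line predicts a value above 1 at the lowest cost and a value below 0 at the highest cost, which is the problem check 3 asks you to look for.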