Introduction to Logistic Regression

Some potential problems and solutions

Logit models are subject to many of the same problems as multiple regression:

i) Omitted variable(s) can result in bias in the coefficient estimates. To test for omitted variables you can conduct a likelihood ratio test:

LR[q] = {[-2LL(constrained model, i=k-q)] - [-2LL(unconstrained model, i=k)]}
where i is the number of independent variables in each model, q (1 or more) is the number of omitted variables, and LR is distributed chi-square with q degrees of freedom.
This test is conducted automatically by SPSS if you specify "blocks" of independent variables (look for the "block chi-square" in the SPSS output).
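If you want to check the block chi-square by hand, the likelihood ratio test can be sketched in a few lines of Python. This is a minimal sketch, not SPSS's own procedure: the function names chi2_sf and lr_test are made up for illustration, the two -2LL values are hypothetical numbers standing in for what you would read off your own output, and the chi-square tail probability is computed with a standard recurrence that assumes integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lr_test(neg2ll_constrained, neg2ll_unconstrained, q):
    """Return (LR, p-value) for q omitted variables, given the two -2LL values."""
    lr = neg2ll_constrained - neg2ll_unconstrained
    return lr, chi2_sf(lr, q)

# Hypothetical -2LL values for illustration only (not from any real data set).
lr, p_value = lr_test(neg2ll_constrained=120.5, neg2ll_unconstrained=112.3, q=2)
print(f"LR[2] = {lr:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the q variables left out of the constrained model belong in the equation.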

ii) The inclusion of irrelevant variable(s) can result in poor model fit. You can consult your Wald statistics or conduct a likelihood ratio test (see above) to search for independent variables with low explanatory power.
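
For reference, the Wald statistic for a single coefficient is commonly computed as the squared ratio of the coefficient to its standard error and compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes that definition; the function name wald_test and the coefficient and standard error passed to it are hypothetical values for illustration.

```python
import math

def wald_test(b, se):
    """Return (Wald statistic, p-value), treating Wald as chi-square with 1 df."""
    wald = (b / se) ** 2
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient and standard error, for illustration only.
wald, p_value = wald_test(b=0.80, se=0.35)
print(f"Wald = {wald:.3f}, p = {p_value:.4f}")
```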

iii) Errors in functional form can result in biased coefficient estimates and poor model fit. You should try different functional forms and consult the Wald statistics and model chi-square statistics for overall model fit.

iv) The presence of multicollinearity will not lead to biased coefficients, but the standard errors of the coefficients will be inflated. If a variable that you think should be important (statistically significant) is not, consult the correlation coefficients among the independent variables. Any correlation between two independent variables greater than .4 in absolute value (.6 - .8 is usually the troublesome range) may be causing the problem.
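
The correlation check can be sketched as a simple screen over every pair of independent variables. This is only an illustration: the function names pearson_r and flag_collinear_pairs, the variable names x1 to x3, and the data values are all made up, and the sketch assumes the .4 rule of thumb is applied to the absolute value of the correlation.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(ss_a * ss_b)

def flag_collinear_pairs(variables, threshold=0.4):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Made-up independent variables, for illustration only.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # nearly a multiple of x1
    "x3": [2, 5, 1, 4, 3],
}
flagged = flag_collinear_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"r({name_a}, {name_b}) = {r:.3f}")
```

A flagged pair is only a warning sign. Pairwise correlations cannot detect collinearity that involves three or more variables at once.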

v) You may have structural breaks in your data. Pooling the data imposes the restriction that an independent variable has the same effect on the dependent variable in every group, when in fact the effect may differ across groups. You can conduct a likelihood ratio test:
LR[i+1] = [-2LL(pooled model)] - {[-2LL(sample 1)] + [-2LL(sample 2)]}
where samples 1 and 2 are pooled, i is the number of independent variables, and LR is distributed chi-square with i+1 degrees of freedom.
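
The structural break test can be sketched end to end on a toy example. Everything here is an assumption made for illustration: the data are invented, the function name fit_logit_neg2ll is hypothetical, and the fitting routine is a bare-bones Newton-Raphson for a logit model with one independent variable rather than the estimator any particular package uses. With i = 1 the test has i + 1 = 2 degrees of freedom, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-LR/2).

```python
import math

def fit_logit_neg2ll(x, y, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson and return -2LL at the estimates."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX with weights p(1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p))
    return -2.0 * ll

# Invented data: x raises the odds in sample 1 and lowers them in sample 2.
x1, y1 = [1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 1, 0, 0, 1, 1, 1]
x2, y2 = [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 0, 1, 0, 1, 0, 0]

neg2ll_pooled = fit_logit_neg2ll(x1 + x2, y1 + y2)
neg2ll_1 = fit_logit_neg2ll(x1, y1)
neg2ll_2 = fit_logit_neg2ll(x2, y2)

i = 1                                    # number of independent variables
df = i + 1
lr = neg2ll_pooled - (neg2ll_1 + neg2ll_2)
p_value = math.exp(-lr / 2.0)            # chi-square upper tail, valid for df = 2 only
print(f"LR[{df}] = {lr:.3f}, p = {p_value:.4f}")
```

A large LR (small p-value) indicates that the pooled model's restriction does not hold and the two samples should be modeled separately or with interaction terms.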