We use logistic regression to estimate the probability that an event will occur as a function of other variables. An example is the probability that a borrower will default as a function of credit score, income, loan size, and current debts. Logistic regression can also be considered a classifier; we will discuss classifiers in the next lesson. Recall the discussion of classifiers in lesson 1 of this module (Clustering). Classifiers are methods that assign class labels (default or no_default) based on the highest probability. In logistic regression, input variables can be continuous or discrete. The output is a set of coefficients that indicate the relative impact of each input variable. In the binary classification case (true/false), the output also provides a linear expression for predicting the log odds of the outcome as a function of the drivers. The log odds can be converted to the probability of the outcome, and many packages do this conversion in their output automatically.
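The conversion from log odds to probability described above can be sketched in a few lines of Python. This is only an illustration: the function names, the coefficient values, and the 0.5 threshold are assumptions made for the example, not values from the lesson or from any fitted model.

```python
import math

def log_odds(features, coefficients, intercept):
    """Linear expression: intercept plus the weighted sum of the drivers."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

def probability_from_log_odds(z):
    """Invert the logit: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, threshold=0.5):
    """Assign the class label with the higher probability (binary case)."""
    return "default" if p >= threshold else "no_default"

# Hypothetical, made-up coefficients for two scaled drivers
# (for example credit score and debt level); not fitted to real data.
coefficients = [-1.2, 0.8]
intercept = -0.5
borrower = [0.3, 1.5]

z = log_odds(borrower, coefficients, intercept)
p = probability_from_log_odds(z)
print(f"log odds = {z:.3f}, probability = {p:.3f}, label = {classify(p)}")
```

A log odds value of 0 corresponds to a probability of 0.5, positive values to probabilities above 0.5, and negative values to probabilities below 0.5.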
Logistic regression is the preferred method for many binary classification problems. Two examples of a binary classification problem are shown in the slide above.
Other examples:
- true/false
- approve/deny
- respond to medical treatment/no response
- will purchase from a website/no purchase
- Spain will win the next World Cup/will not win
Categorical values are expanded exactly the way we did in linear regression. The coefficients are also computed with a least squares method, implemented as iteratively reweighted least squares (IRLS), which refits a weighted least squares problem on each iteration until the estimated probabilities converge. Logistic regression has the same problems that an OLS method has, and the computational complexity increases with more input variables and with categorical variables that have multiple levels.
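To make the iteratively reweighted least squares idea concrete, here is a minimal sketch for the simplest case: one input variable plus an intercept. The function name `fit_irls`, the toy data, and the stopping tolerance are assumptions made for illustration; statistical packages use more general and more robust implementations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_irls(xs, ys, max_iter=25, tol=1e-10):
    """Fit log odds = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x              # current log odds
            p = sigmoid(eta)               # current probability estimate
            w = max(p * (1.0 - p), 1e-12)  # weight, clamped away from zero
            z = eta + (y - p) / w          # working response
            sw += w
            swx += w * x
            swxx += w * x * x
            swz += w * z
            swxz += w * x * z
        # Solve the 2x2 weighted least squares normal equations directly.
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up toy data; the two classes overlap, so the estimates stay finite.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_irls(xs, ys)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

Each pass recomputes the weights from the current probability estimates and then solves an ordinary weighted least squares problem, which is why the method inherits OLS-style concerns. The sketch does not guard against perfectly separated classes, where the coefficients grow without bound.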