We use logistic regression to estimate the probability that an event will occur as a function of other variables. An example is the probability that a borrower will default as a function of credit score, income, loan size, and current debts. We will be discussing classifiers in the next lesson. Logistic regression can also be considered

# Category: bigdata

## Regression – Relating input variables and outcome

The term “regression” was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). Specifically, regression analysis helps one understand how the value of the dependent variable (also referred

## Apriori Algorithm

Association Rules is another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include: • Which of my products tend to be purchased together? • What will other people who are like this person or product tend to buy/watch or click on for other products we may have to

## Association Rules

Association Rules is another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include: • Which of my products tend to be purchased together? • What will other people who are like this person or product tend to buy/watch or click on for other products we may have to

## K-means clustering – Use Cases

K-means clustering is often used as a lead-in to classification. It is primarily an exploratory technique for discovering structure in the data that you might not have noticed before, and it serves as a prelude to more focused analysis or decision processes. Some examples of the sets of measurements on which clustering can be performed are detailed in the slide.
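As a rough illustration of this exploratory use, the sketch below runs a plain version of Lloyd's algorithm, the usual iterative procedure behind k-means, using only the Python standard library. The function name `kmeans`, the `seed` parameter, and the customer measurements are hypothetical and chosen purely for illustration; they are not taken from the lesson or its slide.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = updated
    return centroids, labels


# Hypothetical measurements: (age, monthly spend) for six customers.
customers = [(22, 150), (25, 160), (24, 155), (51, 620), (48, 600), (53, 640)]
centroids, labels = kmeans(customers, k=2)
```

Here `labels` assigns each customer to one of two groups and `centroids` holds each group's mean age and spend. In practice the measurements are usually rescaled first so that no single variable dominates the distance, and the resulting groups are treated as a starting point for further analysis rather than a final answer.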

## Clustering

In machine learning, “unsupervised” refers to the problem of finding a hidden structure within unlabeled data. In this lesson and the following lesson we will be discussing two unsupervised learning methods: clustering and Association Rules. Clustering is a popular method used to form homogeneous groups within a data set based on their internal structure. Clustering is a method often used

## Hypothesis Testing: ANOVA

ANOVA (Analysis of Variance) is a generalization of the difference of means. Here we have multiple populations, and we want to see if any of the population means are different from the others. That means that the null hypothesis is that ALL the population means are equal. An example: suppose everyone who visits our retail website either gets one of

## Hypothesis – Null and Alternative Hypothesis

Here are some examples of null and alternative hypotheses that we would be answering during the analytic lifecycle. Once we have fit a model – does it predict better than always predicting the mean value of the training data? If we call the mean value of the training data “the null model”, then the null hypothesis is that the average

## Establishing Multiple Pairwise Relationships between Variables

There are times when it’s useful to see multiple values of a dataset in context, representing data relationships visually so as to magnify differences or to show patterns hidden within the data that summary statistics don’t reveal. In the graphic represented above, the variables sepal length, sepal width, petal length and petal width are compared with three

## Basic R Operations on Vectors

Recall that a vector is a 1-dimensional array with a single data type (for example, character or numeric). We can perform several different transforms on a vector: multiplying each value by a scalar, creating a new vector by multiplying one vector by another, etc. We can also transform the contents of a vector by applying a transform to each element. If