We use logistic regression to estimate the probability that an event will occur as a function of other variables. An example is the probability that a borrower will default as a function of the borrower's credit score, income, loan size, and current debts. We will be discussing classifiers in the next lesson. Logistic […]
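As a rough illustration of the borrower example, the sketch below pushes a weighted sum of the inputs through the logistic function to get a probability. The coefficients, the intercept, the field names, and the sample borrower are all hypothetical and chosen only for illustration; a real model would estimate them from data.

```python
import math

def logistic(z: float) -> float:
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (not fitted to any data).
INTERCEPT = 2.0
COEFFICIENTS = {
    "credit_score": -0.01,    # higher score lowers the default score
    "income_k": -0.02,        # income in thousands
    "loan_size_k": 0.015,     # loan size in thousands
    "current_debt_k": 0.03,   # existing debt in thousands
}

def default_probability(borrower: dict) -> float:
    """Estimated probability of default for one borrower."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * borrower[name] for name in COEFFICIENTS)
    return logistic(z)

# Hypothetical borrower record.
borrower = {"credit_score": 700, "income_k": 60, "loan_size_k": 20, "current_debt_k": 10}
p = default_probability(borrower)
print(f"Estimated default probability: {p:.4f}")
```

Whatever the inputs, the output stays strictly between 0 and 1, which is what lets it be read as a probability.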

### Category: bigdata

## Regression – Relating input variables and outcome

The term “regression” was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). Specifically, regression analysis helps one understand how the value of […]

## Apriori Algorithm

Association rules are another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include: • Which of my products tend to be purchased together? • What will other people who are like this person or product tend to buy, watch, or click on for other […]

## Association Rules

Association rules are another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include: • Which of my products tend to be purchased together? • What will other people who are like this person or product tend to buy, watch, or click on for other […]
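One way to approach the "purchased together" question is to count how often each pair of items shares a basket. The sketch below computes the support of every item pair, that is, the fraction of transactions containing both items. The transactions are made-up sample data, and this is only the counting step, not a full rule-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = share of all transactions that contain both items.
n = len(transactions)
pair_support = {pair: count / n for pair, count in pair_counts.items()}

for pair, support in sorted(pair_support.items()):
    print(pair, support)
```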

## K-means clustering – Use Cases

K-means clustering is often used as a lead-in to classification. It is primarily an exploratory technique for discovering structure in the data that you might not have noticed before, and it serves as a prelude to more focused analysis or decision processes. Some examples of the set of measurements based on which clustering can be performed […]
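To make the exploratory idea concrete, here is a minimal one-dimensional sketch of the usual k-means loop: assign each value to its nearest centroid, then move each centroid to the mean of its assigned values. The data, the starting centroids, and the function name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Run a fixed number of assign/update rounds on 1-D data."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical measurements with two visibly separate groups.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

Practical implementations usually also handle multi-dimensional data, random initialization, and a convergence check rather than a fixed iteration count.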

## Hypothesis Testing: ANOVA

ANOVA (Analysis of Variance) is a generalization of the difference of means. Here we have multiple populations, and we want to see if any of the population means are different from the others. That means that the null hypothesis is that ALL the population means are equal. An example: suppose everyone who visits our retail […]
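A small sketch of the one-way ANOVA calculation may help here: it compares the variation between group means with the variation within groups and reports their ratio, the F statistic. The three groups below are made-up numbers, and the function name `anova_f` is an assumption; the sketch stops at the F statistic and does not compute a p-value.

```python
def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements from three populations.
groups = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [10.0, 9.0, 11.0],
]
f_stat = anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F value is evidence against the null hypothesis that all population means are equal; judging how large is large enough requires the F distribution with the matching degrees of freedom.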

## Hypothesis – Null and Alternative Hypothesis

Here are some examples of null and alternative hypotheses that we would be answering during the analytic lifecycle. Once we have fit a model – does it predict better than always predicting the mean value of the training data? If we call the mean value of the training data “the null model”, then the null […]
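The comparison against the null model can be sketched by measuring the squared error of always predicting the mean and the squared error of the fitted model's predictions. The actual values and predictions below are hypothetical, and for simplicity the same small data set stands in for the training data.

```python
# Hypothetical observed outcomes and a fitted model's predictions for them.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

# The null model always predicts the mean of the (training) outcomes.
mean_value = sum(actual) / len(actual)

# Sum of squared errors for the null model and for the fitted model.
null_sse = sum((y - mean_value) ** 2 for y in actual)
model_sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))

# Fraction of the null model's error that the fitted model removes.
improvement = 1.0 - model_sse / null_sse

print(f"null SSE = {null_sse}, model SSE = {model_sse}, improvement = {improvement:.2f}")
```

An improvement near zero would suggest the model does no better than the null model; whether an observed improvement is statistically significant is a separate question for a formal test.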

## Data Exploration Vs. Presentation

Finally, we want to touch on the difference between using visualization for data exploration, and for presenting results to stakeholders. The plots and tips that we’ve discussed try to make the details of the data as clear as possible for the data scientist to see structure and relationships. These technical graphs don’t always effectively convey […]

## Establishing Multiple Pairwise Relationships between Variables

There are times when it’s useful to see multiple values of a dataset in context in order to visually represent data relationships so as to magnify differences or to show patterns hidden within the data that summary statistics don’t reveal. In the graphic represented above, the variables sepal length, sepal width, petal length and petal […]

## Basic R Operations on Vectors

Recall that a vector is a 1-dimensional array with a single data type (for example, character or numeric). We can perform several different transforms on a vector: multiplying each value by a scalar, creating a new vector by multiplying one vector by another, etc. We also can transform the contents of a vector by performing a […]
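The two transforms named above can be illustrated with a short sketch. It is written in Python, with plain lists standing in for R vectors, so the element-wise work is spelled out explicitly; in R itself the same results come directly from `v * 2` and `v * w`. The vectors `v` and `w` are made-up sample values.

```python
# Python lists standing in for two numeric R vectors (hypothetical values).
v = [1.0, 2.0, 3.0]
w = [10.0, 20.0, 30.0]

# Multiply each value by a scalar (R: v * 2).
scaled = [x * 2 for x in v]

# Create a new vector by multiplying one vector by another, element by element (R: v * w).
product = [x * y for x, y in zip(v, w)]

print(scaled)
print(product)
```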