Apriori Algorithm

Association rule mining is another unsupervised learning method. It performs no “prediction”; instead, it is used to discover relationships within the data. Example questions include:
• Which of my products tend to be purchased together?
• What other products that we offer will people similar to this person (or buyers of this product) tend to buy, watch, or click on?

In the online retailer example we analyzed in the previous lesson, we could use association rules to discover which products are purchased together within the group that yielded the maximum LTV. For example, if we set up the data appropriately, we could explore which products people in GP4 tend to buy together and look for logical reasons behind the high rate of returns. We can also discover the purchase profile of people in different groups (e.g., people who buy high-heel shoes and expensive purses tend to be in GP4, while people who buy walking shoes and camping gear tend to be in GP2).

The goal of association rules is to discover “interesting” relationships among the variables, where the definition of “interesting” depends on the algorithm used for the discovery. The rules you discover take the form “when I observe X, I also tend to observe Y,” written X → Y. One example of an “interesting” relationship is a rule whose confidence, i.e., how often Y appears among the transactions that contain X, meets or exceeds a predefined threshold.
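To make support and confidence concrete, here is a minimal Python sketch. The four transactions are made up for illustration, and the helper names `support` and `confidence` and the 0.6 threshold are hypothetical choices, not part of any library. Confidence is computed as support(X ∪ Y) / support(X), the fraction of transactions containing X that also contain Y.

```python
# Hypothetical toy transactions, made up for illustration only.
transactions = [
    {"shoes", "purses"},
    {"shoes", "purses", "hats"},
    {"shoes", "hats"},
    {"purses"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: support(X ∪ Y) / support(X)."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_x


min_conf = 0.6  # assumed threshold for this example
conf = confidence({"shoes"}, {"purses"}, transactions)
print(f"confidence(shoes -> purses) = {conf:.2f}")
print("interesting" if conf >= min_conf else "not interesting")
```

Here {shoes} appears in 3 of the 4 transactions and {shoes, purses} in 2 of them, so the confidence of shoes → purses is 2/3, which clears the assumed 0.6 threshold.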

Apriori is a bottom-up approach. We start with all itemsets of size 1 (for example, shoes, purses, hats) and determine the support of each, i.e., the fraction of transactions that contain it. Then we start pairing the items that survive and find the support of, say, {shoes, purses}, {shoes, hats}, or {purses, hats}. Suppose we set the minimum support threshold to 50%: we want the itemsets that appear in at least 50% of all transactions. We scan all the itemsets, “prune away” those with less than 50% support (those that appear in fewer than 50% of the transactions), and keep the ones with sufficient support. The word “prune” is used as it would be in gardening, where you prune away the excess branches of your bushes. The Apriori property, which states that every subset of a frequent itemset must itself be frequent (so any superset of an infrequent itemset is also infrequent), provides the basis for pruning the search space of candidate itemsets and for no longer extending an itemset once it fails the support threshold. If the support criterion is met, we grow the itemset and repeat the process until the itemsets reach the specified number of items or no itemset has sufficient support.
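The level-wise search described above can be sketched in Python as follows. This is a minimal, unoptimized sketch assuming the transactions fit in memory as Python sets; the function name `apriori`, its `min_support` and `max_len` parameters, and the toy transactions are illustrative assumptions rather than any particular library's API. Each level counts support and prunes candidates below the threshold, then builds the next level's candidates and discards any candidate with an infrequent subset (the Apriori property) before its support is ever counted.

```python
from itertools import combinations


def apriori(transactions, min_support, max_len=None):
    """Return {frozenset(itemset): support} for every itemset whose support
    is at least min_support (a fraction between 0 and 1)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every individual item is a candidate.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1  # size of the itemsets in `current`
    while current and (max_len is None or k <= max_len):
        # Count support and prune candidates below the threshold.
        level = {}
        for c in current:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)

        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        prev = set(level)
        joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k + 1}

        # Prune step (Apriori property): a (k+1)-itemset can only be
        # frequent if all of its k-item subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(sub) in prev for sub in combinations(c, k))
        }
        k += 1
    return frequent


# Hypothetical toy transactions, made up for illustration.
transactions = [
    {"shoes", "purses"},
    {"shoes", "purses", "hats"},
    {"shoes", "hats"},
    {"purses"},
]

result = apriori(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"support={s:.2f}")
```

With a 50% threshold on this toy data, {purses, hats} appears in only 25% of the transactions and is pruned, so the 3-itemset {shoes, purses, hats} is discarded by the subset check without its support being counted.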
The common measures used by the Apriori algorithm are Support and Confidence. We rank all the rules by support and confidence and keep the most “interesting” ones. There are other measures for evaluating candidate rules, and we will define two such measures: Lift and Leverage.
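The sketch below generates and ranks candidate rules on toy data, reporting support and confidence alongside lift and leverage. It uses the standard definitions lift(X → Y) = confidence(X → Y) / support(Y) and leverage(X → Y) = support(X ∪ Y) - support(X) · support(Y); a lift above 1 (equivalently, leverage above 0) means X and Y occur together more often than they would if they were independent. To stay self-contained, it finds frequent itemsets by brute-force enumeration rather than by Apriori, which is only reasonable for a handful of items. The function names, thresholds, and data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions, made up for illustration.
transactions = [
    {"shoes", "purses"},
    {"shoes", "purses", "hats"},
    {"shoes", "hats"},
    {"purses"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(x, y):
    """Support, confidence, lift and leverage of the rule x -> y."""
    s_xy = support(x | y)
    s_x, s_y = support(x), support(y)
    conf = s_xy / s_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
    }


def candidate_rules(min_support, min_conf):
    """Brute-force rule generation (assumes min_support > 0)."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            itemset = frozenset(itemset)
            if support(itemset) < min_support:
                continue
            # Split the itemset into every antecedent/consequent pair.
            for r in range(1, size):
                for x in combinations(sorted(itemset), r):
                    x = frozenset(x)
                    y = itemset - x
                    m = rule_metrics(x, y)
                    if m["confidence"] >= min_conf:
                        rules.append((x, y, m))
    # Rank by confidence, then support, highest first.
    rules.sort(key=lambda rule: (rule[2]["confidence"], rule[2]["support"]), reverse=True)
    return rules


for x, y, m in candidate_rules(min_support=0.5, min_conf=0.6):
    stats = ", ".join(f"{k}={v:.3f}" for k, v in m.items())
    print(f"{sorted(x)} -> {sorted(y)}: {stats}")
```

On this data, shoes → purses passes the confidence threshold (2/3) yet has a lift of about 0.89, because purses already appear in 75% of all transactions; this is the kind of case that lift and leverage help flag.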