Good morning! In the previous lessons, we talked about two algorithms (KNN and Naïve Bayes) that are mainly used for classification and prediction. Today, we’ll shift to a data mining tool called association rules, which can be used to generate certain insights about the collected data.
What Is the Main Idea behind Association Rules?
Association rules (also called affinity analysis or market basket analysis) are commonly used to identify groups of items that tend to be bought together. Most of the time, the associations are derived from transaction-type datasets.
Association rules take the form of “if __, then___.” For instance, using association rules, we can say, “if Jim buys oranges, then he will buy bananas.” The “if” part is called the antecedent, and the “then” part is called the consequent. It is also important to note that the antecedent and consequent should be disjoint, meaning that they don’t have any items in common.
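To make the “if __, then ___” form concrete, here is a minimal Python sketch of one way to represent a rule and enforce the disjointness requirement. The names Rule and make_rule are hypothetical, chosen only for illustration; they don’t come from any particular library.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Rule(NamedTuple):
    """An 'if antecedent, then consequent' rule over item names."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def make_rule(antecedent: Iterable[str], consequent: Iterable[str]) -> Rule:
    """Build a rule, rejecting empty sides and overlapping item sets."""
    a, c = frozenset(antecedent), frozenset(consequent)
    if not a or not c:
        raise ValueError("antecedent and consequent must both be non-empty")
    if not a.isdisjoint(c):
        # The two sides of a rule may not share any item.
        raise ValueError(f"antecedent and consequent share items: {sorted(a & c)}")
    return Rule(a, c)


# "If Jim buys oranges, then he will buy bananas."
rule = make_rule({"oranges"}, {"bananas"})
print(rule)
```

A call such as make_rule({"oranges", "bananas"}, {"bananas"}) raises a ValueError in this sketch, because bananas would appear on both sides of the rule.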
To better understand what stands behind association rules, consider the following table:
|Transaction||Item 1||Item 2||Item 3|
The table illustrates fictional transactions from a fictional makeup store. Before we proceed to calculating different metrics for association rules, we have to decide on an antecedent and a consequent. Let’s assume that mascara and eyeliner together form the antecedent and eye shadow is the consequent. We want to figure out how likely someone is to buy eye shadow along with mascara and eyeliner: If Anna is buying mascara and eyeliner, will she also buy eye shadow?
How to Measure the Strength of Association Rules
To measure the strength of association rules, we’ll use three metrics that are typically computed alongside rule-mining algorithms such as Apriori: support, confidence, and the lift ratio.
Support. Support is the fraction of transactions in the dataset that contain both the antecedent and the consequent. It can be expressed as P(antecedent & consequent). In the example from the previous section, the support would equal 3/5 = 60%.
We can also calculate support for the antecedent and the consequent separately: P(antecedent) = 4/5 = 80% and P(consequent) = 3/5 = 60%.
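As a rough illustration, support can be computed in a few lines of Python. The five transactions below are hypothetical: the items in each basket are an assumption, chosen only so that the counts match the figures quoted in this lesson (4 of 5 baskets contain mascara and eyeliner, 3 of 5 contain eye shadow, and 3 of 5 contain all three). The function name support is also just an illustrative choice.

```python
# Hypothetical baskets, picked to reproduce the 3/5, 4/5 and 3/5 figures.
TRANSACTIONS = [
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "lipstick"},
    {"foundation", "lipstick", "blush"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


antecedent = {"mascara", "eyeliner"}
consequent = {"eye shadow"}

print(f"P(antecedent & consequent) = {support(antecedent | consequent, TRANSACTIONS):.0%}")
print(f"P(antecedent)              = {support(antecedent, TRANSACTIONS):.0%}")
print(f"P(consequent)              = {support(consequent, TRANSACTIONS):.0%}")
```

With these assumed baskets, the three printed values are 60%, 80%, and 60%, matching the numbers in the text.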
Confidence. Confidence measures how often the consequent appears in the transactions that contain the antecedent. To calculate confidence, we divide the support of the antecedent and consequent together by the support of the antecedent. That would be 60%/80% = 75%. High confidence usually suggests a strong association between the antecedent and the consequent. In our case, it means that 75% of the time when a person purchases mascara and eyeliner, they also purchase eye shadow.
(By the way, confidence is basically the conditional probability that we discussed in our previous lesson!)
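Here is a small sketch of the confidence calculation in Python, written as the conditional probability P(consequent | antecedent). The baskets are hypothetical and were chosen only to reproduce the 60% and 80% support figures used in this lesson; the function name confidence is likewise an illustrative assumption.

```python
# Hypothetical baskets, picked so that 4 of 5 contain the antecedent
# and 3 of those 4 also contain the consequent.
TRANSACTIONS = [
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "lipstick"},
    {"foundation", "lipstick", "blush"},
]


def confidence(antecedent, consequent, transactions):
    """Share of antecedent-containing baskets that also contain the consequent."""
    a, c = set(antecedent), set(consequent)
    with_antecedent = [basket for basket in transactions if a <= basket]
    if not with_antecedent:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    with_both = sum(1 for basket in with_antecedent if c <= basket)
    # Same as support(antecedent & consequent) / support(antecedent),
    # because the total number of transactions cancels out.
    return with_both / len(with_antecedent)


conf = confidence({"mascara", "eyeliner"}, {"eye shadow"}, TRANSACTIONS)
print(f"confidence = {conf:.0%}")
```

Counting baskets directly, rather than dividing two rounded percentages, avoids small floating-point differences in the result.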
Lift. The lift ratio is probably the most informative way to evaluate the strength of an association. It compares how often the antecedent and consequent are actually purchased together with how often we would expect that to happen if the two were unrelated, which takes the popularity of the consequent into consideration.
To calculate the lift ratio, we divide the confidence by the support of the consequent. That would be 75%/60% = 1.25. A lift ratio higher than 1 indicates a positive association: the items are bought together more often than chance alone would predict. A lift ratio of about 1 suggests that the items are independent, and a lift ratio below 1 means that the items are bought together less often than chance would predict.
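The lift calculation can be sketched in Python as follows. As before, the baskets are hypothetical and chosen only so that the counts agree with the 75% confidence and 60% consequent support used in this lesson; the function name lift is an illustrative assumption.

```python
# Hypothetical baskets consistent with the figures quoted in the lesson.
TRANSACTIONS = [
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "eye shadow"},
    {"mascara", "eyeliner", "lipstick"},
    {"foundation", "lipstick", "blush"},
]


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for basket in transactions if a <= basket)
    n_c = sum(1 for basket in transactions if c <= basket)
    n_both = sum(1 for basket in transactions if (a | c) <= basket)
    if n_a == 0 or n_c == 0:
        raise ValueError("lift is undefined when either side never occurs")
    # (n_both / n_a) / (n_c / n), rearranged to use a single division.
    return (n_both * n) / (n_a * n_c)


value = lift({"mascara", "eyeliner"}, {"eye shadow"}, TRANSACTIONS)
print(f"lift = {value:.2f}")
if value > 1:
    print("bought together more often than chance would predict")
elif value < 1:
    print("bought together less often than chance would predict")
else:
    print("no association beyond chance")
```

For these assumed baskets the lift works out to 15/12 = 1.25, in line with the 75%/60% calculation above.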
Most Famous Applications of Association Rules
Here are two of the best-known applications of association rules:
• Cross-selling of products. This is probably one of the main uses of association rules, and many retailers rely on this kind of analysis. Look at Amazon: When you buy something, additional products pop up as “recommended.”
• Diagnoses. Another application of association rules takes place in the medical field. Medical researchers use the rules to estimate which diagnoses are likely for a patient based on the list of symptoms that they have.
That’s it for today. Tomorrow, we’ll discuss cluster analysis in business analytics.