Emil Fine

Unlocking Customer Insights Using Simple Machine Learning

“Selling to people who actually want to hear from you is more effective than interrupting strangers who don’t.” – Seth Godin (Yahoo VP of Direct Marketing, entrepreneur, author)

Goal

In this article I will work through a simple customer segmentation problem using Machine Learning. The methods broadly follow the kernels submitted on kaggle.com, except where I think other methods are better. The result will be clearly identified customer segments based on the given sample data. While simple, this can be merged with more data to provide additional insights.

What is Customer Segmentation and Why is it important?

Customer market segmentation attempts to identify groups of customers (segment, cluster, and group are used interchangeably here) who differ from one another within a given population. Product, marketing, and sales teams use it to share the right story, with the right people, at the right time by understanding the needs of each group.

Traditional Approach

Traditional segmentation is often manual and works with limited data. This makes it difficult to understand personas on a deeper level, and adding more data makes the process disproportionately more expensive without a clear benefit.

Machine Learning: Turn Data into Gold

This is where Machine Learning can help: algorithms can sift through far more data, faster and at lower cost, and surface groupings that a manual analysis would miss.

History & Technical details

I will be using the quick and efficient ‘K-Means’ algorithm for a very simple customer segmentation example. The simplified idea is that if a point belongs to a cluster, it should be near lots of other points in that cluster. The algorithm was proposed by Stuart Lloyd of Bell Labs in 1957 for pulse-code modulation, though it was not published until 1982[1]. It is an unsupervised learning algorithm that partitions the data into a fixed number of clusters. An improved algorithm, K-Means++, was proposed in 2007 by David Arthur and Sergei Vassilvitskii[2]. It uses a smarter approach to choosing the starting centroids and reduces the time needed to find a good solution. I will be using K-Means in combination with the silhouette score to determine how many clusters are optimal. Note: while the silhouette score is computationally more expensive, our data sample is too small for that to matter, and it yields more concrete results, as you will see.
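To make this concrete, here is a minimal sketch using scikit-learn (my choice of tooling, not something the original analysis prescribes). Note that scikit-learn’s KMeans uses the K-Means++ seeding by default:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data standing in for (income, spending) pairs.
X = np.array([[15, 39], [16, 81], [17, 6], [18, 77], [19, 40],
              [80, 85], [85, 90], [88, 12], [90, 15], [95, 88]])

# init="k-means++" is the Arthur/Vassilvitskii seeding (the default);
# n_init reruns the algorithm from several seedings and keeps the best run.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                    # cluster index assigned to each point
print(kmeans.cluster_centers_)   # coordinates of the fitted centroids
```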

Given Data and Conclusions

We’re given a dataset containing the following attributes: CustomerID, Gender, Age, Income, and Spending Score. After running the algorithm we get two main graphs:
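Before anything else, the data has to be loaded and inspected. A minimal sketch with pandas; the file name and column labels are my assumptions based on the well-known Kaggle mall-customers dataset:

```python
import pandas as pd

# Hypothetical file name; adjust to wherever your copy of the data lives.
df = pd.read_csv("Mall_Customers.csv")

print(df.head())       # first rows: CustomerID, Gender, Age, income, spending
print(df.describe())   # quick sanity check on ranges and missing values
```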

The first graph shows 5 clear clusters based on Annual Income vs Spending.

Customer Segments for Income vs Spending

The second graph shows 4 clear clusters based on Age vs Spending.
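For reference, here is a sketch of the kind of code that produces those two plots, continuing from the loading snippet above and using the cluster counts (5 and 4) derived later in the article. The column names remain my assumption:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Feature matrices: income vs spending, and age vs spending.
X_is = df[["Annual Income (k$)", "Spending Score (1-100)"]].values
X_as = df[["Age", "Spending Score (1-100)"]].values

for X, k, title in [(X_is, 5, "Income vs Spending"), (X_as, 4, "Age vs Spending")]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    plt.figure()
    plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=25)
    plt.title(f"Customer Segments: {title} (k={k})")
    plt.xlabel(title.split(" vs ")[0])
    plt.ylabel("Spending Score")
plt.show()
```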

Application of ML to Arrive at Conclusions

How did we arrive at those customer segments?

First we do a quick analysis of the given data to see if anything pops out.

We can see that there are clear groupings in income vs spending, and those are the important metrics for a store. Age vs spending doesn’t have groupings that are easy to identify by eye, but that’s what we have the algorithms for.
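A minimal sketch of that first look, plotting the raw pairs side by side (column names again assumed from the Kaggle version of the data):

```python
import matplotlib.pyplot as plt

# Eyeball the raw relationships before clustering anything.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(df["Annual Income (k$)"], df["Spending Score (1-100)"], s=15)
axes[0].set(xlabel="Annual Income (k$)", ylabel="Spending Score")
axes[1].scatter(df["Age"], df["Spending Score (1-100)"], s=15)
axes[1].set(xlabel="Age", ylabel="Spending Score")
plt.tight_layout()
plt.show()
```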

Next we determine how many clusters our data actually contains by calculating the silhouette score, which is the mean silhouette coefficient over all instances (data points). An instance’s silhouette coefficient is (b − a) / max(a, b), where:

a = mean distance to the other instances in the same cluster

b = mean distance to the instances in the nearest other cluster

The coefficient ranges from −1 to +1: values near +1 mean an instance sits well inside its own cluster, and values near 0 mean it lies close to a cluster boundary.

The silhouette score peaks at k = 5, identifying the ideal number of customer segments for Income vs Spending.
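A sketch of that search, reusing the income/spending matrix X_is built earlier; scikit-learn’s silhouette_score computes the mean coefficient directly:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Score k = 2..10 and keep the k with the highest mean silhouette.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_is)
    scores[k] = silhouette_score(X_is, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(f"Best k by silhouette: {best_k}")   # k = 5 for income vs spending here
```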

Another method, called the ‘Elbow Method’, is possible and more efficient, but as you can see it is coarser: the ‘elbow’ can be hard to pin down by eye. As we are not bound by resources in this example, I focused on the silhouette score. Interestingly, while following others’ discussions, I did not notice many people utilizing the more precise silhouette method.
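For comparison, a sketch of the Elbow Method: plot the within-cluster sum of squares (the model’s inertia) against k and look for the bend where adding clusters stops paying off:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Inertia always decreases as k grows; the "elbow" is where it flattens out.
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_is).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia (within-cluster sum of squares)")
plt.title("Elbow Method")
plt.show()
```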

Additions

At this point I have achieved the goal of creating a set of customer segments from which the marketing team can create strategies. There are still many improvements possible in this example, such as scaling the features before clustering, bringing in additional customer attributes, or merging the segments with other data sources for deeper insights.

References

[1] Stuart P. Lloyd, “Least Squares Quantization in PCM,” IEEE Transactions on Information Theory 28, no. 2 (1982): 129-137.

[2] David Arthur and Sergei Vassilvitskii, “k-Means++: The Advantages of Careful Seeding,” Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (2007): 1027-1035.

All code can be found on my GitHub account.
Thanks for visiting my page!
