Customer Segmentation Analysis with SQL: Leveraging the RFM Model to gain Deeper Insights
- majiriok

- Dec 15, 2024
- 5 min read
What drives business? Customers. Understanding them is essential. In today's competitive retail landscape, grasping customer behavior is crucial for companies aiming to optimize marketing strategies and boost retention. One effective method is customer segmentation—dividing customers into distinct groups based on their purchasing habits and preferences. In this article, I'll demonstrate how to segment customers using the RFM (Recency, Frequency, and Monetary) model. Using SQL for my analysis, I'll explore data to identify high-value customers, uncover patterns, and provide actionable insights. These insights help businesses tailor their approach to different customer groups, enabling more targeted marketing efforts and better resource allocation—ultimately driving improved business outcomes.
RFM (Recency, Frequency, and Monetary) model:
Recency, frequency, and monetary value (RFM) is a model used in marketing analysis that segments a company’s consumer base by purchasing patterns or habits. In particular, it evaluates customers’ recency (how long ago they made a purchase), frequency (how often they make purchases), and monetary value (how much money they spend).
RFM is then used to identify a company’s best customers by measuring and analyzing spending habits to improve low-scoring customers and maintain high-scoring ones.
RFM analysis numerically ranks a customer in each of these three categories, generally on a scale of 1 to 5 (the higher the number, the better the result). The “best” customer would receive a top score in every category. These three RFM factors can be used to reasonably predict how likely (or unlikely) a customer will do business again with a company. Invetopedia
Business Implications of the RFM Model:
Targeted Marketing Campaigns
Businesses can tailor marketing strategies to specific customer segments. Those who've made recent, frequent, or high-value purchases should be prioritized for promotional offers and loyalty programs. This approach boosts customer engagement, leading to higher conversion rates and improved retention.
Improved Customer Retention
By identifying customers with infrequent or lapsed purchases, businesses can proactively implement retention strategies, such as reactivation campaigns. This approach reduces churn rates and fosters long-term customer loyalty.
Competitive Advantage
Implementing RFM analysis gives businesses a competitive edge by providing deeper insights into their customer base compared to rivals using simpler segmentation methods. This advanced understanding of market dynamics enables smarter, data-driven decision-making.
Preparing the Data:
The dataset from Kaggle captures detailed shopping transactions from 10 shopping malls between 2021 and 2023. Each transaction features a unique invoice number and includes key customer information: a customer ID, gender, and age. It also records product details like category, quantity, and unit price. The data covers payment methods (cash, credit card, or debit card), invoice dates, and the specific mall where each transaction occurred. This comprehensive dataset provides a rich view of customer purchasing behavior.
Before implementing the RFM model, ensure the integrity and quality of your customer data. You must also verify that the columns required for RFM analysis are present: last purchase date, total orders, and total money spent.
Data preparation involves tasks like handling missing values, merging data from various sources, and ensuring consistency. With clean, ready-to-use data, you can then implement the RFM model using PostgreSQL queries. Click this link to view the full SQL script.
Below is an overview of the dataset:

Exploring the Data:
Verify that the dataset has no missing values. The data quality meets our requirements. Now, we can proceed to explore the available data.


Mall of Istanbul generated the highest revenue at $13M, while Forum Istanbul mall generated the lowest at $3M.


Women typically spend more on shopping compared to men.


Clothing items generated the most revenue of over $31M.


2022 generated the highest revenue in the 3-year period, reaching $31M. However, there was a significant decline in 2023, with revenue dropping to $5M.


Recency, Frequency, and Monetary (RFM):
The following query extracts customer-level data, calculates the time since each customer's last purchase and summarizes their total orders and revenue.

For further analysis, we'll create a Common Table Expression (CTE) from our previous query. This CTE allows us to reuse the result in subsequent queries without repeating the same logic. We'll then use SQL's NTILE function to divide customers into three equal groups (tiles) based on their recency, frequency, and monetary value.

The following query enhances our previous RFM segmentation by introducing two crucial features:
rfm_score: The sum of individual RFM ranks (recency, frequency, monetary) for each customer.
rfm: A concatenation of the three RFM scores, creating a unique 3-digit RFM code for each customer, allowing for more nuanced segmentation.

This final SQL code enhances our RFM segmentation by categorizing customers into distinct segments based on their RFM code. It defines customer types such as "new customers," "loyal customers," and "top customers" using specific combinations of recency, frequency, and monetary scores.
The query assigns customers to different segments using their 3-digit RFM scores, facilitating targeted strategies for each group.
Using a CASE statement, the code labels customers based on their RFM score (derived from the concatenated rfm column). Each unique RFM combination corresponds to a specific customer segment:
New Customers: Customers who have recently purchased but haven’t bought much or spent much.
Lost Customers: Customers who once engaged with the business but have not done so recently.
Regular Customers: Those who purchase occasionally, with moderate engagement.
Loyal Customers: Customers who regularly engage and make frequent purchases.
Top Customers: The highest value customers who buy frequently, recently, and spend the most.


Findings & Recommendations:

The RFM segmentation offers a clear picture of our customer segments for targeted marketing. Our analysis shows that lost customers constitute 23.96% of our dataset, contributing $16.42M in revenue. Interestingly, the top customer segment generates the highest revenue at $30.70M. The business must nurture new customers carefully, as they've just experienced their products and have the potential to become valuable long-term patrons. While the segments in between shouldn't be overlooked, they may not require immediate priority. However, the business should remain attentive to their needs to maintain their engagement.
We have segmented our customers, each with a unique customer ID. Enterprise Resource Planning (ERP) systems typically store personal information, including at minimum phone numbers and email addresses. The business should engage these segments with personalized marketing and loyalty programs using the following targeted strategies:
New Customers: Implement engagement campaigns to foster loyalty.
Lost Customers: Launch re-engagement campaigns to win them back.
Regular & Loyal Customers: Offer loyalty programs or incentives to encourage continued purchases.
Top Customers: Provide premium services, exclusive offers, or VIP treatment to maintain their loyalty.
Collaboration between operations and marketing teams is crucial for achieving outstanding results. It's important to note that our analysis must be ongoing, as consumer behavior is constantly evolving.
In conclusion, this analysis offers valuable insights into customer segmentation using the RFM model. I hope you've gained new knowledge and practical understanding from this exploration.
Happy Learning!!!



Comments