用戶分層-如何使用SQL計算RFM模型

1 評論 1670 瀏覽 15 收藏 13 分鐘

在產(chǎn)品運(yùn)營中,我們經(jīng)常需要將用戶進(jìn)行分層,以便更好針對性做運(yùn)營策略。本文分享了如何用SQL結(jié)合RFM模型,對用戶進(jìn)行分層的方法,供大家參考學(xué)習(xí)。

RFM模型通常用于分析用戶數(shù)據(jù)庫,以識別最有價值的用戶。

Recency (R)– 用戶最后一次購買的時間。距離現(xiàn)在時間越短,用戶再次購買的可能性越大。Frequency (F)-用戶在一定時間內(nèi)購買的次數(shù)。頻率越高,表明用戶對品牌的忠誠度越高。Monetary (M)-用戶在一定時間內(nèi)為公司帶來的總收益。金額越高,表明用戶的價值越大。

通過RFM模型,企業(yè)可以對用戶進(jìn)行細(xì)分,比如將用戶分為高價值用戶、需要挽留的用戶、有潛力的用戶等,然后根據(jù)這些細(xì)分采取不同的營銷策略。

作為產(chǎn)品經(jīng)理如何使用SQL計算RFM模型,對用戶進(jìn)行分層呢?

一、數(shù)據(jù)源準(zhǔn)備

用戶會員表數(shù)據(jù)

訂單表數(shù)據(jù)(部分字段)

因 MySQL 性能問題,我們將數(shù)據(jù)通過Binlog訂閱同步到 Hive 進(jìn)行計算;

而、數(shù)據(jù)計算

2.1、RFM模型的計算步驟如下:

確定時間范圍:首先確定分析的時間范圍,比如過去一年或過去六個月。

這里我們使用

AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01′
AND TO_DATE(o.SOCreateTime) <=’2024-06-30’

收集數(shù)據(jù):收集客戶在所選時間范圍內(nèi)的所有交易記錄。

SELECT m.mimemberid AS memberid,
MAX(o.socreatetime) AS last_order_time,
DATEDIFF(‘2024-07-01’, MAX(o.socreatetime)) AS R,
COUNT(o.soordersn) AS F,
SUM(o.sototalamount) AS M
FROM ods_travel.v_teschoolinnermarket_memberinfo m
LEFT JOIN paimon.fts_base_tetravelrvsorder.schoolorder o
ON m.mimemberid = o.somemberid
WHERE o.SOPayStatus = 2
AND m.MIStatus = 0
AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(o.SOCreateTime) <= ‘2024-06-30’
GROUP BY m.mimemberid
ORDER BY R ASC;

計算Recency (R)

  • 對于每個客戶,找出最后一次購買的日期。
  • 計算從最后一次購買到當(dāng)前日期的天數(shù)或月數(shù)。

計算Frequency (F)

  • 對于每個客戶,計算在所選時間范圍內(nèi)的購買次數(shù)。

計算Monetary (M)

  • 對于每個客戶,計算在所選時間范圍內(nèi)的總購買金額。

DROP TABLE IF EXISTS adsxyt_travel.userrfm;
CREATE TABLE adsxyt_travel.userrfm
STORED AS ORC AS
WITH mada_order_num AS (
SELECT a.SOMemberId AS memberid, COUNT(*) AS ordernum
FROM paimon.fts_base_tetravelrvsorder.SchoolOrder a
INNER JOIN paimon.fts_base_tetravelrvsorder.SchoolOrderExpand b ON a.SOOrderSn = b.SOEOrderSn
WHERE a.SOPayStatus = 2
AND TO_DATE(a.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(a.SOCreateTime) <= ‘2024-06-30’
GROUP BY a.SOMemberId
),
base_data AS (
— 查詢最原始的RFM值
SELECT m.mimemberid AS memberid, MAX(o.socreatetime) AS last_order_time
, DATEDIFF(‘2024-07-01’, MAX(o.socreatetime)) AS R
, COUNT(o.soordersn) AS F, SUM(o.sototalamount) AS M
FROM ods_travel.v_teschoolinnermarket_memberinfo m
LEFT JOIN paimon.fts_base_tetravelrvsorder.schoolorder o ON m.mimemberid = o.somemberid
WHERE o.SOPayStatus = 2
AND m.MIStatus = 0
AND TO_DATE(o.SOCreateTime) >= ‘2024-04-01’
AND TO_DATE(o.SOCreateTime) <= ‘2024-06-30’
GROUP BY m.mimemberid
),
quartiles AS (
— 按照數(shù)據(jù)的4分位數(shù)計算RFM得分
SELECT *, NTILE(4) OVER (ORDER BY R) AS R_score
, NTILE(4) OVER (ORDER BY F DESC) AS F_score
, NTILE(4) OVER (ORDER BY M DESC) AS M_score
FROM base_data
),
quartiles_fixed AS (
— 四分位數(shù)修正
SELECT *
, CASE
WHEN R_score = 1 THEN 4
WHEN R_score = 2 THEN 3
WHEN R_score = 3 THEN 2
ELSE 1
END AS R_score_fixed
, CASE
WHEN F_score = 1 THEN 1
WHEN F_score = 2 THEN 2
WHEN F_score = 3 THEN 3
ELSE 4
END AS F_score_fixed
, CASE
WHEN M_score = 1 THEN 1
WHEN M_score = 2 THEN 2
WHEN M_score = 3 THEN 3
ELSE 4
END AS M_score_fixed
FROM quartiles
),
means AS (
SELECT AVG(R_score_fixed) AS r_mean, AVG(F_score_fixed) AS f_mean
, AVG(M_score_fixed) AS m_mean
FROM quartiles_fixed
)
SELECT qf.memberid, mc.ordernum, qf.last_order_time, qf.R, qf.F
, qf.M, qf.R_score_fixed AS R_score, qf.F_score_fixed AS F_score, qf.M_score_fixed AS M_score, m.r_mean
, m.f_mean, m.m_mean
, CASE
WHEN R_score_fixed > m.r_mean THEN ‘高’
ELSE ‘低’
END AS R_label
, CASE
WHEN F_score_fixed > m.f_mean THEN ‘高’
ELSE ‘低’
END AS F_label
, CASE
WHEN M_score_fixed > m.m_mean THEN ‘高’
ELSE ‘低’
END AS M_label
FROM quartiles_fixed qf
CROSS JOIN means m
LEFT JOIN order_num mc ON qf.memberid = mc.memberid
ORDER BY qf.R ASC;

通過一系列公共表表達(dá)式(CTEs)構(gòu)建了一個RFM(最近購買行為、購買頻率、購買金額)分析模型,用于對會員進(jìn)行分類。首先,它計算了每個會員在指定時間段內(nèi)的訂單數(shù)量、最后下單時間、以及基于這些數(shù)據(jù)的RFM原始值。接著,通過四分位數(shù)方法為每個RFM值分配得分,并進(jìn)行修正以確保得分與會員價值正相關(guān)。然后,計算這些得分的平均值,用于確定每個會員的RFM標(biāo)簽(高或低)。最后,結(jié)合這些標(biāo)簽和訂單數(shù)量,對會員進(jìn)行分類,并按最近購買行為進(jìn)行排序。

為RFM打分

  • 將R、F、M的值分別進(jìn)行標(biāo)準(zhǔn)化或歸一化,以便于比較。例如,可以使用排名或百分比來為每個維度打分。
  • Recency可以按照時間從近到遠(yuǎn)進(jìn)行排序,然后分配分?jǐn)?shù),時間越近分?jǐn)?shù)越高。
  • Frequency可以按照購買次數(shù)從多到少進(jìn)行排序,然后分配分?jǐn)?shù),購買次數(shù)越多分?jǐn)?shù)越高。
  • Monetary可以按照總金額從高到低進(jìn)行排序,然后分配分?jǐn)?shù),金額越高分?jǐn)?shù)越高。

綜合RFM得分

  • 將R、F、M的分?jǐn)?shù)相加,得到每個客戶的RFM總分。
  • 根據(jù)總分將客戶分為不同的群體,如高價值客戶、需要挽留的客戶、低價值客戶等。

分析和應(yīng)用

  • 分析不同RFM群體的特征,制定相應(yīng)的營銷策略。
  • 例如,對于高RFM得分的客戶,可以提供忠誠度獎勵或個性化服務(wù);對于低RFM得分的客戶,可以設(shè)計促銷活動以提高其購買頻率和金額。

四等位數(shù)法

其中使用了四分位數(shù),是統(tǒng)計學(xué)分位數(shù)中的一種,把所有數(shù)值從低到高(或者從高到底)排列并分成四等份,處于三個分割點(diǎn)位置的數(shù)值就是四分位數(shù)。
一般表示為:
Q1:樣本排列中處于25%位置的數(shù)字;
Q2:又稱為中位數(shù),指的是樣本排列中處于50%位置,即中間位置的數(shù)據(jù);
Q3:樣本排列中處于75%位置的數(shù)字。

假設(shè)樣本數(shù)據(jù)項(xiàng)數(shù)一共是N:
則Q1的位置數(shù)值=(N+1)/4;
Q2的位置數(shù)值=(N+1)/2;
Q3的位置數(shù)值=3(N+1)/4。
如果(N+1)恰好是4的倍數(shù),則確定四分位數(shù)比較簡單,如果不是4的倍數(shù),相關(guān)位置的四分位數(shù)就應(yīng)該是相鄰兩個數(shù)值的標(biāo)志值的平均數(shù)。權(quán)數(shù)的大小取決于兩個數(shù)值距離的遠(yuǎn)近,距離越近權(quán)數(shù)越大,距離越遠(yuǎn),權(quán)數(shù)越小,權(quán)數(shù)之和等于1。

DROP TABLE IF EXISTS adsxyt_travel.userrfmcategory;
CREATE TABLE adsxyt_travel.userrfmcategory
STORED AS ORC AS
SELECT memberid, mada_ordernum, last_order_time, R, F, M
, R_score, F_score, M_score, r_mean, f_mean
, m_mean, R_label, F_label, M_label
, CASE
WHEN R_label = ‘高’
AND F_label = ‘高’
AND M_label = ‘高’
THEN ‘重要價值用戶’
WHEN R_label = ‘高’
AND F_label = ‘低’
AND M_label = ‘高’
THEN ‘重要發(fā)展用戶’
WHEN R_label = ‘低’
AND F_label = ‘高’
AND M_label = ‘高’
THEN ‘重要保持用戶’
WHEN R_label = ‘低’
AND F_label = ‘低’
AND M_label = ‘高’
THEN ‘重要挽留用戶’
WHEN R_label = ‘高’
AND F_label = ‘高’
AND M_label = ‘低’
THEN ‘一般價值用戶’
WHEN R_label = ‘高’
AND F_label = ‘低’
AND M_label = ‘低’
THEN ‘一般發(fā)展用戶’
WHEN R_label = ‘低’
AND F_label = ‘高’
AND M_label = ‘低’
THEN ‘一般保持用戶’
WHEN R_label = ‘低’
AND F_label = ‘低’
AND M_label = ‘低’
THEN ‘一般挽留用戶’
ELSE ‘未分類’
END AS user_category
FROM adsxyt_travel.userrfm;

  • 使用CASE語句根據(jù)R、F、M的標(biāo)簽值(’高’或’低’)來確定用戶類別。這些標(biāo)簽可能代表了用戶在最近性(Recency)、頻率(Frequency)、貨幣價值(Monetary)三個方面的表現(xiàn)。
  • user_category是根據(jù)R、F、M的標(biāo)簽值組合來定義的用戶類別,如“重要價值用戶”、“重要發(fā)展用戶”等。(參照上述表格)

三、數(shù)據(jù)結(jié)果

敏感數(shù)據(jù)不做暫時,本文提供 SQL 計算解決思路,具體可參照實(shí)驗(yàn)。

本文由 @李昂 原創(chuàng)發(fā)布于人人都是產(chǎn)品經(jīng)理。未經(jīng)作者許可,禁止轉(zhuǎn)載

題圖來自Unsplash,基于CC0協(xié)議

該文觀點(diǎn)僅代表作者本人,人人都是產(chǎn)品經(jīng)理平臺僅提供信息存儲空間服務(wù)

更多精彩內(nèi)容,請關(guān)注人人都是產(chǎn)品經(jīng)理微信公眾號或下載App
評論
評論請登錄
  1. 有用的案例

    來自江西 回復(fù)