5 Churn

Churn detection is a quick, easy, and useful approach to detecting potential ‘lapsed’ or ‘lost’ customers, providing insight and prioritisation for customer retention. RFM gives useful insights as an overview, and churn gives immediate depth.

5.1 Weakness of RFM

As seen from previous section, ‘lapsed’ or ‘lost’ customers are based on their recency. While it has its merits, it has its weaknesses. As mentioned previously, purchasing rhythm (or purchase interval, similar concept to frequency) may need a bit extra dissecting. For example, a customer may be buying a coat every month, while another customer is buying milk every other day. From RFM alone, the milk buyer would be labelled to be the better customer simply due to faster purchase rhythm of milk.

5.2 How churn detection adds value

This is where churn detection can provide extra insight and immediate actionables. It looks at

What a normal purchase interval or median customer behaviour looks like,
Defines what ‘recency’ threshold should be based on the normal behaviour, then
Labels ‘churn’ for each customer if worse than a standard deviation compared to normal in recent time.

For demonstration purposes, the code for churn detection will focus on customer purchase rhythm instead of product specific purchase rhythm.

5.3 Coding churn

Now, let’s code the bullet points described above.

5.3.1 Find interval days for each customer

Code
Output

# Find the interval between dates
df_interval_days = df_transactions |>
  distinct(customer_id, date) |>
  mutate(date = as.Date(date)) |> 
  arrange(.by = customer_id,
          date) |> 
  mutate(.by = customer_id,
         interval_days = date - lag(date)) |> 
  filter(!is.na(interval_days))

df_interval_days |>
  head() |> 
  print.data.frame()

  customer_id       date interval_days
1       12370 2016-01-20       50 days
2       12370 2016-03-31       71 days
3       12370 2016-06-02       63 days
4       12370 2016-07-26       54 days
5       12370 2016-10-24       90 days
6       12370 2016-11-30       37 days

5.3.2 Calculate median and standard deviation

Then we define ‘normal purchasing rhythm’ as median of interval_days, coded as interval_median in below code.

Code
Output

# Median and standard dev of interval of days
df_interval_median = df_interval_days |>
  summarise(.by = customer_id,
            interval_median = median(interval_days),
            interval_sd = sd(interval_days)) |> 
  filter(!is.na(interval_sd))

df_interval_median |> 
  head() |> 
  print.data.frame()

  customer_id interval_median interval_sd
1       12370       58.5 days    18.38931
2       12371       70.0 days    69.75911
3       12375       14.5 days    68.80019
4       12379      151.5 days   101.11627
5       12382      116.0 days    57.14310
6       12383       74.0 days    19.79899

5.3.3 Define the ‘recent’ threshold, and extrapolate recent behaviour

Code
Output

# Same from RFM section
date_latest_transaction = df_transactions |> 
  pull(date) |> 
  max()

# Defining the 'recent' threshold
df_interval_recent = df_interval_median |> 
  mutate(recent_threshold = as.integer(interval_median * 2)) |> 
  inner_join(df_transactions) |>
  distinct(customer_id, date, interval_median, interval_sd, recent_threshold) |> 
  filter(date > date_latest_transaction - ddays(recent_threshold)) |> 
  distinct()

df_recent_behaviour = df_interval_recent |> 
  mutate(.by = customer_id,
         interval_recent_days = date - lag(date),
         interval_recent_median = median(interval_recent_days, na.rm = T)) |>
  filter(!is.na(interval_recent_days)) |> 
  select(customer_id, interval_median, interval_sd, interval_recent_median) |>
  distinct()

df_recent_behaviour |>
  head() |> 
  print.data.frame()

  customer_id interval_median interval_sd interval_recent_median
1       13340        119 days    49.21721              86.5 days
2       16961        128 days    88.39118              86.0 days
3       12452        138 days    56.53612             109.0 days
4       14600        130 days    42.59499             110.0 days
5       14072        117 days    52.04165              92.0 days
6       15808         98 days    43.43194              82.5 days

5.3.4 Check if recent behaviour is below standard

You can see where this is going. Now we are getting close to the graph at the beginning of this section. This part checks if they have shopped recently and churning.

Code
Output

df_churn_recent = df_recent_behaviour |>
  mutate(churn_recent = as.numeric(interval_recent_median) > interval_median  + interval_sd)

df_churn_recent |>
  count(churn_recent) |> 
  print.data.frame()

  churn_recent    n
1        FALSE 1241
2         TRUE   32

Quiz

How would you check for customers that haven’t shopped recently? (hint: look at the filter function of df_interval_recent when defining ‘recent’ behaviour)
Besides standard deviation, what other formulations can be used to determine churn?
RFM labels and churn labels can overlap. How do you reconcile between the two systems of labels?
Instead of customers’ purchase rhythms, how would you determine customer churn using products’ purchase rhythms? (hint: find each products’ purchase rhythms first)