Customer Intelligence with R

Customer Activation, Development, Retention, and Segmentation (CADRS)

Updated 2024-01-06

Introduction

Purpose

‘Customer Intelligence with R’ (CI with R) is for learning the basic application of customer activation, development, retention, and segmentation (CADRS) with R. It aims to be educational outside academia. In general, the topics are applied for the purpose of commercial success and optimisation using customer transaction data.

Starting with CADRS insights and labelling, the learning format is generally broken down into a few parts:

Coding demonstration
Output preview of the code
Links (usually Wikipedia) for the curious
Quizzes that challenge you to expand on the basics

On the R side, we will mainly focus on the tidytable and tidymodels libraries, using an example open-source dataset available online.

What is customer transaction data?

To put it simply, when you go shopping and you get your receipt, that is customer transaction data.

This book uses such data from the perspective of the vendor, where all the receipts are recorded for each shopping member. The most basic form of such data has (customer or membership) ID, date, and product ID columns. Other columns may include price per unit, quantity purchased (commonly negative if refunded), quantity unit (e.g. litres, units, metre_cubed, etc.), tax, material costs, bundle ID (e.g. for crates of bottles, promotion bundle, etc.), region, etc.

Here is an example preview of such data:

Table 1: Transaction data
customer_id	date	product_id	net_euros
cmg94	1994-05-02	m3vc90	12,04
gjo532	2010-11-27	3465u098	72,87
hfh5	2003-06-07	gvm49	4,72

where net_euros would be something along the lines of price × quantity.
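As a minimal sketch of that relationship, the snippet below builds a small transaction table and derives net_euros from price and quantity. The unit_price and quantity columns, their values, and the extra refund row are hypothetical, chosen only so that the first three rows reproduce the net_euros figures in Table 1; it assumes tidytable is installed.

```r
library(tidytable)

# Hypothetical receipt lines: unit_price and quantity are illustrative columns,
# not part of Table 1
transactions <- tidytable(
  customer_id = c("cmg94", "gjo532", "hfh5", "hfh5"),
  date        = as.Date(c("1994-05-02", "2010-11-27", "2003-06-07", "2003-06-09")),
  product_id  = c("m3vc90", "3465u098", "gvm49", "gvm49"),
  unit_price  = c(3.01, 72.87, 2.36, 2.36),
  quantity    = c(4, 1, 2, -2)  # a negative quantity marks a refund
)

# net_euros as price x quantity
transactions <- transactions |>
  mutate(net_euros = unit_price * quantity)

# Net spend per customer: the refund offsets the earlier purchase by hfh5
net_per_customer <- transactions |>
  summarise(net_euros = sum(net_euros), .by = customer_id)
```

Keeping refunds as negative quantities means a plain sum per customer already nets them out, with no separate refund handling.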

Who is this book for?

‘CI with R’ demonstrates some basic, applicable, and deliverable CADRS examples of how customer transaction data can be utilised for business value.

As for prerequisites, ‘CI with R’ is for R users with at least a few months of experience who do not need explanations of the tidytable (or dplyr+ or tidyverse) functions.

For those looking to get to that level, I would recommend mastering the R for Data Science 2nd edition book and watching a bit of how Hadley Wickham codes (though the video is a bit dated now, especially the old pipes).

Further, knowing the basics of tidymodels will certainly help. This library is rather recent, so if you are looking for some tidymodels in action, Julia Silge’s YouTube videos and /r/tidymodels are recommended.

In general, ‘CI with R’ will use the libraries tidytable, tidymodels, lubridate, and stringr, with conflicted resolving function name conflicts in favour of tidytable. You can load the libraries with the following code:

suppressPackageStartupMessages(suppressWarnings({
  library(tidytable)
  library(tidymodels)
  library(stringr)
  library(lubridate)
  
  # libraries for specific sections
  library(tidyclust)
  library(plotly)

  conflicted::conflict_prefer_all(winner = "tidytable",
                                  quiet = TRUE)
}))

Note: for production code, loading the individual libraries from tidymodels is better than loading the whole tidymodels meta-package.
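A minimal sketch of loading individual libraries follows. The selection of component packages is an assumption for illustration, not the set this book requires; load whichever ones your script actually calls.

```r
# Attach only the tidymodels component packages a script needs,
# instead of the whole meta-package via library(tidymodels).
# This particular selection is illustrative.
library(rsample)    # resampling and train/test splits
library(recipes)    # preprocessing steps
library(parsnip)    # model specifications
library(workflows)  # bundling a recipe with a model
library(yardstick)  # performance metrics
```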

For installing R libraries, pak is recommended over install.packages() (except when you install pak).
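One possible way to do this is sketched below. It installs pak with install.packages() only if pak is missing, then uses pak::pkg_install() for the rest. The package list mirrors the libraries loaded above and can be adjusted.

```r
# pak itself is bootstrapped with install.packages()
if (!requireNamespace("pak", quietly = TRUE)) {
  install.packages("pak")
}

# Libraries used in the loading code above
pkgs <- c("tidytable", "tidymodels", "stringr", "lubridate",
          "tidyclust", "plotly", "conflicted")

# Install (or update) them together with their dependencies
pak::pkg_install(pkgs)
```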

For those who want to learn a few example models, the models covered in ‘CI with R’ are:

RFM (recency, frequency, monetary value)
Entry product
Cross sell
Churn detection
Forecasting
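
To give a flavour of the first item, here is a rough sketch of RFM values computed from transaction data. It is an illustration under stated assumptions, not necessarily the approach taken later in the book: the data, the snapshot_date, and the choice to count frequency as distinct purchase dates are all hypothetical.

```r
library(tidytable)

# Hypothetical transactions
transactions <- tidytable(
  customer_id = c("a", "a", "b", "c", "c", "c"),
  date = as.Date(c("2023-01-05", "2023-03-10", "2023-02-01",
                   "2023-01-20", "2023-02-15", "2023-03-28")),
  net_euros = c(10, 25, 40, 5, 5, 15)
)

# Assumed analysis date from which recency is measured
snapshot_date <- as.Date("2023-04-01")

rfm <- transactions |>
  summarise(
    recency_days = as.numeric(snapshot_date - max(date)),  # days since last purchase
    frequency    = n_distinct(date),                       # number of purchase dates
    monetary     = sum(net_euros),                         # total net spend
    .by = customer_id
  )
```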

Acknowledgements

This is essentially a compiled list of resources used. This book would not be around today without the years of resourcefulness from

Licence

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The code in this book is public domain, licensed under Creative Commons CC0 1.0 Universal (CC0 1.0).