3  Entry Product

Entry-product analysis is a quick, easy, and useful way to gain insight into customers’ activation behaviour based on their first product purchase. It can show which products leave a better first impression on customers (i.e. produce recurring buyers rather than one-time buyers) and which first purchases have the lowest success rates. Such insight can help escalate or de-escalate priority on certain products.

3.1 Finding top entry products

# Get first purchase date of each customer
df_customer_entry_dates = df_transactions |> 
  summarise(.by = customer_id,
            date = min(date))

# Keep only the transactions made on each customer's first purchase date
df_transactions_entry_only = df_transactions |> 
  inner_join(df_customer_entry_dates, by = c("customer_id", "date")) |> 
  select(customer_id, stock_code)
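As an aside, recent dplyr (>= 1.1.0) can collapse the two steps above into one with `slice_min()`. A minimal sketch on toy data — the column names match `df_transactions` above, but the rows here are invented purely for illustration:

```r
library(dplyr)

# Toy stand-in for df_transactions (invented data, same columns)
df_transactions <- tibble(
  customer_id = c(1, 1, 2, 2, 2),
  stock_code  = c("A", "B", "C", "A", "B"),
  date        = as.Date(c("2024-01-05", "2024-02-01",
                          "2024-01-10", "2024-01-10", "2024-03-01"))
)

# slice_min() keeps every row tied for the earliest date per customer,
# matching the behaviour of the inner_join() approach
df_transactions_entry_only <- df_transactions |> 
  slice_min(date, by = customer_id) |> 
  select(customer_id, stock_code)

df_transactions_entry_only
# customer 1 entered with "A"; customer 2 has two tied first-day
# purchases ("C" and "A"), both of which are kept
```

The two-step join version in the main text behaves identically; `slice_min()` simply avoids materialising the intermediate `df_customer_entry_dates` table.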

# Absolute count of entry products
df_entry_product_count = df_transactions_entry_only |> 
  count(stock_code) |> 
  arrange(-n)

df_entry_product_count |>
    head() |>
    print.data.frame()
  stock_code   n
1   85123AAP 589
2    22423AP 517
3    84879AP 398
4    22720AP 384
5    22960AP 368
6    47566AP 356

So we found our top entry products in absolute terms, but does that mean they have the highest success rate?

3.2 Finding successful entry products

First, let’s define what ‘success’ means in this context. We will use df_customer_rfm_type from the manual segmentation section: a customer’s entry purchase counts as a success if, after making their first purchase, they are neither a ‘Lapsed’ nor a ‘Lost’ customer. In other words, success = TRUE if the customer’s RFM type is neither Lapsed nor Lost.

Then we count the successes for each product and compute their success rates.

# Calculate the success rate of entry products
df_successful_entry_products = df_transactions_entry_only |> 
  inner_join(df_customer_rfm_type) |> 
  mutate(success = !customer_type %in% c('Lapsed', 'Lost')) |> 
  summarise(.by = stock_code,
            count_success = sum(success),
            count_total = n()) |> 
  mutate(success_rate = count_success / count_total) |> 
  arrange(-success_rate)

df_successful_entry_products |>
  head() |> 
  print.data.frame()
  stock_code count_success count_total success_rate
1    16049AP             1           1            1
2   16169KAP             4           4            1
3   16169NAP             1           1            1
4    20653AP             1           1            1
5    20663AP             1           1            1
6    20671AP             1           1            1

However, the table above reveals a flaw in this metric: products purchased only a handful of times can show a 100% success rate. There are multiple approaches to amend this situation; one simple option is to require a minimum number of entry purchases:

# Keep only products with a meaningful sample size
df_successful_entry_products |> 
  filter(count_total > 300) |>
  head() |> 
  print.data.frame()
  stock_code count_success count_total success_rate
1    22469AP           127         309    0.4110032
2    21034AP           132         325    0.4061538
3   85099BAP           123         320    0.3843750
4    21212AP           118         323    0.3653251
5   85123AAP           211         589    0.3582343
6    22457AP           107         320    0.3343750
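An alternative to a hard cut-off is to rank products by the lower bound of a confidence interval on the success rate, which penalises small samples smoothly instead of discarding them. A minimal base-R sketch using the 95% Wilson score interval; the counts plugged in below come from the tables above:

```r
# Lower bound of the 95% Wilson score interval for a binomial proportion.
# Small samples get a wide interval, so a 1/1 product no longer ranks
# above a 127/309 product despite its 100% observed success rate.
wilson_lower <- function(successes, total, z = 1.96) {
  p      <- successes / total
  denom  <- 1 + z^2 / total
  centre <- p + z^2 / (2 * total)
  margin <- z * sqrt(p * (1 - p) / total + z^2 / (4 * total^2))
  (centre - margin) / denom
}

wilson_lower(1, 1)      # ~0.21: heavily penalised despite 100% observed
wilson_lower(127, 309)  # ~0.36: stays close to the observed 41% rate
```

Because the function is vectorised, it can be dropped straight into the pipeline, e.g. `mutate(lb = wilson_lower(count_success, count_total)) |> arrange(desc(lb))`, keeping high-volume products from being outranked by lucky one-off purchases.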

Quiz

  • What other potential biases affect the entry products with the highest success rates?

  • What can you implement in order to amend some of these biases?

  • What business decision would you advise for the worst entry products? Why? Would this decision feed back into the bias of future results?