1 Data
The data used for the demonstrations in this book is an open-source dataset, available on GitHub: https://github.com/RodzanIskandar/PowerBI_dashboard_e-commerce_transaction/ETD_clean_data.csv
After a bit of EDA, let’s load the relevant columns and take a first look at the data:
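library(data.table)  # fread
library(dplyr)       # select, mutate, filter
library(stringr)     # str_replace
library(lubridate)   # date

df_transactions <- fread('ETD_clean_data.csv') |>
  select(customer_id,
         province,
         date,
         stock_code,
         unit_price,
         quantity,
         sales) |>
  # Strip a trailing ".0"; the dot is escaped so it matches only a literal "."
  mutate(customer_id = str_replace(customer_id, '\\.0$', ''),
         date = date(date)) |>
  filter(customer_id != 'no customer id')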
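The customer ID cleanup relies on a regular expression, where an unescaped `.` matches any character rather than a literal dot. The sketch below illustrates the difference on a few hypothetical IDs (they are not taken from the dataset). It uses base R's `sub()` instead of `str_replace()` so that it runs without extra packages; for these patterns the two behave the same way.

```r
# Hypothetical IDs for illustration only; they are not taken from the dataset.
ids <- c("16010.0", "16010", "12340")

# Unescaped: "." matches any character, so IDs that merely end in "0" are cut too.
loose <- sub(".0$", "", ids)

# Escaped: only a literal ".0" suffix is removed.
strict <- sub("\\.0$", "", ids)

print(data.frame(ids, loose, strict))
```

With the unescaped pattern, "16010" becomes "160" and "12340" becomes "123", while the escaped pattern leaves both untouched. Whether the raw file contains IDs without the ".0" suffix is not something this section confirms, so treat the escaped form as a precaution.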
df_transactions |>
  # description was not loaded above; any_of() drops it only if it is present
  select(-any_of('description')) |>
  head() |>
  print.data.frame()
customer_id province date stock_code unit_price quantity sales
1 16010 DKI Jakarta 2015-11-30 22811AP 44250 6 265500
2 16010 DKI Jakarta 2015-11-30 21713AP 31500 8 252000
3 16010 DKI Jakarta 2015-11-30 22927AP 89250 2 178500
4 16010 DKI Jakarta 2015-11-30 20802AP 24750 6 148500
5 16010 DKI Jakarta 2015-11-30 22052AP 6300 25 157500
6 16010 DKI Jakarta 2015-11-30 22705AP 6300 25 157500
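In the rows printed above, `sales` looks like `unit_price` multiplied by `quantity`. A minimal sketch of how that could be checked, using only the six preview rows typed in by hand (the name `preview` is made up for this example, and the relationship itself is an assumption, not something the dataset documents):

```r
# The six rows printed above, typed in by hand.
preview <- data.frame(
  stock_code = c("22811AP", "21713AP", "22927AP", "20802AP", "22052AP", "22705AP"),
  unit_price = c(44250, 31500, 89250, 24750, 6300, 6300),
  quantity   = c(6, 8, 2, 6, 25, 25),
  sales      = c(265500, 252000, 178500, 148500, 157500, 157500)
)

# Recompute sales and keep any row that disagrees with the stored value.
preview$sales_check <- preview$unit_price * preview$quantity
mismatches <- preview[preview$sales_check != preview$sales, ]

print(nrow(mismatches))
```

For these six rows the check finds no mismatches. The same comparison could be run on the full `df_transactions` table to see whether it holds everywhere, for example in the presence of discounts or returns.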
1.0.1 Quick EDA
Let’s plot total sales per month to get an idea of how the data behaves over time.
library(ggplot2)

df_transactions |>
  mutate(date_year = year(date),
         date_month = month(date)) |>
  summarise(.by = c(date_year, date_month),
            sales_monthly = sum(sales)) |>
  # Sort chronologically so x_month is a true month index
  arrange(date_year, date_month) |>
  mutate(x_month = row_number()) |>
  ggplot() +
  geom_line(aes(y = sales_monthly,
                x = x_month)) +
  theme_classic()
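The plot above uses a running month index on the x-axis. As an alternative sketch, the month can be kept as a real date with `lubridate::floor_date()`, so the axis shows calendar months. The toy data below is made up for illustration and stands in for `df_transactions`; the names `toy_transactions` and `toy_monthly` are hypothetical. The `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Toy stand-in for df_transactions; these values are made up for illustration.
toy_transactions <- tibble(
  date  = as.Date(c("2015-11-30", "2015-11-15", "2015-12-01", "2016-01-20")),
  sales = c(100, 50, 200, 75)
)

toy_monthly <- toy_transactions |>
  # Map every date to the first day of its month
  mutate(month_start = floor_date(date, unit = "month")) |>
  summarise(.by = month_start,
            sales_monthly = sum(sales)) |>
  arrange(month_start)

ggplot(toy_monthly) +
  geom_line(aes(x = month_start, y = sales_monthly)) +
  theme_classic()
```

Grouping on a single date column also removes the need for separate year and month columns, at the cost of a slightly less explicit grouping key.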
Quiz
Which columns do not seem relevant?
What other EDA would you perform?
How would you segment the customers?