One thing I sometimes enjoy doing is looking at the dynamics of a business that can be illustrated with data. From time to time, I pick a scenario that is different from what I am used to (i.e., banking and financial services) and look for patterns in it.
Today, I will be talking about one specific kind of business: a hybrid (B2C and B2B) E-commerce platform. E-commerce platforms are a means of selling goods of any kind through the internet. These can be physical goods that need to be shipped somewhere (think about your latest Amazon order) or completely digital goods that can be accessed instantly.
The E-commerce website I studied is a UK-based retailer that primarily sells gifts. With that context in mind, I wondered how the gift-giving culture of a particular region (such as the UK) could influence seasonal purchasing trends on this website, and I tried to describe these phenomena.
Mapping out the Phenomena
I always like to map out my assumptions and hypotheses about the phenomena I want to model before I start exploratory data analysis. The image below illustrates how that went for this project.
By looking at the map, I posed four questions that could shed some light on the purchasing behavior of the clients of this platform over the course of the year. Below are my observations for each one of these.
What are the periods of most activity throughout the year?
One way to measure activity on an E-commerce platform is the WAU (weekly active users) metric. It is calculated from the dataset by counting the number of unique clients who purchase one or more times on the platform in a given week. This is a good way of measuring activity because it smooths out the effect of clients who make bulk purchases (who may be other businesses rather than individual customers).
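A minimal plain-Python sketch of the WAU count, assuming each invoice line has already been reduced to a `(customer_id, invoice_date)` pair. The function name and the choice of ISO calendar weeks are illustrative assumptions, not necessarily what the original notebook did.

```python
from collections import defaultdict
from datetime import date


def weekly_active_users(rows):
    """Count distinct customers per ISO (year, week).

    rows: iterable of (customer_id, invoice_date) pairs, one per invoice line.
    A customer who buys many items, or places several orders, in the same
    week is counted only once for that week.
    """
    customers_by_week = defaultdict(set)
    for customer_id, invoice_date in rows:
        iso_year, iso_week, _ = invoice_date.isocalendar()
        customers_by_week[(iso_year, iso_week)].add(customer_id)
    return {week: len(ids) for week, ids in sorted(customers_by_week.items())}


# Toy data: customer 1 buys twice in the same week but is counted once.
rows = [
    (1, date(2017, 11, 20)),
    (1, date(2017, 11, 21)),
    (2, date(2017, 11, 22)),
    (1, date(2017, 11, 28)),
]
wau = weekly_active_users(rows)
```

Because each week stores a set of customer IDs, a business buying hundreds of units in one order adds exactly 1 to that week's count, which is the smoothing effect described above.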
The graph above suggests that the number of active users on the platform grows over the last few months of the year (from October to December), a period that includes weeks with commercial holidays such as Black Friday.
What are the most popular products on the platform?
Looking at popular products can help us understand what the website is best known for and what customers are generally interested in. To illustrate this, we will look at the number of orders (invoices) that contain each product, rather than the quantity sold (again, to smooth out bulk purchases).
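Counting invoices rather than units can be sketched as follows, assuming invoice lines shaped as `(invoice_no, description, quantity)` tuples. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def orders_per_product(lines):
    """Rank products by the number of distinct invoices that contain them.

    lines: iterable of (invoice_no, description, quantity) tuples.
    Quantity is deliberately ignored, so one bulk order counts once.
    """
    invoices = defaultdict(set)
    for invoice_no, description, _quantity in lines:
        invoices[description].add(invoice_no)
    counts = {product: len(ids) for product, ids in invoices.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


lines = [
    ("inv-1", "heart holder", 6),
    ("inv-2", "heart holder", 1),
    ("inv-3", "cakestand", 500),  # bulk order
    ("inv-3", "cakestand", 2),    # same invoice, second line
]
top_20 = orders_per_product(lines)[:20]
```

The 500-unit line does not lift the cakestand above the heart holder, because it belongs to a single invoice.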
How much do the top 20 most popular products represent in total sales?
Looking at the most popular products, I wondered what their contribution to total sales on the website is. Excluding revenue from postage and other shipping-related items, it turns out that the 20 most popular products alone represent around 6.5% of the total revenue made on the website from November 2016 to December 2017.
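One way to compute that share, sketched under two assumptions: invoice lines are `(invoice_no, description, quantity, unit_price)` tuples, and shipping charges can be recognized by a fixed set of descriptions. The `SHIPPING_ITEMS` set below is a placeholder, not the list used in the original analysis. Popularity is ranked by distinct invoices, as in the previous section.

```python
from collections import defaultdict

# Assumption: shipping charges appear as invoice lines with these descriptions.
# Replace with however postage is actually encoded in your data.
SHIPPING_ITEMS = {"POSTAGE", "DOTCOM POSTAGE", "CARRIAGE"}


def top_n_revenue_share(lines, n=20):
    """Share of non-shipping revenue earned by the n products found in the most invoices.

    lines: iterable of (invoice_no, description, quantity, unit_price) tuples.
    """
    revenue = defaultdict(float)
    invoices = defaultdict(set)
    for invoice_no, description, quantity, unit_price in lines:
        if description in SHIPPING_ITEMS:
            continue  # leave postage out of both the ranking and the total
        revenue[description] += quantity * unit_price
        invoices[description].add(invoice_no)
    ranked = sorted(invoices, key=lambda product: len(invoices[product]), reverse=True)
    total = sum(revenue.values())
    if total == 0:
        return 0.0
    return sum(revenue[product] for product in ranked[:n]) / total


lines = [
    ("inv-1", "heart holder", 1, 10.0),
    ("inv-2", "heart holder", 1, 10.0),
    ("inv-3", "cakestand", 1, 30.0),
    ("inv-3", "POSTAGE", 1, 100.0),
]
share_top_1 = top_n_revenue_share(lines, n=1)
```

In this toy data the most frequently ordered product earns 40% of non-shipping revenue, while the less frequent one earns the other 60%.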
If we look more closely, we can see something quite interesting:
In E-commerce scenarios, the most popular products are not necessarily the ones that bring in the most revenue.
This becomes clear when we see that, even though the “white hanging heart t-light holder” was ordered 3 times as often as the “regency cakestand 3 tier”, it made only 1/4 of the revenue. Is this behavior seasonal? Do some products become more relevant at particular points in the year? We will investigate that more closely in the next section.
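Taking the two ratios at face value (roughly 3 times the orders and 1/4 of the revenue), they imply a large gap in revenue per order. Writing O for number of orders and R for revenue, with subscript h for the heart holder and c for the cakestand:

```latex
\frac{R_h / O_h}{R_c / O_c}
  = \frac{R_h}{R_c} \cdot \frac{O_c}{O_h}
  \approx \frac{1}{4} \cdot \frac{1}{3}
  = \frac{1}{12}
```

In other words, each order brings in roughly 12 times as much revenue from the cakestand as from the heart holder, which is how a less popular item can still out-earn a more popular one.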
Is there an increase in sales of particular products in the top 20 closer to holidays?
To analyze such behavior, I defined a column in the dataset that calculates the time in days until the next holiday for each invoice, considering the UK’s most relevant holidays in gift-giving culture.
These are holidays on which people tend to buy each other gifts of all kinds, such as Christmas, Mother’s Day, and Father’s Day. Along with those, I considered days of the year that are usually busy for E-commerce, such as Black Friday and Boxing Day.
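A plain-Python sketch of such a column, assuming invoice dates are `datetime.date` objects. The holiday dates below are illustrative 2017 UK dates (Mothering Sunday, Father’s Day, Black Friday, Christmas, Boxing Day) and the function name is hypothetical. A real pipeline would also need the following year’s dates so that late-December invoices still have a next holiday.

```python
from bisect import bisect_left
from datetime import date

# Illustrative 2017 dates; check them against a calendar before reuse.
HOLIDAYS_2017 = [
    date(2017, 3, 26),   # Mothering Sunday (UK Mother's Day)
    date(2017, 6, 18),   # Father's Day
    date(2017, 11, 24),  # Black Friday
    date(2017, 12, 25),  # Christmas
    date(2017, 12, 26),  # Boxing Day
]


def days_until_next_holiday(invoice_date, holidays):
    """Days from invoice_date to the first holiday on or after it.

    Returns 0 when the invoice falls on a holiday and None when no
    holiday in the list is still ahead.
    """
    ordered = sorted(holidays)
    i = bisect_left(ordered, invoice_date)  # index of first holiday >= invoice_date
    if i == len(ordered):
        return None
    return (ordered[i] - invoice_date).days


lead_time = days_until_next_holiday(date(2017, 11, 1), HOLIDAYS_2017)
```

Binary search keeps each lookup logarithmic in the number of holidays; for a handful of dates a linear scan would work just as well.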
Measuring the time to the next holiday for each invoice yields the distribution in the image above, which shows that people do tend to buy more from this retailer closer to holidays (a right-skewed distribution).
In fact, about 50% of all orders on the website are placed less than 27 days before a commemorative date. It looks like people in the UK are indeed quite punctual.
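Saying that about half of all orders are placed less than 27 days before a holiday is the same as saying the median lead time is roughly 27 days. Given per-invoice lead times (here a hypothetical toy list), both numbers can be checked with the standard library; the function name is illustrative.

```python
from statistics import median


def summarize_lead_times(days_to_holiday, threshold=27):
    """Return (median lead time, share of orders placed fewer than
    `threshold` days before the next holiday), skipping missing values."""
    values = [d for d in days_to_holiday if d is not None]
    share_within = sum(1 for d in values if d < threshold) / len(values)
    return median(values), share_within


# Toy lead times in days; None marks an invoice with no upcoming holiday.
mid, share = summarize_lead_times([0, 5, 10, 30, 60, None])
```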
But if we want to know which kinds of products sell the most at a particular point in time, we should look at the behavior of each product or category of products independently.
The plot above shows the time series for each of the top 20 products on the platform. Notice the line for “paper chain kit 50’s christmas”. At the beginning of the year, sales for this product are virtually zero.
At around week 44, it starts to grow rapidly as people start looking for Christmas decorations, peaking around week 48, when it surpasses all other products in number of orders.
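Per-product weekly series like the ones in the plot can be sketched by counting distinct invoices per ISO (year, week) for a single product. The `(invoice_no, invoice_date, description)` layout and the function names are assumptions; keying on the year as well keeps, for example, week 48 of 2016 separate from week 48 of 2017.

```python
from collections import defaultdict
from datetime import date


def weekly_orders(lines, product):
    """Distinct invoices per ISO (year, week) for a single product.

    lines: iterable of (invoice_no, invoice_date, description) tuples.
    """
    invoices_by_week = defaultdict(set)
    for invoice_no, invoice_date, description in lines:
        if description == product:
            iso_year, iso_week, _ = invoice_date.isocalendar()
            invoices_by_week[(iso_year, iso_week)].add(invoice_no)
    return {week: len(ids) for week, ids in sorted(invoices_by_week.items())}


def peak_week(series):
    """Return the (year, week) key with the most orders; the earliest week wins ties."""
    return max(series, key=series.get)


product = "paper chain kit 50's christmas"
lines = [
    ("inv-1", date(2017, 10, 30), product),
    ("inv-2", date(2017, 11, 27), product),
    ("inv-3", date(2017, 11, 28), product),
    ("inv-4", date(2017, 11, 28), "cakestand"),
]
series = weekly_orders(lines, product)
```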
Understanding this kind of behavior allows E-commerce retailers to stock up at the right time, minimizing the cost of keeping unneeded products in stock for the rest of the year.
How much more can we expect products to sell with an increase in active customers?
We have come full circle, back to the topic of active users. Now that we know there are more active customers towards the end of the year, how can we quantify the average effect of having more active customers on the platform?
For that, we will use regression analysis. Specifically, we will fit a linear regression with the weekly active users on the platform as the independent variable and the number of products sold in that week as the dependent variable.
After some preprocessing of the data, we obtain a reasonable regression model for our use case, with a slope of around 4. This indicates that for every additional active user in a week, we can expect about 4 more products to be sold that week. If we “activate” 100 more clients, we should expect sales to increase by around 400 items.
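The slope interpretation can be made concrete with an ordinary least-squares fit of weekly items sold on weekly active users. This is a closed-form, single-variable sketch on made-up numbers, not the model or the preprocessing from the original notebook; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up weekly figures: active users and items sold in the same week.
weekly_active = [100, 150, 200, 250]
items_sold = [410, 600, 810, 1000]
slope, intercept = fit_line(weekly_active, items_sold)
extra_items = slope * 100  # expected change in weekly items sold for 100 more active users
```

With a slope near 4, the 100-client example in the text is simply `slope * 100`.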
At this point, we have become more familiar with the E-commerce setting and have seen how analyzing transactional data can make the operation of such a website more predictable.
Predictability in businesses allows us to better manage them and reduce inefficiencies, illustrating yet another superpower data science gives us.
If you enjoyed this article
This is the first article in a series of explorations of E-commerce data. It was originally written for the Udacity Data Scientist Nanodegree, and the full Jupyter notebook with the code used in this analysis can be found here.