Explaining the Application of One-Way ANOVA in E-Commerce

MOSZCZYNSKI 9-24

One-Way Analysis of Variance (One-Way ANOVA) is a statistical test used to compare the means of three or more groups to determine whether there are significant differences between these groups. In the context of e-commerce, the One-Way ANOVA test can be particularly useful for understanding customer behaviors and optimizing marketing and pricing strategies.

Example of Applying the One-Way ANOVA Test in E-commerce

An online store wants to understand whether differences in price levels affect customers’ inclination to leave the store without making a purchase. We can apply the One-Way ANOVA test here because we are comparing more than two groups. We arbitrarily created three product groups, specifically three price ranges:

  • Low prices: products in the price range of 10-30 PLN
  • Medium prices: products in the price range of 31-60 PLN
  • High prices: products in the price range of 61-100 PLN

Technically, such grouping can be done in the sales register obtained from the sales system. It contains transaction dates and the prices at which products were sold. An important element of the sales register is the customer ID, ensuring that no transaction is anonymous. The entire analytical process that we will now carry out can be used for further customer segmentation into specific clusters.
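A minimal sketch of this grouping step in Python is shown here. The record layout (customer ID, date, price) and the sample rows are assumptions made for illustration, not the output of any particular sales system. The article's ranges are given in whole PLN, so the sketch also assumes that a price such as 30.50 PLN belongs to the next range up.

```python
# Hypothetical sales-register rows: (customer_id, transaction_date, price_pln).
# The layout and the values are illustrative assumptions, not real data.
sales_register = [
    ("C001", "2024-09-01", 12.50),
    ("C002", "2024-09-01", 45.00),
    ("C003", "2024-09-02", 99.90),
    ("C001", "2024-09-03", 30.00),
    ("C004", "2024-09-03", 150.00),
]


def price_group(price_pln):
    """Map a price to one of the three price ranges used in this example."""
    if 10 <= price_pln <= 30:
        return "low"
    if 30 < price_pln <= 60:   # assumption: 30.01-60 counts as medium
        return "medium"
    if 60 < price_pln <= 100:  # assumption: 60.01-100 counts as high
        return "high"
    return None  # outside the ranges studied here


# Collect transactions per price group, skipping prices outside all ranges.
grouped = {}
for customer_id, date, price in sales_register:
    group = price_group(price)
    if group is not None:
        grouped.setdefault(group, []).append((customer_id, date, price))
```

Because every row keeps its customer ID, the same grouped structure could later feed the customer segmentation mentioned above.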

The next step is to gather information about customers who either made or did not make a purchase on the website. Customers may visit the online store, view prices, and either abandon or purchase the item they were searching for.

To understand customer behavior, a special program on the e-commerce platform is needed to track these behaviors. Such programs are not very popular in the e-commerce industry. Companies that have these solutions are at the absolute forefront. They use an enormous amount of information about customer behaviors on online store pages. Without such technology, it is impossible to know whether a customer visited the site and did not buy anything, hesitated, or looked for other offers. This is an excellent solution that provides a wealth of information about customer behavior patterns.

At this point, I would like to emphasize that price ranges are just one example of what can be tested in the context of customer behavior. It is a very simple way to illustrate the problem. In reality, we can gather information on competitor product prices or create product clusters based on industry, sales frequency, or the popularity of certain groups of products promoted through our ads or brochures. There are countless possibilities worth testing.

In our example, the inclination to leave can be measured as the percentage of customers who left the page without making a purchase within each price range.
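As a rough illustration, this measure could be computed from tracked visits as follows. The representation of a visit as a single True/False flag (purchased or not) is an assumption; a real tracking tool would expose richer data.

```python
def inclination_to_leave(sessions):
    """Return the percentage of sessions that ended without a purchase.

    `sessions` is a list of booleans: True = purchase made, False = left.
    """
    if not sessions:
        raise ValueError("no sessions recorded")
    left = sum(1 for purchased in sessions if not purchased)
    return 100.0 * left / len(sessions)


# Hypothetical tracking output for one product: 50 visits, 10 without purchase.
sessions = [True] * 40 + [False] * 10
rate = inclination_to_leave(sessions)  # 20.0 (percent)
```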

Explanation of One-Way ANOVA Test Application in E-commerce

In this study, we use a statistical test. When conducting such a study, it is essential to establish a null hypothesis and an alternative hypothesis. The null hypothesis states that there is no effect, for example that there are no differences between the groups being compared.

Conversely, the alternative hypothesis states that something does exist, that there are statistically significant differences between the groups being compared. In our case, we want to test whether the inclination to abandon purchases is consistent across all three price categories: low, medium, and high-priced products.

Formulating the Hypothesis:

  • Null Hypothesis (H0): The average inclination to leave the store is the same for all three price groups.
  • Alternative Hypothesis (H1): There is at least one price group for which the average inclination to leave differs from the others.

Conducting the One-Way ANOVA Test:

  • Step 1: Calculate the average inclination to leave for each price group.
  • Step 2: Calculate the variances within each group and between groups.
  • Step 3: Use the F-test to check whether the differences between groups are statistically significant.

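We randomly selected 5 products from each group and, for each product, observed the behavior of a sample of customers (for example, 50). For each product, we recorded the percentage of those customers who left without making a purchase. Let’s assume we obtained the following results: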

Price Range      Inclination to Leave (%)
Low Prices       20, 22, 19, 21, 23
Medium Prices    30, 32, 31, 29, 33
High Prices      40, 42, 41, 39, 43

We calculate the average inclination to leave for each price group:

  • Low prices: (20+22+19+21+23)/5 = 21%
  • Medium prices: (30+32+31+29+33)/5 = 31%
  • High prices: (40+42+41+39+43)/5 = 41%

We then calculate the within-group and between-group variances, substitute them into the formula for the F statistic, and assess the result using the p-value.

If the F statistic is greater than the critical value at the chosen significance level (e.g., 0.05), we reject the null hypothesis. Equivalently, if the test’s p-value is lower than 0.05, the differences between the groups are statistically significant.
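The calculation can be sketched in plain Python using the example data from the table. The function names are our own. The p-value helper relies on a closed form of the F distribution that holds only when there are exactly three groups (two between-group degrees of freedom); for any other number of groups, a statistics library such as SciPy (`scipy.stats.f_oneway`) would be the usual choice.

```python
groups = {
    "low": [20, 22, 19, 21, 23],
    "medium": [30, 32, 31, 29, 33],
    "high": [40, 42, 41, 39, 43],
}


def avg(values):
    return sum(values) / len(values)


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = avg([x for s in samples for x in s])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(s) * (avg(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - avg(s)) ** 2 for s in samples for x in s)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Upper-tail probability of F(2, df_within); valid only for 3 groups."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
p_value = p_value_three_groups(f_stat, df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.1f}, p = {p_value:.2e}")
```

For this data, the between-group mean square is 500 and the within-group mean square is 2.5, so F = 200 with 2 and 12 degrees of freedom, and the p-value is on the order of 1e-9.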

If the ANOVA test shows that there are statistically significant differences in the inclination to leave between price groups, the online store may consider adjusting its pricing policy to reduce the inclination of customers to leave the page without making a purchase.

In our example, the p-value is very low, far below the significance level of 0.05, indicating statistically significant differences in the inclination to leave across the price ranges. This suggests that customers' price sensitivity is not the same in every price range. However, the ANOVA test only indicates that differences exist; it does not specify between which groups. To find out which groups differ from each other, a post-hoc test such as Tukey's HSD should be used.
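For equal group sizes, the Tukey HSD comparison can be sketched by hand as follows. The critical value of the studentized range (about 3.77 for a significance level of 0.05, 3 groups, and 12 within-group degrees of freedom) is read from a standard table and hard-coded here as an assumption; it should be verified before real use. Recent SciPy versions provide `scipy.stats.tukey_hsd`, which computes these values directly.

```python
from itertools import combinations
from math import sqrt

groups = {
    "low": [20, 22, 19, 21, 23],
    "medium": [30, 32, 31, 29, 33],
    "high": [40, 42, 41, 39, 43],
}

# Assumed table value: studentized range, alpha = 0.05, k = 3, df = 12.
Q_CRIT = 3.77


def avg(values):
    return sum(values) / len(values)


n_per_group = 5  # all groups have the same size in this example
n_total = sum(len(s) for s in groups.values())
df_within = n_total - len(groups)
ms_within = sum(
    (x - avg(s)) ** 2 for s in groups.values() for x in s
) / df_within

# Smallest difference between two group means that counts as significant.
hsd = Q_CRIT * sqrt(ms_within / n_per_group)

significant = {}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    diff = abs(avg(a) - avg(b))
    significant[(name_a, name_b)] = diff > hsd
    print(f"{name_a} vs {name_b}: diff = {diff:.1f}, HSD = {hsd:.2f}")
```

With these numbers, the threshold is roughly 2.7 percentage points, while the pairwise differences between group means are 10, 20, and 10, so all three pairs differ significantly.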

Why is This So Effective?

Above, I used the term “average customer.” But what if we had customer groups, that is, customers assigned to clusters based on specific behavioral characteristics? Customers from each of these clusters could be tested with ANOVA for their tendency to abandon purchases. Combining various statistical and mathematical methods in this way offers vast possibilities for obtaining valuable insights. Each of these insights can be monetized by applying appropriate sales strategies to specific customer groups. Combining methods will produce numerous customer variants within specific clusters, and mathematical methods should also be used to create optimal strategies for these subgroups. Above all, we should not assume that strategies for each of these groups will be created manually. Such strategies should be created automatically using classification models or neural network algorithms.

Four years ago, I published an article on my website describing the use of one-way ANOVA in analyzing the population of 19th-century France and the propensity of residents in specific communities to commit crimes.

These are exactly the types of methods used in analyzing customers. The key to effective customer management and the foundation for creating a recommendation system is the effective segmentation of customers into groups and subgroups, which can be done with a minimal amount of code written in Python.

Let’s analyze this example. The database is available to everyone in the Datavis repository of training databases.