W-MOSZCZYNSKI ps 8-24

How to Stop the Customer Exodus from Online Stores? Survival Analysis

An old truth says: “It’s easier to retain a customer than to acquire a new one.” However, as e-commerce markets become more competitive, retaining customers is becoming increasingly difficult. Price-comparison platforms put more attractive offers and better prices in front of customers. The e-commerce market is approaching perfect competition: all market players have similar information and make decisions in an extremely rational way.

A competitive advantage in a near-perfect competitive market can only be achieved in one of three areas, provided that the other two areas are maintained at the market level.

The first area is operational excellence. This means having the lowest price, the fastest delivery time, or the highest quality of service. Operational excellence requires an advantage that customers will notice and appreciate, and achieving it is rarely profitable: if we cut prices to a minimal margin, customers will come, but we still won’t make a profit.

The second area is technological advantage. This is a kind of monopoly: selling goods that are technologically far superior to those of the competition. Examples include a website that brilliantly suggests what customers should buy, or a unique product that no competitor has. With something exceptional, prices can be raised, yielding good profits. This creates a monopoly within a perfectly competitive market.

The third area is building customer trust. The customer might know that others sell products at lower prices or have better delivery systems, but they choose our store because they’ve known it for years and trust it. The customer remembers that in challenging situations, we’ve always been friendly and open to problem-solving. This type of competitive advantage is the most profitable, as it doesn’t require substantial investment like technological superiority, nor does it involve selling goods at a loss, as is often the case in creating operational excellence.

How to Build a “Customer Trust” Advantage?

Building this kind of advantage requires years of mutual cooperation. But time alone is not enough: sometimes, despite years of collaboration, a customer leaves, and we’re not entirely sure why. We watch helplessly as this happens with many valuable customers.

However, new data analysis technologies allow us to understand why customers leave. Many tools let us analyze customer sensitivity, preferences, and what annoys or repels them. This type of analysis must be objective, because each of us sees things in our own subjective way. Just because free pens and loyalty points make us happy doesn’t mean others feel the same. Perhaps the monthly supermarket brochures in our mailbox irritate us, but some people love to receive and browse these leaflets. Our subjective opinions have no relevance in customer analysis and can even be harmful.

How to Study Customer Preferences?

Surveys

The first choice is often a survey, and it is the worst option. Subjectively, I must say that I don’t know anyone who enjoys filling out surveys. Some people will fill one out, of course, but the information they provide may not be accurate. Nor are respondents representative of the whole customer base: surveys are completed mainly by people inclined toward compliance or helping others. Analyzing uncertain data from one specific segment is a waste of time, so surveys are ruled out.

Statistical Analysis

A common method of customer analysis is sales register analysis, combined with CRM data. Using statistical tests like ANOVA and predictive models, relationships are sought between the customer’s purchases and the stimuli associated with them in the CRM database. In practice, this might involve taking a customer, ID1412, and comparing their purchase history with the history of stimuli they received, such as emails, SMS, phone calls, and offers.

This method is known as banking marketing. It’s not the best, because the customer must first be bombarded with stimuli before we can examine how they respond. In a perfectly competitive market, customers are too valuable to be sent stimuli merely to gauge their sensitivity.

One method remains: survival analysis (Survival Test).


Survival Analysis

Survival analysis is a set of statistical methods used to analyze data concerning the time until a specified event occurs. This event could be defined differently depending on the context, e.g., time until patient death in medical studies or time until subscription cancellation with a cable provider.

The odd name of this test and related terms like “patient death” or “probability of patient survival” stem from its grim origins. Survival analysis began in hospitals, where the aim was to determine the survival probability of groups of patients subjected to various treatments. Typically, survival tests aren’t applied to economic processes. Since Sir Ronald Aylmer Fisher, populations have been compared using statistical tests such as t-tests, ANOVA, and various nonparametric tests. Survival analysis, however, emerged to estimate the survival likelihood of patients without waiting for death, so that treatment could be modified to prolong patients’ lives. Do you see the analogy to our online stores? In e-commerce, we want our customers to stay with us as long as possible. We can’t afford to apply statistical methods only after customers leave. We need to know the likelihood that our customers will eventually leave so we can retain them longer.

Kaplan-Meier Estimator

One primary method in survival analysis is the Kaplan-Meier estimator, a nonparametric method for estimating the survival function, that is, the probability that a subject survives beyond a given time. The function is drawn as a Kaplan-Meier curve, a step function showing survival probability decreasing over time. The estimator is particularly useful for censored data, where we don’t know the exact time of the event for all observations (e.g., a customer may still be active when the study concludes).
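To make the estimator concrete, here is a minimal sketch in plain Python. The function name `kaplan_meier` and the toy data are hypothetical, not taken from the author's analysis. At each time where a churn is observed, the running survival probability is multiplied by (1 - churned / still at risk); censored customers only shrink the risk set.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where a churn was observed.

    durations: observed time per customer (e.g. days since first purchase)
    events:    1 if churn was observed, 0 if the customer is censored
    """
    churned = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d:
            # Product-limit step: only observed churns lower the curve.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # churned and censored customers both drop out
    return curve


# Hypothetical toy data: 8 customers, 4 of them still active (censored).
durations = [30, 45, 45, 60, 90, 120, 120, 150]
events = [1, 1, 0, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(durations, events):
    print(f"day {t:>3}: S(t) = {s:.3f}")
```

With this toy input the curve steps down only on days 30, 45, 60, and 120, the days on which a churn is observed. Customers who are censored leave the risk set without lowering the estimate.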

Cox Proportional-Hazards Model

The Cox proportional-hazards model is a more advanced method in survival analysis. It allows various independent variables (covariates) that affect the time to the event to be included. In e-commerce, the Cox model can be used to examine the impact of different stimuli sent to customers and to assess how they affect the likelihood of a customer staying with or leaving the store.


Key Features of the Cox Model

Proportional Hazards: The model assumes that the hazards of any two individuals are proportional, so the ratio between them (the hazard ratio) stays constant over time.
Unspecified Baseline Hazard: The Cox model does not assume a specific distribution for the baseline hazard function, which makes it semi-parametric.
Flexible Covariates: Both continuous and discrete variables can be included, allowing flexible modeling of customer traits.

The Cox model can help segment customers by their risk of churn or of other forms of deactivation. Customers at higher risk may need more attention from marketing. Moreover, the Cox model can forecast churn probability, enabling companies to take preventive action.
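In symbols, the standard textbook form of the model is shown below. The notation is an assumption of this sketch, not the author's: h_0(t) is the unspecified baseline hazard, x_1 ... x_p are the customer covariates, and the beta coefficients are what the model estimates. For a binary covariate such as "has a loyalty card", with all other covariates held fixed, the hazard ratio is simply the exponentiated coefficient and does not depend on t.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right),
\qquad
\frac{h(t \mid x_k = 1)}{h(t \mid x_k = 0)} = e^{\beta_k}
```

A hazard ratio below 1 therefore means the covariate is associated with a lower churn hazard, and a ratio above 1 with a higher one.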

Example of Use in an Online Store

We run an e-commerce platform and want to understand what influences customer loyalty, that is, how long customers stay active and keep making purchases. With sales data linked to individual customers, we can estimate the Kaplan-Meier curve to understand how long customers typically remain active. The event that medicine calls “patient death” is, for us, a long absence from the store.
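The definition of “a long absence” has to be turned into two numbers per customer: a duration and an event flag. The sketch below shows one possible way to do that. The 180-day threshold, the study end date, the function name, and the purchase dates are all assumptions made for illustration; the article does not state which values were used.

```python
from datetime import date

CHURN_GAP_DAYS = 180           # assumed: no purchase for 180 days counts as churn
STUDY_END = date(2024, 6, 30)  # assumed end of the observation window


def to_survival_record(purchase_dates, study_end=STUDY_END, gap=CHURN_GAP_DAYS):
    """Return (duration_days, event) for one customer.

    event = 1: churn observed; duration runs from first to last purchase.
    event = 0: censored (still active); duration runs from first purchase
               to the end of the study.
    Simplification: only the gap after the LAST purchase is checked, so a
    customer who paused for a long time and later came back counts as active.
    """
    first, last = min(purchase_dates), max(purchase_dates)
    if (study_end - last).days > gap:
        return (last - first).days, 1
    return (study_end - first).days, 0


# Hypothetical purchase histories.
customers = {
    "ID1412": [date(2023, 1, 10), date(2023, 5, 2), date(2023, 9, 15)],
    "ID2001": [date(2023, 3, 1), date(2024, 5, 20)],
}

for customer_id, dates in customers.items():
    duration, event = to_survival_record(dates)
    status = "churned" if event else "censored (still active)"
    print(customer_id, duration, status)
```

The resulting pairs are exactly the input a Kaplan-Meier or Cox analysis expects. Choosing the absence threshold is a business decision, and it is worth checking how sensitive the conclusions are to it.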

Comparing Kaplan-Meier curves for different customer segments (e.g., promotional vs. regular price buyers) shows which segments stay loyal longer.
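Two curves will always look somewhat different, so it helps to check whether the gap is larger than chance would explain. A common choice for this is the log-rank test. The article does not say which test the author used, so treat the choice, the function name `logrank_test`, and the toy segments below as assumptions of this sketch.

```python
import math


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value)."""
    durations = list(dur_a) + list(dur_b)
    events = list(ev_a) + list(ev_b)
    event_times = sorted({t for t, e in zip(durations, events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # segment A customers still at risk
        n_b = sum(1 for d in dur_b if d >= t)
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if e and d == t)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # churns expected in A if segments were alike
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the segments
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical segments: promotional buyers churn early, regular buyers are
# all still active (censored) at much later times.
promo_dur, promo_ev = [10, 20, 30, 40, 50, 60], [1, 1, 1, 1, 1, 1]
regular_dur, regular_ev = [100, 110, 120, 130, 140, 150], [0, 0, 0, 0, 0, 0]

chi2, p = logrank_test(promo_dur, promo_ev, regular_dur, regular_ev)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value suggests that the segments really do differ in how long customers stay. It does not explain why they differ, which is the question the Cox model addresses.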

Practical Application of Survival Analysis

Academic discussions about survival analysis are irrelevant without practical applications. The following analysis closely resembles real-world e-commerce analytics. It includes conclusions based on segmentation and stimuli analyses. We often analyze individual customers, even within a population of hundreds of thousands.

Cox Proportional-Hazards Model

Now we apply the Cox model to examine how various factors impact the time until churn. We expect similar findings to those described earlier.

The Cox Proportional-Hazards Model effectively shows what benefits our customers and what we should avoid.
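As a rough illustration of what such a fit involves (this is not the author's program), the sketch below estimates a Cox model with a single binary covariate by maximizing the partial likelihood with Newton's method. The data are invented, the covariate "has a loyalty card" is only an example, and a real analysis would use a tested statistics library and several covariates at once.

```python
import math


def fit_cox_one_covariate(durations, events, x, n_iter=25, tol=1e-10):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x) for one covariate.

    Maximizes the Cox partial likelihood with Newton's method
    (tied event times are handled in the Breslow fashion).
    """
    beta = 0.0
    for _ in range(n_iter):
        grad = info = 0.0
        for t_i, e_i, x_i in zip(durations, events, x):
            if not e_i:
                continue  # censored customers contribute only via risk sets
            risk_x = [x_j for t_j, x_j in zip(durations, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk_x]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk_x))
            s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk_x))
            grad += x_i - s1 / s0               # score contribution
            info += s2 / s0 - (s1 / s0) ** 2    # information contribution
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: days until churn, churn flag, loyalty-card flag.
durations = [5, 8, 12, 20, 25, 30, 40, 50, 60, 70]
events = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
has_card = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

beta = fit_cox_one_covariate(durations, events, has_card)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.3f}")
```

In this invented sample the coefficient comes out negative, so the hazard ratio is below 1: card holders churn at a lower rate. With only ten customers the estimate is far too uncertain to act on. The point is only to show where a hazard ratio comes from.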

Marketing Department Conclusions Based on Cox Model Results

This simple program gave us extensive insights into customer sensitivity to marketing tactics. Key findings include:

Loyalty Card: Customers with loyalty cards have a fourfold lower churn risk than those without.
Gender: A hazard ratio below 1 for women means that women have a lower churn risk than men. In that case, loyalty-building campaigns can be aimed at men, the less stable group.
Visit Frequency: A hazard ratio significantly below 1 for visit frequency means that frequent visits go together with a lower churn risk.
E-invoicing: Customers using e-invoices have a lower churn risk, which makes promoting e-invoices worthwhile.
Offer Intensity: High offer intensity drastically raises churn risk, so the frequency of offers should be optimized.
Phone Contact Frequency: Frequent phone contact significantly raises churn risk. Phone outreach needs a more refined strategy.
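Findings like these are read off the hazard ratios. The small helper below is a hypothetical illustration, not part of the author's program. It turns a hazard ratio into the kind of statement used in the list above. A “fourfold lower” risk corresponds to a hazard ratio of 0.25, that is, a model coefficient of ln(0.25), roughly -1.39.

```python
import math


def describe_hazard_ratio(hazard_ratio):
    """Express a hazard ratio as a percentage change in churn hazard."""
    change = (hazard_ratio - 1.0) * 100
    direction = "higher" if change > 0 else "lower"
    return f"{abs(change):.0f}% {direction} churn hazard"


# "Fourfold lower churn risk" for loyalty-card holders means HR = 1/4.
loyalty_hr = 0.25
loyalty_coef = math.log(loyalty_hr)  # the coefficient a Cox model would report

print(f"coefficient = {loyalty_coef:.3f}")
print(describe_hazard_ratio(loyalty_hr))
# A made-up hazard ratio of 2.0 would read as a doubling of the hazard.
print(describe_hazard_ratio(2.0))
```

A hazard ratio is multiplicative and applies per unit of the covariate. For a continuous variable such as visit frequency, the percentage describes the change in hazard for each additional unit, with the other variables held fixed.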

Summary

Survival analysis using the Kaplan-Meier estimator and Cox model provides valuable insights into factors affecting customer loyalty in an online store. Based on these results, the marketing team can implement targeted actions to retain customers, such as personalized offers for high-risk customers. Additionally, loyalty programs can be optimized to retain less loyal customers effectively.

By applying survival analysis methods, we gain a deeper understanding of customer behavior and can identify factors affecting loyalty, essential for managing customer relationships effectively in e-commerce.

Wojciech Moszczyński

Wojciech Moszczyński – a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He focuses on optimizing production and logistics processes and has been actively involved in promoting machine learning and data science in business environments.