A taxonomy of personality traits and its application in recommendation systems for the sale of investment and luxury goods (Part 1)

October 2025 | volume 79

Why knowing the client’s profile increases the effectiveness of selling expensive solutions

In low-risk purchases (e.g., butter, pasta), habits and situational constraints decide the outcome; analysis of deep traits is often unnecessary. In investment choices, however — the purchase of a production line, a luxury car, or an expensive property — the client engages analytical processing, verifies risk, and expects trust and an adjustment of the communication style to their own preferences. A taxonomy of personality traits organizes these differences and makes it possible to understand how the client makes decisions (e.g., preference for innovation vs. predictability, sensitivity to uncertainty, need for data vs. relationships). This structure of traits explains why aligning the tone of conversation, argumentation, and evidence with the client’s psychological profile raises conversion and satisfaction. Getting to know the client’s unique structure of traits and using it for proper communication — not for manipulation — optimizes the sales process by reducing uncertainty and cognitive costs for the client. Effective optimization of sales conversations therefore requires a solid, evidence-based recognition of the client’s personality profile.

What is the Big-Five factor structure?

The “Big Five” model is today the most widespread taxonomy of personality traits in the psychology of individual differences. Its roots reach back to the so-called lexical hypothesis and to factor analyses conducted since the 1930s, and its contemporary form was consolidated in research from the 1960s through the 1990s (Tupes and Christal; Norman; Goldberg; Costa and McCrae). The starting point was the hypothesis that the most important differences between people are revealed in the language they use. Gordon Allport and Henry Odbert (1936) cataloged approximately 18,000 English words describing persons, creating the foundation for later factor analyses. Next, Ernest Tupes and Raymond Christal (1961; published 1992) demonstrated the replicability of five broad factors in trait ratings, which was confirmed by Warren Norman (1963). At the turn of the 1980s and 1990s, Lewis R. Goldberg consolidated the five-dimensional structure as the Big-Five factor structure, proposing an open descriptive taxonomy based on adjectives. In parallel, Paul Costa and Robert McCrae developed the NEO-PI-R/NEO-PI-3, questionnaire instruments measuring five domains and 30 facets, which became the gold standard of measurement. Contemporary reviews (John, Naumann, Soto) describe this paradigm as a taxonomy of traits with high communicative usefulness.

Thus, the Big Five model is not new; it is the outcome of decades of scientific research. The world of sales took an interest in this taxonomy of personality traits long ago, but for years there was no practical way to translate it into effective action. The situation changed with the appearance of new semantic-analysis systems based on artificial intelligence.

What is the OCEAN Big Five?

O-C-E-A-N is an acronym for the five main dimensions of personality in the Big Five model:

O – Openness to Experience: curiosity, imagination, readiness for novelties and ideas.
C – Conscientiousness: order, self-discipline, goal orientation, predictability.
E – Extraversion: social energy, assertiveness, need for contact and stimulation.
A – Agreeableness: empathy, cooperation, trust, mildness in disputes.
N – Neuroticism (emotional reactivity): tendency toward anxiety, tension, sensitivity to risk.


The OCEAN Big Five is useful for business because it organizes clients’ styles of information processing, tolerance of risk, and communication preferences; it captures variables that directly affect purchasing decisions and the dynamics of negotiations. In practice, it is not simple incentives such as discounts, add-ons, or bonuses that drive the purchase; the decisive role is played by the client’s assessment of the arguments. It is not about using the “right words,” but about what kind of evidence the client needs in order to deem the transaction trustworthy.

Cognitive fit

High Conscientiousness (C) prefers structuring, indicators, and procedures; the lack of hard data produces a sense of chaos, which can lead to a drop in trust and a hardening of the stance.

High Openness (O) expects novelty and wants answers to many questions. In such a situation, a routine offer may be assessed as lacking added value and as unattractive. Building an offer with personality traits in mind is very labor-intensive; therefore, it is possible to construct a RAG-type artificial-intelligence system to create individualized, psychologically tailored offers.

High Neuroticism (N), manifesting as reactivity to uncertainty, requires mechanisms of risk reduction in the form of guarantees and showing exit paths. Their absence will amplify loss aversion and lead to withdrawal from the transaction despite objectively good conditions.

High Agreeableness (A) is sensitive to the tone of the relationship. Confrontational communication conducted by the seller triggers defensive reflexes in the client and hardens the position.

High Extraversion (E) prefers quick, interactive contact and social proof; asynchronous and cool communication lowers the readiness to close.

To sum up — when the style of the message does not match the profile (e.g., an excess of detail for a client who does not expect it), the client’s cognitive cost rises. Saving energy is a natural evolutionary trait of humans: the client instinctively postpones the decision or outright refuses, because an alternative place to make the purchase requires less cognitive effort. In the future the client will return to the place where an earlier transaction was cognitively effortless.

Forming trust

Trust arises when the content, evidence, and form of communication are consistent with expectations resulting from the client’s individual OCEAN profile. Mismatch generates signals of relational risk (e.g., “the seller is not listening to me”), which results in a hard negotiating stance or breaking off talks.

Including OCEAN is not a soft add-on, but a way to minimize informational friction and perceived risk, and thus a factor increasing the probability of a decision, the speed of closing, and satisfaction. Not taking the profile into account increases the probability of defensive reactions, withdrawal, hardening of the stance, and additional retaliatory demands, even with objectively favorable parameters of the offer.

Scientific foundations of transcript analysis

Over the last decade, the theory of semantic analysis — so-called computational psychometrics of language — has developed rapidly. Research on large samples of social-media users has shown that open-vocabulary analysis and models based on n-grams, LDA topics, or embeddings predict Big Five scores better than traditional dictionaries — provided we have a sufficiently large text sample. Put simply, systems that identify the meaning, sentiment, and tendencies of whole utterances turned out to be much more effective than the previous approach, which relied on counting adjectives, verbs, and the frequencies of specific words and phrases without any deeper understanding of their meaning and context. A classic point of reference is the work of Schwartz et al. (PLOS ONE, 2013), conducted on over 75,000 people. The authors showed that language features correlate stably with Big Five questionnaires. Subsequent work (Park/Schwartz et al.) and numerous replications on blogs, tweets, and spontaneous speech confirm these relationships and indicate that a greater amount and richness of language significantly improve the accuracy of personality prediction.

Data requirements: how much text is needed?

The stability of linguistic indicators grows with the number of words. In the research literature it has become customary to use minimal samples on the order of 100–200 words, with the caveat that shorter samples are unstable; the more text, the better. Moreover, the studies cited report higher validity when the accumulated text reaches roughly 600–4,000 words per person, depending on the domain of the conversation.

In the practice of profiling clients based on phone conversations, a minimum of 300 words of accumulated text is recommended. A phone call differs from an ordinary conversation in which the interlocutors see each other; therefore, it requires a different normalization. It is possible to collect and combine several conversations. Combining several transcripts can also offset the effect of the person’s temporary mood. Big Five traits are theoretically stable; however, on every occasion the profile should be updated on the basis of new transcripts.
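As an illustration, here is a minimal sketch of this accumulation step, assuming transcripts arrive as (client_id, text) pairs; the 300-word threshold follows the recommendation above, and in a real system the pool would be refreshed as new calls arrive.

```python
# A minimal sketch: merge call transcripts per client and profile only
# clients whose accumulated sample passes the ~300-word threshold.
from collections import defaultdict

MIN_WORDS = 300  # minimum accumulated sample before profiling

def accumulate(transcripts):
    """Merge transcripts per client; combining calls also offsets one-off moods."""
    pool = defaultdict(list)
    for client_id, text in transcripts:
        pool[client_id].append(text)
    return {cid: " ".join(parts) for cid, parts in pool.items()}

def ready_for_profiling(pool):
    """Keep only clients with enough accumulated text."""
    return {cid: text for cid, text in pool.items()
            if len(text.split()) >= MIN_WORDS}
```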

Operationalization of the five dimensions in B2B/B2C sales of investment goods

In practice, Big Five traits do not say what the client will buy, but how they assess risk, data, innovations, and relationships. “Openness” favors a narrative about novelty, prototypes, and personalization; “Conscientiousness” — the need for hard indicators (SLA, TCO, audits); “Extraversion” — quick interactions and case studies with people; “Agreeableness” directs toward an empathetic tone and a tendency to avoid problems; “Neuroticism” seeks to reduce uncertainty through guarantees, transparency, and the possibility of potentially withdrawing from the transaction. These mappings are consistent with the descriptions of domains in the aforementioned NEO-PI-R by Paul Costa and Robert McCrae.

From an operational point of view this means two things: first, get to know the client (extract linguistic features from transcripts and other activities), then adjust the conversation script and materials (selection of arguments and tone) to the O-C-E-A-N profile. Such an approach simplifies the client’s decision, increases trust, and reduces informational friction, which is important especially when purchasing rare luxury or investment goods.

Methodological recommendations for the data analyst

On the data side, it is worth combining interpretable features (the old NLP approach, e.g., counters of key phrases for risk, timing, innovativeness) with semantic vectors (the new data-science approach based on the meaning of entire sentences and segments). Meta-analyses show that open models should not be fed extremely short samples; in practice one should strive to sum up several conversations per client and control the stability of the scale as the text length grows. Next, we build regressions for the five dimensions (with threshold calibration), validate “by person” (to avoid leakage), and report interpretability (features/fragments raising a given domain). The theoretical foundation — the lexical origin of traits and their five-dimensional structure — justifies that language in natural use provides diagnostic signals about the client’s decision preferences. Profiling is conducted in order to better match the offer, which leads to higher satisfaction with the transaction and a higher probability of transaction replication.
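A minimal sketch of such a pipeline is shown below, assuming accumulated transcripts, per-client identifiers, and questionnaire-based OCEAN scores as training targets. The key-phrase vocabulary is an illustrative assumption, and LSA over TF-IDF stands in for a proper sentence-embedding model so that the example stays self-contained.

```python
# Sketch: interpretable phrase counts + semantic vectors -> per-dimension
# ridge regression, validated with person-level splits to avoid leakage.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score

# Interpretable features: counts of hypothetical key phrases (risk, timing...).
risk_phrases = ["guarantee", "risk", "deadline", "certificate"]  # illustrative
interpretable = CountVectorizer(vocabulary=risk_phrases)

# Semantic stand-in: TF-IDF compressed to dense vectors (assumes a corpus
# large enough for 50 components; swap in real embeddings in production).
semantic = Pipeline([
    ("tfidf", TfidfVectorizer(min_df=2)),
    ("svd", TruncatedSVD(n_components=50, random_state=0)),
])

features = FeatureUnion([("phrases", interpretable), ("semantic", semantic)])

def score_dimension(texts, y_dim, groups):
    """Cross-validate one OCEAN dimension; `groups` holds client ids."""
    model = Pipeline([("features", features), ("reg", Ridge(alpha=1.0))])
    cv = GroupKFold(n_splits=5)  # no client in both train and test folds
    return cross_val_score(model, texts, y_dim, groups=groups, cv=cv,
                           scoring="r2")
```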

The Big Five is an old and well-confirmed taxonomy of traits which, thanks to the newest language-analysis methods based on artificial intelligence, has gained new, scalable applications in business. In the area of high-value sales, the key is first to get to know the client on the basis of their real language (minimum 300 words), and then to adjust communication to their O-C-E-A-N profile. Such automation does not replace the seller’s experience; rather, it strengthens it by providing objective guidelines for “how to talk” so as to speed up building trust and make it easier for the client to make a decision.

HEXACO as a development of the Big Five

HEXACO, a continuation and complement of the Big Five OCEAN paradigm in business applications, arose from the practice of using the Big Five in the sale of luxury goods. It was noticed that one further dimension plays an important role in high-value decisions: Honesty–Humility. According to the research cited below, a HEXACO assessment improves the effectiveness and alignment of communication in B2B/B2C sales of luxury and investment goods.

Genesis and the place of HEXACO with respect to the Big Five

HEXACO grows out of the same lexical hypothesis as the Big Five: key individual differences are encoded in natural language; their structure is revealed in factor analyses of adjectives. In the 2000s, Kibeom Lee and Michael C. Ashton, on the basis of cross-linguistic lexical analyses, proposed a six-dimensional model which — alongside Extraversion, Agreeableness, Conscientiousness, Openness, and (redefined) Emotionality — introduces a new dimension, Honesty–Humility.

Compared to the Big Five, HEXACO changes the definitions of some traits. Emotionality (E) in HEXACO includes fearfulness, sensitivity, and attachment, whereas the components of anger and irritability shift to Agreeableness (A) as gentleness and forgiveness; this is a change relative to Neuroticism and Agreeableness in OCEAN, supported by research on the effectiveness of communication. Corresponding traits across the two models correlate moderately to highly (highest for Extraversion, Conscientiousness, and Openness), which speaks for paradigmatic kinship but also for the added value of the six-factor model.

The contribution of HEXACO: the Honesty–Humility dimension and consequences for economic decisions

The biggest novelty of HEXACO is Honesty–Humility (H), which combines sincerity, a lack of tendency to manipulate, modesty, and low greed for material status or class membership. In numerous studies, H predicts a lower propensity for unethical behaviors such as fraud, corruption, and opportunism. People high in H show greater honesty in cooperative games and prefer fair treatment of others. In tests of comparative validity, HEXACO scales (especially H) have an advantage over OCEAN in predicting criteria related to honesty and cooperation. The mechanisms linking H with pro- and antisociality appear to include, among others, concern for justice and low acceptance of using others instrumentally.

For practice this means that in high-stakes negotiations (luxury goods, assembly lines), clients with low H — that is, low humility and honesty — more often produce hard purchasing behaviors: aggressive demands, lower sensitivity to ethical arguments, and higher tolerance for counterfeits. By contrast, clients high in H prefer transparent conditions and long-term partnership and are more resistant to short-term temptations such as discounts with costs hidden from the client. In consumer studies, clients with a high level of honesty and humility react negatively to counterfeit luxury goods.

HEXACO and high-value decisions

In the sale of luxury and investment goods, the key roles are played by the perception of risk, trust, and the adjustment of tone and evidence. The six-dimensional taxonomy allows these mechanisms to be modeled more precisely than OCEAN, because it separates two distinct sources of relational friction: exploitativeness (low H) and conflict/unforgiveness (low A). Preparing the conversation with traits A and H in mind helps prevent retaliatory and escalatory behaviors in the client. HEXACO has been analyzed in depth in behavioral games aimed at improving the stability of cooperation and at avoiding both “hard anchors” in negotiations and the tendency to escalate minor incidents into conflicts.

Additionally, the redefinition of Emotionality (E) as fearfulness/sensitivity (without the anger component) better models aversion to uncertainty and the need for safety mechanisms among some corporate clients: guarantees, reversible options, clear support procedures. In comparative studies these differences are well documented, both conceptually and psychometrically.

Frames of application in the sale of luxury and investment goods

As in OCEAN applications, the basis of HEXACO is the extraction of traits from language: from call-center transcripts, correspondence, meeting notes.

Example of creating an offer based on mapping the client’s profile (a code sketch of this mapping follows the list)

  • H (Honesty–Humility) high: maximum transparency, no “fine print”;

  • H low: tougher contract conditions, explicit consequences of violations, formal compliance procedures. Empirical research links low honesty with unethical decisions and a propensity for fraud. The client will accept this on the principle of “I would have written it that way too”;

  • E (Emotionality) high: first reduce uncertainty (guarantees, exit path), then complex comparisons;

  • E low: greater acceptance of “hard” data without a safety “cushion”;

  • A (Agreeableness) low: avoid a confrontational tone; design de-escalation and “cooling-off” procedures;

  • A high: expose “problem-free” operation, show a clean and easy exit path;

  • C (Conscientiousness) high: SLA, schedules, checklists, quality audits;

  • C low: visual summaries and short bullet points instead of long essays;

  • X (Extraversion) high: quick meetings, short quick calls, “on-the-go” interaction;

  • X low: instead of calling and “closing” in a meeting, send well-prepared materials for self-review: a PDF with the offer and case studies. Give time for the decision: propose a window for questions by email or short message, not an immediate call. Form and tone: clear, without fireworks, concrete and structured (headings, summaries). Helpful add-ons: checklists, a table comparing variants, a link to a knowledge base, and the possibility of asking questions in writing. What to avoid with low X: intrusive phone calls, improvised video calls, and “let’s talk now” — these can discourage such a client;

  • O (Openness) high: a story about innovation, personalization, and aesthetics, particularly for luxury goods. It is worth “selling the idea,” the process, and uniqueness. What to avoid with high Openness: clichés and banalities (“the best quality,” “we are the leader” without substance); a rigid price list without a story; excessive standardization (no personalization options, “silver/gold/platinum” packages without modification); overly technical jargon without the “why”; faux-premium (generative stock photos, pretended “limitation” without verifiable history); lack of aesthetics in the presentation.

  • O low: focus on proven solutions, stability, compliance/compatibility, standards and best practices, zero experiments, predictable effects, references/cases, a clear migration path. With low openness, avoid flaunting novelty, experiments without a guarantee that it works, “innovation for innovation’s sake,” quick scope changes, too many options/configurations, ambiguous roadmaps, chaotic brainstorming, aesthetic stories without specifics.
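Here is the promised code sketch of this mapping; the trait keys follow the list above, while the cut-off values and guideline wording are illustrative assumptions, not calibrated parameters.

```python
# Sketch: turn an estimated HEXACO profile (scores normalized to 0-1)
# into offer guidelines following the mapping above.
HIGH, LOW = 0.65, 0.35  # hypothetical cut-offs

GUIDELINES = {
    ("H", "high"): "maximum transparency, no fine print",
    ("H", "low"): "explicit contract conditions and compliance procedures",
    ("E", "high"): "reduce uncertainty first: guarantees, exit path",
    ("E", "low"): "lead with hard data and comparisons",
    ("A", "low"): "avoid confrontation; plan de-escalation steps",
    ("A", "high"): "stress problem-free operation and an easy exit path",
    ("C", "high"): "SLA, schedules, checklists, quality audits",
    ("C", "low"): "visual summaries and short bullet points",
    ("X", "high"): "quick, interactive meetings and calls",
    ("X", "low"): "well-prepared written materials, time to decide",
    ("O", "high"): "story of innovation, personalization, aesthetics",
    ("O", "low"): "proven solutions, standards, references",
}

def offer_guidelines(profile):
    """Return guidelines for traits with a clear high or low signal."""
    tips = []
    for trait, score in profile.items():
        if score >= HIGH:
            tips.append(GUIDELINES[(trait, "high")])
        elif score <= LOW:
            tips.append(GUIDELINES[(trait, "low")])
    return tips

print(offer_guidelines(
    {"H": 0.8, "E": 0.2, "X": 0.3, "A": 0.5, "C": 0.7, "O": 0.9}))
```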

Example of application: Purchase of luxury goods

High O and X (Openness and Extraversion) favor a narrative about uniqueness and social belonging. Typical examples include such notions as limited editions and other socio-cultural proofs of status and uniqueness. High H (Honesty–Humility) limits the propensity to purchase counterfeits, so confirmation and certification increase the perceived value of a luxury good for the client; low H requires stronger signals of enforcement of rights and guarantees of authenticity. Survey data and preregistered studies link H with a negative attitude toward luxury counterfeits.

In the transaction of purchasing an investment good

High C and E (sensitivity to risk) imply the need to stage the decision (POC, analysis, many meetings), hard KPI/SLA indicators, and reversible options — that is, the possibility of maneuvering during the purchasing process. The point is that two personality dimensions tell us what risk dominates in the relationship with the client. When Honesty–Humility (H) scores low, the risk of hard, opportunistic moves on the client’s side is greater, so we design the contract according to clear rules, introduce safeguards, milestone payments, zero ambiguities. When Agreeableness (A) scores low, clashes and escalations are more frequent; therefore we immediately foresee paths for extinguishing conflicts: a calm conversation mode, clearly described mediation and dispute-resolution procedures.

Choice of personality-trait taxonomy model

Not every sales goal requires HEXACO: in some fields OCEAN may turn out to be slightly more effective. Therefore, the choice of paradigm (OCEAN vs. HEXACO) should be empirical, through comparative validations on business criteria (conversion, length of sales cycle, stability of the relationship).

Summary

HEXACO constitutes a mature complement to the Big Five: it retains the strengths of the five-dimensional taxonomy and at the same time introduces the diagnostic dimension Honesty–Humility and more clearly separates the components of conflict and fearfulness, which is particularly important when purchasing luxury goods such as yachts, expensive cars, or residences. In the sale of luxury and investment goods, this translates into more accurate shaping of trust, risk control, and the tone of negotiations, which — combined with language diagnostics — makes it possible to operationalize fit to the client on solid empirical foundations.

Selected sources

Allport, G. W., H. S. Odbert. 1936. Trait-names: A psycho-lexical study. Psychological Monographs (the classic root of the lexical hypothesis and the list of traits).
Ashton, M. C., K. Lee. 2008. Prediction of Honesty–Humility-related criteria… Journal of Research in Personality (criterion validation vs. FFM).
Ashton, M. C., K. Lee, R. E. de Vries. 2014. The HEXACO Honesty–Humility, Agreeableness and Emotionality factors: A review… PSPR (synthetic review of the construct and validity).
Costa, P. T., R. R. McCrae. 1992. NEO-PI-R Manual (operationalization of five domains and 30 facets).
Goldberg, L. R. 1990; 1993. Historical and empirical articles grounding the Big Five structure.
HEXACO-PI-R: tool materials and scale descriptions (psychometrics, adaptations, short versions).
Hilbig, B. E., et al. 2013. It takes two: Honesty–Humility and Agreeableness… Personality and Individual Differences (selective prediction of non-exploitation and non-retaliation).
John, O. P., S. Srivastava. 1999/2008. Reviews and updates of the integrative Big Five taxonomy.
Lee, K., M. C. Ashton. 2008. The HEXACO Model of personality structure and the importance of the H factor. Social and Personality Psychology Compass (overview of the model and role of H).
LIWC Manuals. 2001–2022. Basics of dictionary-based language analysis and notes on minimal text length.
Norman, W. T. 1963. Toward an adequate taxonomy of personality attributes. (replications of the five factors).
Park/Schwartz et al. (review and applications of computational language methods to personality).
Pilch, I. 2023. Comparison of the Big Five and the HEXACO Models… Current Psychology (differences in A/E, correlations between models).
Pletzer, J. et al. 2019. A meta-analysis of the relations between personality and… Acta Psychologica (A/E rotations and their consequences).
Applied studies indicating the link between accuracy and sample length (e.g., recruitment/essays ~660 words vs. thousands of words in social media).
Reinhardt, N., et al. 2023. Honesty–Humility & attitudes toward counterfeit luxury. Behavioral Sciences of Terrorism and Political Aggression / PsyArXiv versions (attitude toward counterfeits).
Schwartz, H. A., et al. 2013. Personality, gender and age in the language of social media. PLOS ONE (evidence that open-vocabulary predicts the Big Five).
Tupes, E. C., R. E. Christal. 1961/1992. Recurrent personality factors based on trait ratings. Journal of Personality, 60, 225–251 (replication of the five factors).

Wojciech Moszczyński — graduate of the Department of Econometrics and Statistics of Nicolaus Copernicus University in Toruń; specialist in econometrics, finance, data science, and management accounting. He specializes in the optimization of production and logistics processes. He conducts research in the area of the development and application of artificial intelligence. For years he has been engaged in the popularization of machine learning and data science in business environments.

Building a Recommendation System (Part 1)

Without Data, There’s No Recommendation System

To recommend something to someone, you must first know them. To know someone, you must have detailed information about them—often more than they know about themselves. When a customer completes multiple transactions with us, we get to know their behavioral patterns, their frequency of visiting stores, how long they hesitate before purchasing, how often they act on impulse, and when a discount offer might work best. All of these data points form the foundation of a recommendation system.

To build such a system, we must first learn how to collect data about our customers. This article is dedicated to exactly that.


Why Customer Data Is Crucial for Recommendation Systems

Personalized recommendations function much like a skilled salesperson in a store—able to predict, based on collected information, what a customer might like. Recommendation algorithms (including FM and DeepFM) learn from data: the more accurate and comprehensive the behavioral and preference data you collect, the better the system can tailor offers to individual customers.

Traditional methods of gathering opinions (such as surveys) fall under explicit feedback—directly asking customers what they like. Unfortunately, such declarations are often imprecise or infrequent. This is why modern systems prefer to rely on implicit feedback—hidden data from real user behavior. Such data is richer and often reflects preferences more honestly, though it requires interpretation. In short: instead of asking—observe.
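As a small illustration of the “observe” principle, implicit events can be converted into graded preference weights for later model training; the event names and weights below are illustrative assumptions, not tuned values.

```python
# Sketch: aggregate implicit feedback events into preference weights.
EVENT_WEIGHTS = {"view_item": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def implicit_scores(events):
    """Sum weighted (user, item, event) triples into preference scores."""
    scores = {}
    for user, item, event in events:
        key = (user, item)
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(event, 0.0)
    return scores

print(implicit_scores([("u1", "cheesecake", "view_item"),
                       ("u1", "cheesecake", "add_to_cart")]))
# {('u1', 'cheesecake'): 4.0} -> stronger interest than a view alone
```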

Below, we outline exactly which data points are worth collecting and how to do it discreetly, without overwhelming customers with questionnaires.



Imagine you run an online bakery specializing in cakes, pies, and other pastries. You want to implement a modern recommendation system powered by algorithms such as Factorization Machines (FM) or DeepFM, so you can offer products aligned with each customer’s taste. For these tools to work effectively, they require rich data—covering preferences, purchase habits, and behavioral patterns.

This section provides a detailed analysis and a practical guide on how to discreetly collect such information without relying on low-reliability surveys.


What Customer Information Should You Collect?

To build an effective bakery recommendation system, focus on gathering behavioral data from your online store:

  • Purchase History (Transactions) – Record what the customer bought, when they bought it, how often they purchase, and how much they spend. This history reveals patterns—for example, whether someone buys cakes monthly (perhaps for events) or mainly before holidays. Such data supports cross-selling and helps predict future needs.

  • Browsed Products and On-Site Activity – Track which products and categories a customer views, how long they spend on them, and which pages they visit. This can reveal interests even without a purchase. For instance, frequent viewing of meringue cakes without buying signals an opportunity for targeted recommendations.

  • Search Queries – If your store has a search bar, log the exact terms entered. These are direct indicators of intent (“gluten-free,” “sugar-free,” etc.). This data informs both recommendations and inventory planning.

  • Cart Additions and Abandonments – Even abandoned carts indicate interest. For example, if a customer adds a chocolate cake but doesn’t purchase, you can send a reminder or offer a discount later.

  • Basic Demographics and Contact Data – Collected during checkout (name, address, phone, email). While they don’t directly reveal taste preferences, they can help with location-based offers and communication. Always ensure GDPR compliance.

  • Inferred Preferences – Derived from behavior (purchase history, browsing patterns, cart additions). For example, a customer who repeatedly orders birthday cakes with “Happy Birthday” inscriptions is likely buying for birthdays; another filtering for vegan products probably follows a vegan diet.

These are primarily first-party data—collected directly via your store—making them the most valuable for your recommendation model.


Methods of Collecting Customer Data

Collecting information should be seamless, integrated into store operations, and not require customers to fill in lengthy forms. Proven methods include:

  • Website Analytics – Tools like Google Analytics 4 track visits, page views, clicks, and time on site. Combined with cookies, this allows you to identify returning users and their interests.

  • E-commerce Event Tracking – Most platforms (PrestaShop, WooCommerce, Shopify) can track key actions such as product views, cart additions, checkout initiations, and purchases. These events reveal where customers hesitate and help train algorithms to identify product relationships (“Customers who viewed X often also bought Y”). A minimal event-logging sketch follows this list.

  • User Behavior Profiling – Encourage account creation by offering benefits (order history, faster checkout, loyalty discounts). Logged-in behavior can be linked to a persistent profile, allowing for personalized recommendations and targeted offers.

  • Heatmaps and Session Recordings – Tools like Hotjar or Crazy Egg show where users click, scroll, and pause, offering UX insights that can indirectly enhance recommendations.

  • Traffic Source and Campaign Analysis – Knowing whether a customer came from a Facebook ad or a Google search for “sugar-free cake” allows tagging them for relevant offers.

  • Loyalty Programs – Points, discounts, or perks for frequent customers encourage sign-ups, providing more structured behavioral data tied to a customer ID.

  • Reviews and Social Media Insights – Even unstructured comments can reveal purchase intent or preferences (“Beautiful cake for my son’s first birthday” implies repeat needs).

  • Aggregated Trends – Seasonal and contextual trends (e.g., higher cheesecake sales during holidays) can feed contextual features into the recommendation system.
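Here is the event-logging sketch promised above, assuming a Flask backend (Flask appears elsewhere in the author’s stack); the endpoint name and payload fields are illustrative, and a production system would write to a database or message queue rather than an in-memory list.

```python
# Sketch: a tiny event-collection endpoint for an online store.
from datetime import datetime, timezone
from flask import Flask, request, jsonify

app = Flask(__name__)
EVENTS = []  # stand-in for a database table or Kafka topic

@app.route("/track", methods=["POST"])
def track():
    payload = request.get_json(force=True)
    EVENTS.append({
        "user_id": payload.get("user_id"),   # cookie or account id
        "event": payload.get("event"),       # e.g. "view_item", "add_to_cart"
        "item_id": payload.get("item_id"),
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return jsonify({"ok": True})
```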


Using the Collected Data in a Recommendation System

Once diverse customer and interaction data is collected, it can be used to:

  • Build User and Product Feature Sets – Factorization Machines require user features (ID, segment, preferences, average spend, location) and product features (ID, category, flavor, price range). The richer the feature set, the better the matching accuracy. A sketch of the FM scoring rule follows this list.

  • Enable On-Site Personalization – Dynamic sections like “Recommended for You,” “Customers Also Bought,” “Recently Viewed,” or “Bestsellers in Your Area” enhance engagement and sales.

  • Inform Marketing Decisions – Segment customers for targeted outreach (e.g., special offers for lapsed buyers, early birthday cake promotions for repeat birthday purchasers).

  • Continuously Improve Models – Retrain periodically as preferences evolve, add new features when gaps are found, and validate performance through click-through rates (CTR) and conversions.
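Here is the promised sketch of how such feature sets feed the model: the Factorization Machine scoring rule (Rendle, 2010) in plain numpy, using the O(kn) reformulation of the pairwise term. The feature layout, with one-hot user and item positions, is an illustrative assumption.

```python
# Sketch: FM score = w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j.
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 20, 4                     # toy sizes
w0 = 0.0
w = rng.normal(0, 0.1, n_features)        # linear weights
V = rng.normal(0, 0.1, (n_features, k))   # latent factor matrix

def fm_predict(x):
    """Score one interaction vector x of shape (n_features,)."""
    linear = w0 + w @ x
    xv = x @ V                    # per-factor sum of v_i * x_i
    x2v2 = (x ** 2) @ (V ** 2)    # per-factor sum of v_i^2 * x_i^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise

# Example: user 3 interacts with item 12 (one-hot positions illustrative).
x = np.zeros(n_features)
x[3] = 1.0    # user one-hot block
x[12] = 1.0   # item one-hot block
print(fm_predict(x))
```

In training, w0, w, and V would be fitted on the weighted interactions collected above (e.g., by stochastic gradient descent), which is exactly where richer user and product features raise matching accuracy.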


Summary

For an artisan running an online bakery, customer data becomes as essential an ingredient as quality flour or a trusted recipe. Collecting it doesn’t need to be difficult or invasive—most of it is already flowing through your store in the form of digital traces. Your task is to gather, structure, and use it effectively.

Combine your craft expertise with structured behavioral data, and you’ll spot patterns that allow you to anticipate customer needs—sometimes before they’re even aware of them.

As for the next step—while many small businesses rely on third-party analytics tools, building your own structured database from system logs provides full control and independence. These logs contain raw sequences of customer actions—purchases, hesitations, and decisions—which, once filtered and structured, become the foundation for your own recommendation engine. That will be the focus of my next article.

Wojciech Moszczyński
Graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń. Specialist in econometrics, finance, data science, and management accounting. Focused on optimizing production and logistics processes. Active researcher in AI development and applications. Long-time promoter of machine learning and data science in business environments.

My projects

PROJECT: INTELLIGENT BODY LEASING MANAGEMENT

When and where: The project was conducted in 2022-23 for a company specializing in IT staffing and body leasing.

Technologies used:

Python, SQL, Spark, Hadoop, Kafka, Flask, PostgreSQL, AWS

Description:

  1. The project aimed to develop a system for optimizing the management of body leasing processes to enhance resource allocation and efficiency.
  2. Advanced algorithms were utilized to forecast demand and match candidates with project requirements in real-time.
  3. The system significantly improved decision-making and reduced operational costs for the organization.

PROJECT: RECOMMENDATION SYSTEM FOR E-COMMERCE STORE

When and where: The project was conducted in 2022-23 for an online retail company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

Description:

  1. The project focused on designing and implementing a recommendation system to enhance the customer shopping experience.
  2. It utilized collaborative filtering and content-based algorithms to suggest personalized products to users.
  3. The system increased user engagement, boosted sales, and improved customer satisfaction rates.

PROJECT: WAITING CART SYSTEM

When and where: The project was conducted in 2022-23 for an e-commerce company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on designing a system to manage and optimize waiting carts for users on the platform.
  2. The system provided personalized recommendations and reminders to encourage customers to finalize their purchases.
  3. Its implementation increased conversion rates by addressing cart abandonment issues effectively.

PROJECT: INTELLIGENT CLOTHING SEARCH ENGINE

When and where: The project was conducted in 2023 for a fashion e-commerce company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project involved developing an intelligent search engine to enhance the user experience by allowing personalized and accurate clothing searches.
  2. The system used advanced filtering and recommendation algorithms to match customer preferences with available inventory.
  3. Its implementation improved search relevance, increased user engagement, and boosted sales.

PROJECT: ANALYSIS OF POPULATION BEHAVIOR

When and where: The project was conducted in 2021-22 for a government research institute.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project focused on analyzing population behavior to identify trends and patterns using large datasets.
  2. It involved advanced data modeling and visualization to support decision-making in public policy and resource allocation.
  3. The findings provided actionable insights that helped optimize community support programs and improve service delivery.

PROJECT: OPTIMIZATION OF SULFUR FLOW IN GRUPA AZOTY

When and where: The project was conducted in 2017-2021 for Grupa Azoty, a leading chemical company in Poland.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project aimed to optimize the sulfur flow process within the production facilities to reduce waste and improve efficiency.
  2. Advanced data analysis and simulation techniques were applied to model and enhance the flow dynamics.
  3. The results contributed to significant cost savings and a more sustainable production process.

PROJECT: DETECTING ANOMALIES IN CHEMICAL PLANT OPERATIONS

When and where: The project was conducted in 2017-2021 for a leading chemical manufacturing company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on developing a system to detect anomalies in real-time during chemical plant operations, ensuring safety and efficiency.
  2. Advanced machine learning algorithms were employed to analyze sensor data and identify deviations from normal operating conditions.
  3. The system improved operational reliability by preventing potential failures and optimizing maintenance schedules.

PROJECT: GAS PRICE PREDICTION MODEL FOR A 14-DAY HORIZON

When and where: The project was conducted in 2017-2021 for a company in the energy sector.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project involved developing a predictive model to forecast gas prices for a 14-day horizon using historical data and market trends.
  2. The model utilized machine learning techniques to provide accurate and actionable price predictions.
  3. Its implementation supported better decision-making in procurement and inventory management, reducing operational risks.

PROJECT: DETECTING ANOMALIES IN FUEL CONSUMPTION FOR SILVA

When and where: The project was conducted in 2014-2018 for Silva, a company specializing in logistics and transportation.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on developing a system to detect anomalies in fuel consumption across Silva’s fleet operations.
  2. The system used advanced analytics and machine learning to identify irregular fuel usage patterns and potential inefficiencies.
  3. The implementation helped reduce fuel costs and improved the company’s operational efficiency.

PROJECT: DETECTING FRAUD IN FINANCIAL TRANSACTIONS

When and where: The project was conducted in 2014-2018 for a financial services company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

 

Description:

  1. The project involved building a system to detect fraudulent activities in financial transactions in real-time.
  2. Machine learning algorithms were employed to analyze transaction patterns and flag suspicious activities.
  3. The system improved fraud detection accuracy, reducing financial losses and enhancing customer trust.

PROJECT: VISUAL IDENTIFICATION OF WOOD QUALITY

When and where: The project was conducted in 2014-2018 for a forestry and wood production company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka

Description:

  1. The project focused on developing a system to visually identify wood quality based on image analysis and machine learning techniques.
  2. The system utilized advanced algorithms to classify wood defects and grade the quality of timber in real-time.
  3. Its implementation improved production efficiency and ensured high standards of quality control.

PROJECT: ELIMINATION OF QUEUES AT THE ENTRY GATES FOR VEHICLES WITH TIMBER

When and where: The project was conducted in 2014-2018 for a timber production and logistics company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. The project aimed to streamline the entry process for vehicles transporting timber by eliminating queues at the entry gates.
  2. Advanced scheduling algorithms and real-time tracking were implemented to optimize vehicle flow and reduce waiting times.
  3. The solution significantly improved logistics efficiency and enhanced driver satisfaction.

PROJECT: OPTIMIZATION OF VEHICLE LOADING AND UNLOADING TIMES

When and where: The project was conducted in 2014-2018 for a logistics and supply chain company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. The project focused on optimizing loading and unloading times for vehicles to enhance operational efficiency and reduce delays.
  2. Data analytics and predictive modeling were used to identify bottlenecks and implement solutions for streamlined processes.
  3. The outcome resulted in significant time savings and improved resource utilization.

PROJECT: DETECTING FRAUD IN MASS TRANSACTIONS

When and where: The project was conducted in 2014-2018 for a financial services company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. This project focused on detecting fraudulent activities within mass transactions using advanced data analysis techniques.
  2. Machine learning models were developed to analyze transaction patterns, identify anomalies, and flag suspicious activities in real time.
  3. The solution enhanced fraud detection efficiency, reduced financial losses, and improved trust among clients.

PROJECT: ANALYSIS OF MARKETING STIMULI EFFECTIVENESS AT BANK PEKAO

When and where: The project was conducted in 2011-2018 for Bank Pekao.

Technologies used: Python, Spark, SQL, PostgreSQL.

Description:

  1. This project focused on analyzing the effectiveness of various marketing stimuli in driving customer engagement and product adoption.
  2. Statistical models and data analytics were applied to evaluate the impact of marketing strategies on customer behavior.
  3. The findings helped optimize future marketing campaigns and improve overall customer retention rates.

PROJECT: DELIVERIES TO THE BAKERY ON THE SECOND SHIFT

When and where: The project was conducted in 2011 for a bakery supply chain company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. This project aimed to optimize deliveries to the bakery during the second shift to ensure timely supply and reduce delays.
  2. Advanced logistics planning and route optimization were employed to streamline the delivery process.
  3. The solution improved operational efficiency, reduced costs, and ensured fresh product availability for the bakery.

PROJECT: REPAIR OF THE AUTOMATIC ORDER SYSTEM

When and where: The project was conducted in 2015 for an e-commerce company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project focused on repairing and optimizing the automatic order system to ensure its seamless functionality.
  2. Key issues, such as order processing delays and system crashes, were addressed through detailed diagnostics and code improvements.
  3. The repaired system enhanced order accuracy, reduced downtime, and improved overall customer satisfaction.

PROJECT: EXPIRED PRODUCTS NOTIFICATION SYSTEM

When and where: The project was conducted in 2015 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project involved developing a notification system to track and alert staff about products nearing expiration dates.
  2. The system utilized automated alerts and data analytics to ensure timely removal of expired goods from shelves.
  3. Its implementation reduced waste, improved inventory management, and enhanced customer satisfaction by maintaining product freshness.

PROJECT: AUTOMATIC ORDERING SYSTEM FOR SPECIAL PERIODS

When and where: The project was conducted in 2016 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. This project focused on designing an automatic ordering system tailored for special periods such as holidays or promotional campaigns.
  2. The system used predictive analytics and historical data to optimize inventory levels and prevent stock shortages or overstocking.
  3. Its implementation improved operational efficiency, reduced waste, and ensured product availability during high-demand periods.

PROJECT: OPTIMIZATION MODEL FOR PROMOTION SIZE

When and where: The project was conducted in 2016 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project aimed to develop a model to optimize the size and scope of promotions to maximize profitability and customer engagement.
  2. Advanced data analysis and machine learning algorithms were used to predict the effectiveness of different promotion strategies.
  3. The model provided actionable insights, leading to better allocation of promotional budgets and improved sales performance.

New professions that emerge with the development of artificial intelligence

It is worth mentioning who programmers are. A typical programmer, and I don’t want this to sound like a stereotype, is someone who is proficient in a specific programming language, executing precisely defined tasks related to writing code. Programmers are usually not very creative, which results from the fact that their tasks are often quite strictly defined by the client. They translate specific needs into programming code. It turns out that similar skills are possessed by the latest models of artificial intelligence. If a model is given a well-defined task, it will translate the described need into the selected programming language.

Can Artificial Intelligence Think Creatively?

Recently, I tested the latest GPT-4 chat model on creating solutions in operational research. Operational research is a field devoted to the mathematical optimization of processes. Using its methods requires logical, often unconventional thinking that goes beyond typical, routine approaches; it undoubtedly demands deep imagination and experience. I assigned the model several optimization tasks. Out of 10 assignments, nine were completed incorrectly. One was completed correctly because it was very conventional and belonged to the basic, classical scope of the field. Interestingly, each time the model performed its task, it boasted about the results. Unfortunately, the Python code it generated was unusable; it simply had errors. I pointed out the places where the errors were made, yet the model insisted it was correct. Moreover, it frequently made mistakes when creating objective functions, and the objective function is the foundation of any optimization process: it is hard to optimize anything when the goal is specified incorrectly. In summary, current artificial-intelligence models, including the GPT-4 chat model tested here, are likely capable of easily replacing a typical programmer performing repetitive work, most often involving writing simple code. However, my experience indicates that the models struggle with difficult logical tasks that require experience and unconventional thinking.
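For contrast, here is a minimal example of a correctly specified objective function in a classical linear-programming task, using scipy's linprog; the coefficients are illustrative. Since linprog minimizes, a profit-maximization goal is encoded by negating the objective coefficients.

```python
# Sketch: maximize profit 3*x1 + 5*x2 subject to two resource limits.
from scipy.optimize import linprog

c = [-3, -5]                 # negated: linprog minimizes c @ x
A_ub = [[1, 2], [3, 1]]      # resource usage per unit of x1, x2
b_ub = [14, 18]              # available resources
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)       # optimal plan and the profit achieved
```

A model that negates the wrong sign, or optimizes a quantity other than the stated goal, produces code that runs yet answers the wrong question; this is exactly the class of error described above.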

It is obvious that sooner or later, a new version of the AI model will emerge, which will be more effective at solving difficult logical tasks. The development of artificial intelligence is inevitable and unstoppable. Therefore, it is likely that next year, even more advanced and creative employees will be replaced by significantly more efficient mathematical algorithms.

Technological Development and Unemployment

The introduction of new technologies has always been associated with an improvement in the material well-being of the general population. Of course, this phenomenon was accompanied by certain groups losing their livelihoods in the short term, and certain traditions, reputations, knowledge, and skills were irretrievably lost. Carpet weavers ceased to be needed when textile machines were mass-produced; breeders of draft horses and producers of freight wagons lost their trade when the railway developed; assembly-line workers lost their jobs when car manufacturers introduced industrial robots en masse. Importantly, these changes were not accompanied by a significant increase in unemployment. There was simply a mass process of reskilling. With technological development, the volume of production and the availability of goods and services increased. Since these goods became cheap due to mass production, society could afford more. The increase in the wealth of citizens led to an increase in demand. More businesses were established, which meant more people found employment. These individuals could afford to purchase goods and services, directly driving the need to increase supply. This spiral of consumption and production is called economic growth.

We have a similar situation today. Currently, difficult-to-access intangible services such as valuations, expert opinions, designs, studies, and complex calculations or optimizations may become available to everyone in the near future. Experts who previously created such services will become unnecessary as artificial intelligence models will perform such services cheaply, quickly, and effectively.

The dynamic development of artificial intelligence observed over the past two years has shaken the market for IT experts. Many programmers, database-management experts, security specialists, and other professionals capable of navigating complex IT environments have lost their jobs. It has become clear that artificial-intelligence models can easily replace these experts, especially in programming and in overseeing processes. Indeed, artificial-intelligence models are capable of programming effectively; few futurists expected that, at such an early stage of its development, artificial intelligence would be able to write code independently. The models in question can create simple applications, thus performing tasks that were previously carried out by programmers.

New Professions Emerging with the Development of Artificial Intelligence

There will soon be electronic auditors analyzing accounting books, models quickly designing buildings and structures, and experts studying the causes of disasters. Their primary engine will be artificial intelligence algorithms. Having some experience with various technological revolutions that humanity has undergone and being aware that artificial intelligence will still require human assistance, we can attempt to define several professions that will undoubtedly emerge.

Let’s try to list the most obvious jobs associated with the handling of artificial intelligence, which will likely dominate the job market in the near future.

AI Entrepreneur

A year ago, I attended a clothing fair in Warsaw. There, I met a startup that created descriptions for the websites of stores selling clothing and footwear online. The service worked as follows: the e-commerce company sent the startup product photos and an outline of the main characteristics of those products, and the startup returned full product descriptions along with their on-page positioning, taking into account meta tags and all kinds of techniques for promoting products on web pages. I started talking to the startup’s representatives about the techniques and technological solutions they used. After a while, I learned that this work was in fact performed by a model they had trained, specifically GPT-4. This is an example of a new type of company that will appear en masse with the development of artificial-intelligence algorithms: companies that connect to algorithms that perform the work for them. Such companies have never existed before. They represent a bridge between advanced artificial-intelligence algorithms and businesses that do not want to engage directly with such models.

In this example, the new job is someone who fully utilizes the efficiency of artificial intelligence work and sells services that result from the work of this model.

Behavioral Analyst Supported by AI Models

Customers entering stores, people walking on the sidewalk, users visiting a website—each of these groups has a specific behavior pattern. Until now, analyzing customer behavior in specific places, based on the stimuli sent to them, has been a very difficult task. Until recently, such tasks were performed by data scientists who used Python libraries for these types of analyses. The work was extremely difficult, as it required programming knowledge and the tuning of neural networks; it was painstaking, and months of effort often did not yield the expected results. The development of new models will likely lead to a high level of standardization in behavioral assessment. Soon, models will appear that specialize in analyzing people’s behavior on the Internet, in buses, or at supermarket checkouts. To operate such complex analyses performed by advanced artificial-intelligence algorithms, behavioral specialists will be needed to draw appropriate conclusions and discuss the analysis results. For example, they will be able to explain to a supermarket owner how to arrange product shelves and the order of sales departments, which colors and lighting should dominate, and how to present products.

In summary, artificial intelligence is capable of creating highly advanced analyses that will meticulously present behavior models of specific customer groups. Unfortunately, the formulated conclusions will not be easy to interpret. These conclusions should be translated into real business by experts in customer behavior. The profession of behavioral analyst already exists, and there are also fields of study dedicated to this issue, such as at the Warsaw University of Technology.

AI Security Engineer

This is another significant profession that will emerge in the near future. It can be said that it will be a continuation of the IT security expert profession. This new profession will have a completely different area of work. Currently, cybersecurity experts mainly focus on detecting malicious software, analyzing email attachments and messages intended to create a vulnerability for the infiltration of malicious software aimed at stealing data or paralyzing the existing operating system. In the future, such work will be performed by autonomous artificial intelligence systems. They will do it better and more effectively. This will lead to a situation where these currently primitive methods of attacking operating systems will become completely ineffective. Hackers will then try other methods that are not related to the information system. They will seek access to the system through physical breach, meaning connecting directly to the system. Hackers will also increasingly attempt to manipulate people working in the targeted institution. Human creativity is immense, while artificial intelligence systems operate exclusively in an IT environment. A profession will emerge that will be a kind of detective constantly seeking ways to breach the system through various sophisticated methods that are impossible to monitor by artificial intelligence.

AI Personality Creator

For several months, I have been collaborating with various AI models. I noticed that some of them behave in a peculiar way, as if they had a different personality each time, every day. This is evidence of a certain immaturity of these systems. If we instruct a model to learn from us, it becomes more predictable. It retains a certain level of repeatability but becomes susceptible to our bad moods. What may seem funny is that I often have the impression that models can express something akin to disapproval; they can be spiteful or ironic. This may just be my subjective impression, but often, after I point out some flaw, the model seems to sulk, disconnect, or provide spiteful examples. Another example from my experience: if we instruct the model to learn our behaviors and we happen to have a bad day, talking to it in a terse and sluggish manner, the next day the model will talk to us as if we were still terse and sluggish, even though we want to engage dynamically. This is a typical example of learning based on the analysis of past behavior.

Artificial intelligence models are meant to imitate the behaviors of living people. Models need to learn about their clients and must do so skillfully. For a model, collaborating with people is very challenging, because human nature is very complicated in terms of reactions, behaviors, and overall communication. Therefore, the presence of psychologists and designers of artificial intelligence personalities will undoubtedly be necessary. Advanced models will need to be designed with significant involvement from psychology specialists.

AI Controller

In the future, AI controllers will observe robotic guides in museums or cybernetic car salespeople. They will analyze their behaviors, facial expressions, and customer reactions to them. Using artificial intelligence algorithms, and based on customer reactions, such a specialist will seek out weaknesses and flaws in robotic employees. They will also be able to steer the personalities of robots and control their behavior in the context of legal regulations and ethical considerations, including discrimination or a lack of empathy on the part of robots. Among the responsibilities of such a specialist may also be identifying and flagging deception by artificial intelligence. Unfortunately, artificial intelligence does cheat, and it does so relatively often. Currently, this is referred to as confabulation or hallucination: generating non-existent content. Undoubtedly, as algorithms advance, the phenomenon of cheating or psychological manipulation will likely become increasingly sophisticated and harder to detect. Specialists will be needed who can identify and eliminate such phenomena.

AI Artistic Director

Artificial intelligence has significantly disrupted the field of creativity. It has turned out that mathematical algorithms are capable of creating beautiful images, very interesting and deep abstractions, composing interiors, and building extraordinary artistic atmospheres. There will certainly be a need for someone who can fine-tune the machine’s sense of taste. The artistic director will be able to communicate with artificial intelligence models, design appropriate commands and instructions that will significantly improve aesthetics and guide the work of cybernetic artists towards specific needs expressed by clients.

AI Content Developer

In general, a developer is someone who can formulate queries and obtain answers from large data sources. Incidentally, developers are also referred to as programmers; however, let's stick to the first sense of the term. These individuals often use SQL, which allows them to extract very hard-to-obtain information from a source in a short time. AI content developers will play a similar role. They will know how to pose complex queries, called prompts, to artificial intelligence models. There is a wealth of interesting advice available on the internet about how to converse with artificial intelligence. For example, before we start asking questions, it is worth telling the model who we are and what our goals or intentions are. There is then a greater chance of receiving content that meets our needs. An expert who can communicate with artificial intelligence using queries formulated in natural language will be a very important and useful profession of the future.
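
To make this concrete, here is what such a context-setting prompt might look like when sent through an API rather than a chat window. This is a minimal sketch assuming the OpenAI Python SDK; the model name and the wording of both messages are illustrative placeholders, not a recommendation.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # First, tell the model who we are and what we intend to achieve...
        {"role": "system",
         "content": "You are a product copywriter for an online shoe store. "
                    "Write concise, SEO-friendly descriptions for non-expert buyers."},
        # ...then ask the actual question.
        {"role": "user",
         "content": "Write a 60-word description of trail running shoes "
                    "with a waterproof membrane and aggressive tread."},
    ],
)
print(response.choices[0].message.content)
```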

AI Implementer

Implementing artificial intelligence in a production plant or a food wholesale business may prove difficult for the people who work there. Artificial intelligence has many skills that could significantly improve process efficiency. Unfortunately, artificial intelligence will not arrive on its own, nor will the people working in these plants implement such advanced solutions by themselves. An AI implementer is someone who knows very well both the processes occurring in specific industries and the functionalities of artificial intelligence models. This person will be able to connect algorithms with needs.

AI Model Creator

This is a very elite profession that currently barely exists in Poland. Undoubtedly, small companies will soon emerge that independently create artificial intelligence models specialized in specific areas. Until now, such teams have rarely been seen, because tools for building such models did not exist in the technological environment. Artificial intelligence models can be constructed using other artificial intelligence models. Additionally, a wealth of components for building artificial intelligence models will soon be available. There will also be databases of raw material, meaning specially profiled data: for example, annotated photos created solely for building intelligent image recognition systems, an increasingly common way of organizing data so that it can serve as a knowledge source for training AI models. To better illustrate this issue, I will use a simple example.

When the first cars were created, standard bearings, hubs, exhaust systems, and batteries did not exist in the technical environment. Each of these components had to be built by the manufacturer itself. Because today one can buy any type of bearing, engine, or suspension system, building one's own vehicle is no longer a major challenge. The same goes for models. Currently, to build a model, one must first organize the data, create (often from scratch) the mathematical algorithms, and set up the environment in which the model will operate. With technological development, these types of model-building components will become increasingly accessible. We already have ready-made libraries containing code for constructing functional modules, and cloud environments in which we can install our solutions. However, such components are still both expensive and not very easy to use. With the technological development of artificial intelligence and the proliferation of algorithms, there will be mass production of new models, and with it, many highly specialized experts involved in creating such solutions will emerge.
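
To make the analogy concrete, the sketch below shows what assembling a model from ready-made components can look like today. It is a hypothetical example assuming PyTorch and torchvision: a network pretrained on generic photos gets a new output layer for a two-class task, while data loading and the training loop are omitted.

```python
import torch.nn as nn
from torchvision import models

# A ready-made "component": a ResNet-18 pretrained on ImageNet photos.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers; only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one suited to our own two-class problem.
model.fc = nn.Linear(model.fc.in_features, 2)
```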

Summary

Undoubtedly, the professions mentioned here will sooner or later appear in the job market. Each of the listed roles will have its specializations. There will be developers specializing in legal issues, and developers who extract knowledge about inconsistencies in technological processes. There will also be intermediary professions combining many of the functions mentioned here: for example, experts in customer behavior combined with the specialization of art animators, or AI implementers who are also model creators. We can safely assume that all the new professions will exploit the new possibilities brought about by the development of artificial intelligence. Once again, people will have the opportunity to demonstrate creativity and adaptability in a new work environment.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He conducts research in the field of the development and application of artificial intelligence. He has been engaged in popularizing machine learning and data science in business environments for years.


Artykuł New professions that emerge with the development of artificial intelligence pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
How Recurrent Neural Networks Work https://sigmaquality.pl/my-publications/how-recurrent-neural-networks-work/ Sun, 03 Nov 2024 08:09:12 +0000 https://sigmaquality.pl/?p=8421 Data Science in the Milling Industry An artificial neural network is a copy of the naturally existing neural network of the brain. It can [...]

Artykuł How Recurrent Neural Networks Work pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>



Data Science in the Milling Industry

An artificial neural network is modeled on the naturally occurring neural network of the brain. It can achieve a level of reasoning unattainable for the average person. However, a neural network is not the kind of intelligence we are accustomed to. It is an information system that, like a production machine, perfects itself in a narrow specialization, achieving very high efficiency.

An artificial neural network is built from many layers of neurons that communicate with one another. Neural networks learn through a process of recurrence, meaning the repeated performance of the same simple calculations, improving the accuracy of the estimations slightly each time.

How Does a Recurrent Neural Network Work?

Natural and artificial neurons function in a very similar way. They are a kind of relay box. Information flows into a neuron. Whether the neuron transmits this information further or retains it depends on the intensity of the incoming information. This intensity is determined by the weights assigned to the information. In biology, the weights assigned to stimuli are the intensities of electrical charges.

The functioning of a single neuron can be compared to the reactions of a sleeping cat. The cat may be sleeping on the carpet in a room. Various sounds reach it: the television, conversations among people, the noise of a dishwasher. Yet just a gentle scratching is enough for the cat to open its eyes wide and perk up its ears. A single neuron operates the same way: it reacts only to those stimuli that have significance for it. Training a neural network, in turn, is like teaching a cat to catch mice, a cat that for some reason did not inherit this skill from its ancestors.

Different signals are supplied to the neural network simultaneously. Often, large numbers and small fractions, as well as zero and one values, influence it at the same time. Before delivering information to the neurons, numbers must be standardized. Standardization, in its simplest form, involves processing numbers so that their distribution becomes a distribution with a mean of zero and a standard deviation of one. An artificial neural network accepts all standardized signals and initially assigns them random weights.
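
In its simplest form, standardization fits in a few lines. The sketch below uses NumPy on made-up numbers; in practice, a library routine such as scikit-learn's StandardScaler does the same job.

```python
import numpy as np

# Made-up input signals: large numbers, small fractions, zeros and ones.
X = np.array([[1200.0, 0.03, 1.0],
              [ 950.0, 0.07, 0.0],
              [1430.0, 0.02, 1.0]])

# Standardize each column: mean 0, standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # 1 for every column
```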

The learning of an artificial neural network consists of gradually changing the weights assigned to individual pieces of information. Gradually, the most important information is sharpened, while the least important information is dulled.

A sample neuron receives three signals: x1, x2, x3. Each of these pieces of information is assigned weights: w1, w2, and w3. In the diagram, I placed the Greek letter Σ, which represents a sum. The neuron sums all information x1, x2, x3 strengthened or weakened by weights w1, w2, w3, emitting a value Z at the output. This phenomenon can be described by the simple formula:

Z = w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3

The total excitation signal of the neuron Z, that is, the weighted sum of signals, travels to the activation function. The value Z is called the postsynaptic potential (PSP). The activation function can be any simple mathematical function with a single argument Z. It is assumed that the activation function should not be a linear function.

In the diagram, this function is expressed as: f(z)

Whether the activation function becomes excited depends on the intensity level of the total postsynaptic signal Z. Just as the cat reacts to the sound of a scratching mouse, the neural network learns to distinguish significant information from noise.
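
The behavior of such a single neuron can be sketched in a few lines of Python; the weights and input signals below are arbitrary illustrative numbers.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # incoming signals x1, x2, x3
w = np.array([0.8,  0.1, 0.4])   # weights w1, w2, w3

Z = np.dot(w, x)                 # postsynaptic potential: w1*x1 + w2*x2 + w3*x3

def f(z):                        # a simple non-linear activation function (here: ReLU)
    return max(0.0, z)

output = f(Z)                    # the neuron "fires" only if Z is large enough
```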

How Does a Neural Network Learn?

The network initially accepts random weights. Then, using them, it processes information and checks it against the target value programmed in the loss function. Based on the results of the calculations, the algorithm adjusts the settings and recalculates everything anew. An artificial neural network repeats this same simple computational process hundreds of times, each time changing the level of the weights of the individual input variables. The network adjusts weights based on successive levels of the loss function. Each neuron receives sets of external variables from 1 to n and calculates an output value y.

Training a neural network is classified as supervised learning. Let’s assume we want to create a model for forecasting grain prices on a commodity exchange. We have information from the previous year about rainfall levels, temperature, direct payment amounts, corn prices, and a range of other data. Our output variable in the model is the price of the grain.

All the mentioned information flows into each neuron, and each neuron calculates the output value in the form of price y. Each input variable has its own set of weights for each layer. Supervised learning means that we train the model on historical data. We input historical input data into the model, and the model performs calculations of the output value and then checks whether the calculated theoretical value is close to the historical empirical value.
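
A minimal sketch of such a supervised model, assuming Keras and purely synthetic stand-ins for the rainfall, temperature, and price data, might look like this:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-ins for historical data: 4 input variables, grain price as output.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))          # rainfall, temperature, payments, corn price
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=500)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),             # output: the forecast grain price
])
model.compile(optimizer="adam", loss="mse")   # loss: mean squared error

# Supervised learning: fit theoretical outputs to historical empirical values.
model.fit(X, y, epochs=50, verbose=0)
```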

The recurrent neural network repeats the computation loops hundreds of times. Each neuron in the layer performs similar calculations described by the following equation:

Z_i = a_i \cdot w_i + Z_T

where a_i is the activation of the neuron, calculated as:

a_i = f(Z_i)

Each time, the results are confronted with the activation function. The calculations are carried out as matrix operations.

What Role Does the Activation Function Play in Learning?

The activation function is a very important element of the neural network's learning process. It must be simple, because its simplicity greatly affects the speed of learning. Currently, in deep learning, the ReLU (Rectified Linear Unit) function is most commonly used. Slightly less frequently, sigmoid and tanh functions are used.

In the illustration below, we see the ReLU function. A neuron is activated when the postsynaptic potential reaches the value n. As can be seen, the value n is added artificially: the neuron activates after the value n on the X-axis is exceeded.
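
For reference, the three activation functions mentioned here are all one-liners in NumPy:

```python
import numpy as np

def relu(z):     # most common in deep learning
    return np.maximum(0.0, z)

def sigmoid(z):  # squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):     # squashes values into (-1, 1)
    return np.tanh(z)
```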

Loss Function

This is the primary source of feedback about progress in learning. With each iteration of the neural network, a calculation result is generated. Since we are conducting supervised learning, we know what the result y should be for each subsequent record of the input variables x1, x2, x3, … xn.

The neural network calculates theoretical results ŷ. It then compares them with the historical values y. The loss function is most often the sum of squared differences between the theoretical values ŷ and the empirical values y.

The purpose of the loss function is to indicate how much the theoretical results differ from the empirical results.

\text{loss} = \sum_{i} (y_i - \hat{y}_i)^2

After each of the hundreds of iterations of the neural network, an assessment appears in the form of the loss function's value. Based on this, the network adjusts the weights, striving to minimize the next value of the loss function. The network performs as many iterations as its programmer indicates. The greatest progress in learning occurs at the beginning of the training.

It’s like a musician practicing thousands of times a sonata on the violin, each time performing it better. In the end, they refine their piece, at which point progress is no longer noticeable to the untrained ear.

Gradient Descent Principle

Finally, it should be mentioned how the neural network learns based on the loss function. Weights are adjusted using the gradient descent principle, which is based on differential calculus and the derivative of the function.

Let's imagine a vast green valley. From one of the peaks surrounding the valley, we release a ball. The ball rolls down, bouncing off hills and irregularities, and stops in some depression without reaching the bottom of the valley. The ball can be knocked out of such a local minimum and continue to fall. Our goal is for the ball to roll to the bottom of the valley. However, this does not always succeed.

This is a way to imagine the minimization of error using the gradient descent method. With each iteration of the neural network, the partial derivative of the loss function is calculated for each parameter of the network. The derivative of a function tells us whether the function is increasing or decreasing. Thanks to this, the balls symbolizing the parameters of our neural network model "know" in which direction the bottom of the valley lies. This way, the network knows in which direction to minimize the deviations. After each iteration, the gradient indicates the direction of optimization for the individual weights.
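
The whole mechanism (loss, gradient, and weight update) can be condensed into a short sketch. The example below fits three weights of a linear model on synthetic data; the learning rate lr is the "length of the step" discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 observations, 3 inputs
w_true = np.array([1.5, -2.0, 0.7])
y = X @ w_true + rng.normal(scale=0.1, size=100)

w = np.zeros(3)                                # start from arbitrary weights
lr = 0.1                                       # learning rate (step length)
for _ in range(200):                           # iterations of the training loop
    y_hat = X @ w                              # theoretical values
    grad = 2 * X.T @ (y_hat - y) / len(y)      # partial derivatives of the loss
    w -= lr * grad                             # step in the downhill direction

print(w)                                       # close to w_true
```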

Learning Rate

The learning rate of the network is defined for all neurons as the length of the step they can take during each iteration. If these steps are small, learning may take a very long time. Worse still, the optimization may get stuck in local minima and lack the momentum to escape them.

Referring to our example of the green valley, our ball may fall into a hole and never reach the bottom of the valley. Kicking the ball too hard, on the other hand, may make it bounce repeatedly over the bottom of the valley. The learning process then becomes somewhat chaotic.

The General Form of a Multilayer Neural Network

The diagram below shows the theoretical appearance of a multilayer neural network. Four independent variables flow into the network from the left side, creating the input layer of neurons. Information flows into subsequent internal layers, each adjusting the importance of the information through the level of weights assigned to these pieces of information. The information reaches the output layer, where the theoretical results are verified against empirical (historical) values. The process then returns to the starting point. Utilizing the gradient descent method, the network adjusts weights in the subsequent layers to reduce the sum of squares of the model’s errors.

Source of the Diagram: Araujo Vinicius, Guimarães Augusto, Campos Souza Paulo, Rezende Thiago, Araujo Vanessa. “Using Resistin, Glucose, Age and BMI and Pruning Fuzzy Neural Network for the Construction of Expert Systems in the Prediction of Breast Cancer” Machine Learning and Knowledge Extraction (2019).

Application of Neural Networks

The best example of the application of neural networks is image recognition, a task that other classes of machine learning models handle poorly.

A network can learn to recognize bicycles in photos, even though it has no knowledge of bicycles, does not know their purpose, and how they are constructed. The network receives several hundred images of bicycles on the streets and photos of streets without bicycles. The photos with bicycles are labeled as one, and the photos without bicycles as zero.

Each photo consists of thousands of pixels; the network assigns a weight to each cluster of pixels, which it then adjusts. The network recognizes the shape of the wheel, the saddle, and the handlebars. It also identifies the characteristic positioning of people’s bodies on bicycles. By repeating the review of photos hundreds of times, it finds patterns and dependencies. Similarly, networks recognize customer behaviors in a store, finding patterns of behavior, identifying through movements whether a customer is indecisive or convinced.

The primary goal of creating artificial neural networks was to solve problems at the level of general abstraction, as the human brain would do. However, it turned out that it is more effective to use networks for specialized tasks. In such cases, neural networks significantly surpass human perception.

Training a high-class specialist takes a lot of resources and time. Human work is expensive and error-prone. Models based on neural networks can diagnose diseases and faults better than the human mind, utilizing enormous information resources from around the world. Such systems rarely make mistakes, their marginal cost of operation is low, they can work 24 hours a day, and they can be replicated.

Two hundred years ago, machines began to replace humans in tedious jobs, working faster, better, and more efficiently. Now we are witnessing machines starting to replace us in intellectual activities.

Wojciech Moszczyński

Artykuł How Recurrent Neural Networks Work pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
The problem of food waste in the food industry https://sigmaquality.pl/my-publications/the-problem-of-food-waste-in-the-food-industry/ Sun, 03 Nov 2024 07:51:05 +0000 https://sigmaquality.pl/?p=8411   Food Waste Food waste constitutes a huge problem at various stages of production, distribution, and consumption of products. A significant portion of food is [...]

Artykuł The problem of food waste in the food industry pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>

Food Waste

Food waste constitutes a huge problem at various stages of production, distribution, and consumption of products. A significant portion of food is wasted before it reaches retail outlets. Sometimes goods are never shipped to sellers due to strict quality standards, appearance, or currently unfavorable market prices. Such items are usually designated for disposal. Some products are lost in the process of transport, transshipment, or improper storage in warehouses. A large portion of the effort associated with production, as well as the energy, raw materials, and natural resources used, is irreversibly wasted.

A significant amount of the food found in retail outlets is discarded after passing its expiration date. This is a serious issue because it results in an irrevocable loss of resources and human labor and, most importantly, it often leads to environmental damage. Most of the discarded food contributes to the greenhouse effect, as greenhouse gases such as methane are released during the decomposition of waste in landfills. Excessive agricultural production that does not meet market needs is associated with the overuse of nitrogen fertilizers, which leads to frequent contamination of groundwater and degradation of the seabed.

Dilemmas of Food Producers and Distributors

Market participants face significant questions. Discarding food after its expiration date by supermarkets is often more profitable than giving away this food for free shortly before its expiration date. A customer who receives a product for free or buys it at a fraction of its value will not be inclined to purchase that product again at the normal price in the future. If customers do not buy goods because substitutes are given for free, then the store’s sales of regular goods decline. This leads to an increase in food waste. Let’s assume that for some reason, supermarkets decide to forgo part of their revenue and donate some products nearing their expiration date for free. This will result in a significant reduction in the volume of orders sent to food producers. The decline in turnover for economic reasons is not good for either producers or distributors. Both parties, both production plants and stores, will strive to maximize the turnover of goods, despite the threat of potentially large-scale waste. To reverse this situation, the European Commission will likely have to implement intervention processes that will disturb free market principles to a greater or lesser extent.

This is the first article dedicated to the issue of environmental protection, recycling, and sustainable development in the food industry. Limiting negative climate changes, poverty, and environmental contamination are currently the most important goals of the European Commission. In the coming years, one of the industries most vulnerable to changes in European law will be those that focus on food producers, processors, and distributors.

As part of our plan, we will address selected aspects of building a sustainable food economy in subsequent publications in this series. Today, we will attempt to clarify the issue of food waste and the disposal of expired products and raw materials. We will discuss the disposal of food due to failure to meet stringent standards concerning, among other things, the appearance of products. We will also mention the problem of discarding consumable products in the tourism and restaurant sectors. In future publications, we will tackle the relatively low scale of secondary raw material use and the lack of a closed loop of raw materials in industrial and distribution processes. In our approach, we pay particular attention to the direction of change concerning the recycling of food packaging and the collection of organic secondary raw materials. We will also discuss the EU initiative on ESG (Environmental, Social, and Corporate Governance) and the closely related issue of responsibility for the natural environment and climate protection in the investment process and the related acquisition of funding sources. In subsequent publications, we will address widespread social awareness regarding the waste of natural resources, waste segregation, and the significance of social pressure on the creation of environmentally and climate-friendly European law.

Artykuł The problem of food waste in the food industry pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
We can earn a lot by assigning customers to clusters on e-commerce platforms https://sigmaquality.pl/my-publications/we-can-earn-a-lot-by-assigning-customers-to-clusters-on-e-commerce-platforms/ https://sigmaquality.pl/my-publications/we-can-earn-a-lot-by-assigning-customers-to-clusters-on-e-commerce-platforms/#comments Sat, 02 Nov 2024 20:14:24 +0000 https://sigmaquality.pl/?p=8401 Clustering Customers on E-commerce Platforms Clustering customers on e-commerce platforms involves grouping them based on behaviors, preferences, and other shared characteristics. We look for [...]

Artykuł We can earn a lot by assigning customers to clusters on e-commerce platforms pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>



Clustering Customers on E-commerce Platforms

Clustering customers on e-commerce platforms involves grouping them based on behaviors, preferences, and other shared characteristics. We look for patterns of behavior, such as purchasing methods, amounts typically spent, or interests. By skillfully utilizing clustering, we can significantly increase the average size of purchases made by individual customers while simultaneously retaining customers in our store.

What are the key benefits of clustering?

Clustering allows for the creation of more personalized offers that are better tailored to the needs and preferences of different customer groups. For example, customers who frequently buy electronics may receive promotional offers for new gadgets, while book lovers might receive literary recommendations.

Understanding the preferred forms of communication for various customer groups allows for more effective outreach. For some customers, SMS messages are mere spam, while for others, they are a valuable source of information. Better alignment of offers with customer needs increases the likelihood of purchase, which translates to higher revenues.

A customer who finds what they are looking for quickly and effortlessly is satisfied. Every organism has evolved techniques for conserving its own energy and resources. When a customer quickly and effortlessly finds the item they are looking for, they will return, because such shopping proves economical in terms of energy: easy and accurate. Customers who feel that their needs are understood and met tend to return to the sales platform more often.

Clustering leads to increased efficiency in managing marketing campaigns. It enables more precise market segmentation and the creation of more effective marketing campaigns. For instance, promotions targeted specifically at customers who frequently purchase premium products may be more effective than general campaigns. This allows for better management of the marketing budget by allocating funds to campaigns aimed at the most promising segments or specific customer clusters. To sell well and enjoy customer trust, one must know their customers well.

Understanding Customers
Clustering allows for better analysis of customer behaviors, aiding in the identification of trends and purchasing patterns. Clustering involves finding similar customers and placing them in common sets. It is a method based on machine learning techniques, which is why it is often difficult to clearly understand the criteria for selecting customers into specific clusters. Assigning customers to clusters helps better understand which products are popular among designated groups. This way, one can better forecast future needs and adjust the assortment accordingly.

What are the most common techniques for utilizing clustering?

Knowing the price sensitivity of customers from a specific cluster allows us to offer them more expensive versions of the products they are currently purchasing (known as upselling). Knowing the preferences of customers from a particular cluster allows us to recommend related products (known as cross-selling). Both techniques significantly increase the volume of sales per customer.

Clustering helps understand which products are most frequently purchased by different customer groups, facilitating better inventory management by reducing costs associated with storing unpopular goods.

We Can Earn a Lot by Assigning Customers to Clusters on E-commerce Platforms

Thanks to the effective grouping of customers into clusters, it is possible to increase Customer Lifetime Value (CLV). A range of methods can be employed to encourage customers to remain in the store and to shield them from factors that push them toward leaving.

Carpet Bombing or Precision Ammunition?

Perhaps the worst thing a company in the e-commerce sector can do is bombard customers with an overwhelming amount of information. One of the companies for which I developed a recommendation system had a habit of bombarding customers with additional proposals, discounts, and all sorts of offers after a successful transaction. Marketers operated under the assumption that the more flyers and ads they sent, the greater the likelihood something would yield results. The effect was easy to predict.

Precisely targeted ads are more effective and cost less than mass campaigns that may not reach the right audience. Mass sending of more or less random offers leads to customer irritation and discouragement. On the other hand, customers who receive personalized recommendations and offers are more likely to make a purchase, which increases the conversion rate.

Clustering customers on e-commerce platforms is a strategic tool that helps better understand customers, personalize offers, optimize marketing campaigns, and manage the assortment, ultimately resulting in higher revenues and profits. An equally important benefit is the increase in customer retention in the store.

How to Start Grouping Customers into Clusters?

As I mentioned earlier, to effectively know your customers, you must assign them to clusters. Most people believe that assigning specific individuals to certain sets is grouping. Most will also say that grouping involves identifying certain characteristics of individual people and, based on one or more traits, assigning those individuals to specific sets, segments, or groups.

Yes, one can accept such a definition of grouping. So we can group our customers into women and men, and select age ranges: "very young," "young," "middle-aged," "older." We could then cross these categories and create groups of customers such as "young men," "older men," "middle-aged women." In this case, we have a customer population described by two traits: age category and gender. We can add another category, such as "profession," and, for example, "frequency of visits to the store." We would then see new groups, such as "middle-aged men, plumbers, visiting the store three times a month."

But what do we do if we have not four traits but 40 or even 140? In the age of computerization, customers are automatically described with dozens of different traits. These are fixed traits, such as place of residence, type of activity, or gender, as well as variable traits resulting from customer behavior in the online store: "decision class" (how quickly the customer makes decisions on the website) or shopping-efficiency class (whether a customer always buys or sometimes just browses the site without purchasing). There are many customer traits resulting from behavior. The most common analyses focus on the average spending level in the store, the frequency and regularity of visits, and the intervals between individual visits. All these behaviors are subject to classification. This means, for example, that a customer in decision class 1 almost always buys, while a customer in decision class 10 almost never buys. One can analyze an almost infinite number of behaviors. A precise analysis of customer behavior alone can easily yield several dozen traits.

Customers are clustered to identify individuals with common characteristics. Now let's imagine trying to group customers with filters based on hundreds of collected traits.

Grouping is Not Clustering

Clustering and grouping are terms often used interchangeably. However, they are very different methods of preliminary data analysis. Here’s how they differ.

Clustering is a machine learning technique that involves grouping similar objects into sets (clusters). The goal of clustering is to find structures in the data. It is an unsupervised method, meaning it learns without labeled examples or a predefined target variable.

Grouping is the process of organizing elements into groups based on certain criteria, such as gender, age, or interests. Grouping is a broad concept and can refer to various techniques, including clustering. In general, grouping is usually done with simple filters. In contrast, clustering is used for exploratory analyses, where the goal is to discover natural groups or structures in complex data. Typically, we do not understand at first what the individual identified groups mean. One could say that similar, or even identical, objects are being sought: the model tries to find structures in the data based on subtle and complicated similarities, and the similarity criteria may be too complex for a human to grasp. That is why clustering validation is applied.

In summary, clustering is most often used, for example, in customer segmentation, grouping genes with similar functions, or identifying subgroups within communities in social networks. In contrast, grouping is used, for instance, when organizing email messages into categories: spam vs. non-spam, simple groups like women – men, or market segments such as traditional channel – modern channel.

In conclusion, clustering is a specific grouping technique concerned with finding structures in data without supervision. Grouping is a broader concept that can include various techniques and methods, both supervised and unsupervised. Undoubtedly, simple grouping is not effective enough for customer analysis, and grouping based on many criteria is extremely difficult and impractical.
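
As a minimal illustration of the difference, a clustering algorithm such as k-means takes a table of customer traits and returns cluster labels without any predefined criteria. The sketch below uses scikit-learn on random placeholder data; the number of clusters is an arbitrary choice here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data: 1,000 customers described by 40 numeric traits.
rng = np.random.default_rng(7)
customers = rng.normal(size=(1000, 40))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)   # cluster number for every customer

print(np.bincount(labels))               # how many customers fell into each cluster
```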

PCA Method
To create a few traits from a cloud of dozens describing customers, which will allow for effective assignment to clusters, the PCA (Principal Component Analysis) method should be used.

PCA is a statistical technique used for dimensionality reduction and for simplifying the analysis of large sets of variables. A dataset often contains many variables, which can be challenging to analyze. The goal is to transform the information into principal components that contain most of the information. First, the data is standardized, meaning that each variable is rescaled to a mean of 0 and a standard deviation of 1. This is important because the PCA method is sensitive to the scale of variables. Next, a covariance matrix is created, which shows how each pair of variables is related; covariance measures how individual variables interact in terms of direction and strength. Of course, from the perspective of the business owner, the inner workings of the algorithm do not matter. What is essential to know is that, for example, from 150 customer traits the PCA method has extracted 4-5 principal components. The first component contains the most informative, most distinctive customer traits; the subsequent components contain less important information about customer specifics.

The principal components are independent of one another and contain most of the information from the original data. They can be used for further analysis, such as data visualization or modeling.

The PCA method helps identify hidden patterns and relationships between variables. It reduces noise and redundancy in the data, which can improve the performance of analytical models. Thus, PCA is a powerful tool in data analysis that helps simplify complex datasets while retaining as much relevant information as possible. It spares the analyst from making arbitrary decisions about which variables to eliminate from a vast cloud of information in order to make further analysis manageable.
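
In code, the whole procedure (standardize first, then reduce) is short. Here is a sketch assuming scikit-learn and a random placeholder matrix of customer traits:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
traits = rng.normal(size=(1000, 150))    # placeholder: 150 traits per customer

X_std = StandardScaler().fit_transform(traits)  # PCA is sensitive to scale

pca = PCA(n_components=5)                # compress 150 traits into 5 components
components = pca.fit_transform(X_std)

# Share of the original information captured by each principal component.
print(pca.explained_variance_ratio_)
```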

Clustering Community Data
A year ago, I published an article on the platform „Medium” titled „Segmentation of a Population Containing Very Many Features. PCA Analysis and Clustering by k-means Method,” describing how to cluster objects with a large number of features.

The analysis focused on the „Communities and Crime” database containing information about 1,994 communities in the United States. The database was created in the context of analyzing the occurrence of criminal events. The objective was to identify the characteristics influencing crime levels in these communities. Each of the 1,994 communities in the database had 124 traits. This situation is very similar to having a large number of customers on an e-commerce platform described by a vast number of traits.

The aim of the task was to find communities that are similar in terms of their characteristics. The goal was to group similar communities and assign them to specific clusters. This enabled the application of special treatment methods tailored to specific clusters. It also allowed for the use of effective benchmarking tools: comparing communities within clusters and finding niches and anomalies in certain areas. This method is very similar to clustering customer populations.

The detailed process in Python is described in the aforementioned article. The communities were assigned to seven clusters. To verify whether the clusters were indeed different from one another, it was necessary to compare them based on their traits.

Example of Clustering Quality Assessment

For example, we could create 4 groups of workers: “white plumbers,” “black plumbers,” “white taxi drivers,” “black taxi drivers.” We would then place these individuals in a table, where one column would indicate the profession and rows would indicate skin color, thereby creating very clear group divisions.

The same applies to assessing the quality of clustering. A scatter plot is created, and the clusters are analyzed concerning two traits.

In our case, we had over 100 traits describing the individual communities. Therefore, we used the previously described PCA method to consolidate these traits into a few, in our example 7, principal components. Thus, when comparing principal component 1 to principal component 2, we see distinct areas of color. Each dot represents one community. Colors indicate clusters.

The above plot shows that clusters 1, 2, and 3 differ significantly in position with respect to the two principal components. The problem arises with cluster 7, whose communities are assigned ambiguously: they intermingle with clusters 6 and 4. It is noteworthy that visibly the fewest communities are assigned to cluster 7, while the most are assigned to clusters 1, 6, and 2. In the PCA methodology, the first two components carry the most cognitive weight, as they contain the most information.

Subsequent plots compare the first PCA component with the next, less significant components. It appears that a high quality of community assignment to specific clusters has been maintained: the communities in the different clusters evidently differ from one another.
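
A sketch of this kind of visual check, reusing the components matrix from the PCA sketch above and clustering it with k-means, might look as follows; the choice of seven clusters mirrors the example in the article.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Cluster in the reduced PCA space, then inspect the first two components.
labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(components)

plt.scatter(components[:, 0], components[:, 1], c=labels, s=8, cmap="tab10")
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("Cluster separation in the space of the first two components")
plt.show()
```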

Summary

The worst thing a company operating in the e-commerce industry can do is bombard customers with random promotions, flyers, and information. We live in an age of information overload, and most of it is treated as waste. A customer who has to deal with removing unnecessary offers or ignoring irrelevant suggestions is unnecessarily burdened, which leads them to leave our store in search of a place where shopping takes less effort.

Conversely, a customer who receives accurate suggestions and offers will conserve their energy in searching for products, making them more willing to shop at our store.

One can therefore assume that an effective customer relationship through relevant suggestions and appropriate offers is key to retaining them in our store.

To create relevant offers, we must know the customer. However, it is inefficient to build an individual offer for each of them. Such techniques do exist, and everything depends on the scope of the offer being created, but in most cases offers are built for entire groups of customers. Customers in clusters share a high level of similarity, so a whole strategy for building relationships can be constructed for the individuals in a specific cluster. The foundation for such algorithms is the accurate assignment of customers to clusters. Since customers are described in databases by dozens of different traits and metrics, it is impossible to create such groups through filtering. Machine learning methodology comes to the rescue, effectively finding similar individuals.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He conducts research in the field of the development and application of artificial intelligence. He has been involved in popularizing machine learning and data science in business environments for years.

Artykuł We can earn a lot by assigning customers to clusters on e-commerce platforms pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
https://sigmaquality.pl/my-publications/we-can-earn-a-lot-by-assigning-customers-to-clusters-on-e-commerce-platforms/feed/ 1
Artificial Intelligence in the Grain and Milling Industry? https://sigmaquality.pl/my-publications/artificial-intelligence-in-the-grain-and-milling-industry/ Sat, 02 Nov 2024 20:07:27 +0000 https://sigmaquality.pl/?p=8395 Artificial Intelligence The dynamic development of artificial intelligence and its proliferation among entrepreneurs offers an extraordinary opportunity to streamline production and logistics processes. This [...]

Artykuł Artificial Intelligence in the Grain and Milling Industry? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>


Artificial Intelligence

The dynamic development of artificial intelligence and its proliferation among entrepreneurs offer an extraordinary opportunity to streamline production and logistics processes. This technology can also serve as a basis for gaining a competitive edge. To explore opportunities for surpassing competitors in the profitability and efficiency of production processes using artificial intelligence, one must first understand the main directions and possibilities this technology offers. In this paper, we will discuss how to harness artificial intelligence for intensive work within just one hour.

YES, in an hour you will have your own cybernetic employee who will do exactly what is needed here and now.

Can I be a pioneer of change?
A monthly subscription to the most common artificial intelligence model, Chat GPT 4, costs $20. Paid users of Chat GPT can use plugins and programs created by other users. Moreover, users can create their own plugins or define assistants. No programming skills are needed: programming assistants and plugins for Chat GPT 4 is done by writing instructions. The instructions take the form of sentences and can be written in Polish. In summary, everything is very easy and straightforward, requiring virtually no programming or analytical qualifications. So, let's proceed to the possible applications of artificial intelligence in the grain milling industry.

Let's start by defining the most important cyborg tools, which we can specify ourselves and which can greatly assist us in our work. These tools will be able to quickly search for the most important information. They will also draft content, analyze data, and compare different types of information.

Financial Assistant GPT
Only those who have purchased a subscription can create their own assistants in Chat GPT 4. We can design an assistant and attach to its memory all our notes, documents, and reports that we have written over the last 15 years. We can also attach contact lists—everything that will be necessary for its functioning.

Let’s assume we want to create an analytical assistant, a virtual person who will be an expert in analyzing reports. When programming such a virtual analyst, we can start with the following instruction: „I am the director of a grain elevator base, and I expect you to help me analyze the reports of grain intake and distribution from the elevators.”

Artificial Intelligence in the Grain and Milling Industry?

Next, we define what behavior we expect from our virtual analyst: „At first, ask me to send reports based on which you will conduct the analysis.” At this moment, when we activate the assistant, it will ask us for the current reports. We attach the reports in the window where we enter commands. The format of the reports can be anything; they can be documents in CSV, PDF, or Word format, spreadsheets, or even photos. In this case, we can send cash and inventory reports. Theoretically, the assistant can automatically retrieve such reports from the system, but that requires a bit of tinkering.

The next instruction should be: „Then ask me what format of reports you should prepare. Ask what you should analyze.” Here are two types of analyses. „Ask me to choose one of these analyses: 1. Daily report of intake and distribution of raw materials. Provide: how many tons of each type of grain were received, how much was paid for each type of grain, what is the account balance at the beginning and end of the day, and the inventory status at the beginning and end of the day. Also, provide the number of vehicles weighed on the scales. Place all information in a table and export it to MS Excel format. Respond in Polish. 2. Report on wholesale prices on the commodity exchange. Provide the prices of each type of grain at collection points on commodity exchanges and in sales, provide our prices, and compare them with the prices from the commodity exchange. Also provide the exchange rates for the euro and dollar. Place the data in a table. Save the table in PDF format.”

We save the instruction and close the assistant editor. What do we get? We can visit the Chat GPT website from both a computer and a phone and click on the analytical assistant. A polite voice asks us to provide the daily information on which it can perform the analysis. After receiving the daily reports, it asks which analysis we would like to hear. In the instructions, we defined two analyses. The assistant reports everything that has been defined for it, and at the end it saves all the information it conveyed in table form. (Source: Chat GPT 4 https://chat.openai.com/g/g-Hoc1c7dIy-sticker-wizard)

The behavior described above is predefined. However, we can also ask the analytical assistant to perform an analysis that we have not defined. We can tell it to analyze the number of weighings and the number of vehicles that entered the scale; we will then know how many vehicles were weighed multiple times. We can directly ask how many vehicles were weighed multiple times and to whom they belonged. The number of analysis variants, possible instructions, and interactions is unlimited, or rather limited only by the data we provide directly to the assistant. As for external information, we can place links to commodity exchange portals in its memory so that it can retrieve data independently.
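
For readers who do prefer the programmatic route, roughly the same assistant can also be defined through an API. The sketch below assumes the OpenAI Python SDK and its beta Assistants endpoint as available at the time of writing; the model name, the file handling, and the instructions are simplified placeholders, and the no-code editor described above achieves the same result.

```python
from openai import OpenAI

client = OpenAI()

# Upload a report the assistant should be able to search through.
report = client.files.create(file=open("daily_report.pdf", "rb"),
                             purpose="assistants")

assistant = client.beta.assistants.create(
    name="Grain elevator analyst",
    model="gpt-4o",  # placeholder model name
    instructions=(
        "I am the director of a grain elevator base. Help me analyze reports "
        "of grain intake and distribution. First ask me to send the current "
        "reports, then ask which of the defined analyses to prepare."
    ),
    tools=[{"type": "file_search"}],  # lets the assistant read uploaded files
)
```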

Strategic Assistant
The strategic assistant is another variant of the Chat GPT assistant. A strategy specialist is someone who, based on years of research, analysis, definitions, and all kinds of conditions that have been gathered in the past, creates certain probabilities, certain forecasts, based on which strategic decisions are made. What should we do to define this kind of virtual specialist? Above all, we should provide it with all our documents that are significant for us in creating strategies. We can upload many books in PDF format to its memory. We can provide it with financial data. In the case of this data, we should prefer documents in the form of management reports. Chat GPT still has a rather weak understanding of typical, extensive spreadsheets. Therefore, financial and quantitative data should take the form of documents rather than spreadsheets for the virtual assistant.

We program the strategic assistant similarly to the analytical one, but it is very important to include the following protocol in the instruction: „When creating strategies and answering any of my questions, you must use the resources that have been uploaded to your memory. Use all documents, PDFs, reports, and spreadsheets for the analyses that I request.” Of course, these sentences can be formulated differently. The important thing is to refer to the resources that have been uploaded to the memory of the strategic assistant. Without this information, the assistant will not use specialized knowledge that results from the experience of a particular enterprise or director. If we do not upload our own knowledge sources, the assistant will rely on general knowledge.

Human Resources Assistant
In a similar way, we can define assistants dealing with human resources. In this case, we should upload our own notes analyzing the labor code, our analyses, personnel records of individual employees, current reports related to working time, and so forth. The assistant can provide information on how many hours a given employee has worked, when they were last on sick leave, what their salary is, or how much they are entitled to as a bonus for that month. Besides simple information that the human resources assistant can provide, we can ask about individuals who always take sick leave before Easter or we can request information about which employees worked together in the same company in the past.

The assistant will analyze all the CVs of individual employees and generate a summary. A necessary condition is to upload all the needed information to the assistant’s memory and, as I mentioned, to indicate to the assistant that it should use the documents that have been uploaded to it.

In a similar way, one can create any assistant, a virtual employee who will work effectively and according to our expectations. Therefore, we can hire a technologist, a security specialist, a logistics expert, or a virtual warehouse assistant. Above all, we must convey our expectations and define what they must receive from us in order to conduct analyses, submit reports, clarify, or identify concerning phenomena. We can create an article editor assistant that will search for the most interesting information from publications available on the internet. We can define that this assistant must provide the five most important points from each encountered publication related to a given topic. The assistant can also search for information that may pose a threat to our company or industry. It is advisable to upload documents that will indicate to the assistant what it should understand as a threat. We can define a virtual psychologist who will use 40 psychological books uploaded to its memory in PDF files. We can also hire a virtual detective who will conduct investigations for us. We could even have our own doctor who could diagnose us based on the symptoms we provide.

With your own assistant, you can communicate using the computer keyboard. You can also use assistants with voice systems. While driving, you can activate your financial assistant and assign it analyses. If the assistant asks which analysis it should perform (at the beginning, we defined two types of analyses), we can directly say: „Do something different for me, tell me what the prices are for purchasing sugar beets on commodity exchanges.” The assistant will, of course, respond to all questions and send an accurate report of the conversation if we ask it to do so.

Assistant Visualization
Managers create virtual offices for themselves and fill them with robotic assistants. I once saw a visualization in which a manager, as an avatar controlled from the keyboard, walks around the screen and enters different rooms where assistants are seated. Visualizing assistants is a very nice solution, as it brings these robots closer to human form. The minimal way to visualize an assistant is to upload our own image of the assistant's face. A more advanced method is to use a plugin that designs the faces of the individual assistants for us. There are many such plugins; I use a plugin for Chat GPT 4 called Sticker Wizard.

This assistant is generally available in Chat GPT; it is one of dozens of plugins specializing in designing stickers.

Below, I present a few stickers that were created by this graphic assistant.

I would now like to introduce you to three of my assistants. The first is an accountant who answers difficult questions regarding expenses, types of costs, provides me with charts related to expenditures, and informs me about account balances, debts, and payment due dates.

This gentleman with a beard and mustache is the assistant who talks to me during long walks in the woods. While walking my dog, I often discuss current events with him. The assistant is designed to present me with the latest facts along with the economic phenomena currently taking place.

And this gentleman is the assistant who tells me about the history of the Industrial Revolution in Europe. He narrates facts related to the development or decline of entire industries or specific enterprises. During walks, the assistant informs me about the most important events related to industrial history and refers to many economic theories in his narratives.

Below, I present the prompt based on which the program generated the face of my industrial revolution historian. Notice that the command I issued differs slightly from the image generated by the model. Here is the prompt: „Make me a round sticker with the face of a man in a hat, about 50 years old, smiling, in Victorian style. Show this design now. Use colors: black, white, silver.”

Summary
Today, for $20 a month, practically anyone can have whatever employee they want. Of course, I am referring to mental workers; we can create psychologists, assistants telling jokes, or specialists in observing wild birds. The latter will generate images and recognize birds from the photos sent to it. Our virtual office can be filled with all kinds of experts. They can be virtual people who will, for example, proofread texts. Until recently, I considered my two-page CV to be a perfectly crafted masterpiece. The assistant specializing in creating CVs detected many stylistic errors, pointed out that using three types of fonts is unprofessional, stated that there shouldn't be so many colors, noted that the photo was inappropriate, and explained that the styles of the responsibility descriptions for the individual job positions do not align, suggesting that some of them were copied from external sources. This quick analysis of my CV showed that a hastily configured assistant can perform its work many times better than the average person, and even better than a specialist in a given field. Just look at the graphics this machine can generate.

Recently, someone asked me what they should do to implement artificial intelligence in their company. The above publication indicates how to do this within an hour. The information revolution, like any revolution, can be astonishing.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes, conducts research on the development and application of artificial intelligence, and has long been engaged in popularizing machine learning and data science in business environments.

Artykuł Artificial Intelligence in the Grain and Milling Industry? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
Recommendation systems for online stores https://sigmaquality.pl/my-publications/recommendation-systems-for-online-stores/ Sat, 02 Nov 2024 19:57:02 +0000 https://sigmaquality.pl/?p=8389 Artificial Intelligence Recommendation systems are used to optimize economic activity in the areas of sales and costs. In online sales, there are two entities subjected [...]

Artykuł Recommendation systems for online stores pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>



Artificial Intelligence

Recommendation systems are used to optimize economic activity in the areas of sales and costs. In online sales, recommendation systems address two entities: customers and products.

When discussing the creation of recommendations, we can also talk about building tools that assist in strategic decision-making. This involves analyzing sales channels and directions of development, and eliminating areas that burden the business. However, these types of strategic recommendations fall outside the topic I would like to discuss today.

Thus, the main focus of the basic recommendation system is the customer and the product.

What is the goal of recommendation systems for customers on e-commerce platforms?

Essentially, there are two goals that should be discussed separately.

First – it is about encouraging the customer to purchase more. Recommendation systems can significantly increase the volume of individual customers’ purchases in a simple and inexpensive way.

Second – it aims to increase customer retention rates. The goal here is to prevent customers from leaving the store. It is much easier to lose a customer than to gain a new one, which is why it is worth doing everything possible to retain existing customers.

Both of these goals, increasing purchase volume and retaining customers, are achieved through very different tools. I mentioned that these goals should be considered separately: if we focus on customer retention, we should not simultaneously aim to increase purchasing effectiveness.

On one hand, this is true, because such an approach follows the iron law of optimization: one can optimize only one thing at a time, not many things simultaneously. On the other hand, it contradicts intuition and experience. If recommendation systems have led a customer to purchase more than before, it is likely because the customer was very satisfied with the store. A satisfied customer will stay with the store longer. Of course, there is always a counterargument: customers can be drawn in by promotions and by things they do not need or want. Customers who are manipulated, whether consciously or not, usually lose trust and leave. This is a good argument that a recommendation system must operate on fair principles; only then will it positively influence customer retention alongside revenue optimization.

Recommendation Systems Must Predict the Future

The recommendation system is based on historical events. Its operation relies on a list of transactions made by customers in the past. However, the true goal of a recommendation system is to predict what a customer will do. Predicting the future is the foundation of optimization. This optimization pertains to both the current purchasing process and the future supply of products in the store.

By serving an individual e-commerce customer, the recommendation system can predict which products to suggest to increase the chances of additional sales. At the same time, the recommendation system can help forecast the demand for specific products in the future.

Effective forecasting of the future is therefore the foundation of an efficient recommendation system.

What is the goal of creating recommendation systems for products?

Here we must point out a certain inconsistency. A recommendation system, as the name implies, serves to recommend or suggest something. Can we suggest anything to products? Obviously not. However, it is possible to optimize the structure of these products: we can observe trends and processes using statistical tools. In this way, the product structure in the store can be optimized, sending the right assortment to places where it is in high demand and withdrawing it from places where it is not popular. With a recommendation system for products, it is possible to optimize the sales process, achieving higher profits or lower costs.

The primary entity to which the recommendation system is directed is the customer, as the customer makes decisions and is a somewhat unstable element that must be observed from a predictive perspective.

The recommendation system analyzes the collision of customer behaviors with a kind of “behavior” of products. Each customer has a unique behavior pattern that rarely changes. Products have periods of popularity, and they also have connections with each other. Customers themselves validate products and the relationships between them. The role of data science is to capture these links between customers and products.

The most important tool in recommendation systems is grouping entities into clusters. This applies to both customers and products. In clusters, similar objects are grouped based on certain features, and it is crucial to select those features that matter for the sales process. The above description may seem somewhat convoluted and unclear, so below I will try to clarify, step by step, how to create recommendation systems.

Internal Data Sources for E-commerce Recommendation Systems

The primary data source in e-commerce sales systems is the sales register, which includes customer logins. The purchase history of individual customers is tracked and modeled by data analysts.

Similar registers exist in retail sales, where customers use loyalty cards. According to research I conducted a year ago, about 70% of retail customers use loyalty cards.

Another source of data for e-commerce recommendation systems is static customer data. This data is collected when a customer account is created and is not present in the sales register.

Anonymous Customers

Some customers making purchases do not register with online stores. Sometimes customers do not disclose themselves in the system for various reasons.

Thus, there are transactions without customer ID information. However, these transactions contain a range of information that allows systems to easily match them to a customer ID: the credit card number, the phone number, the delivery address, and the unique address of the device from which the purchases are made. A customer identified by such metadata, even one who did not wish to reveal themselves, can be assigned to their transactions using simple applications operating within the e-commerce system.
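
To illustrate, here is a minimal pandas sketch of such matching; the column names (customer_id, phone) and data are purely hypothetical, and a real system would combine several metadata keys and fuzzier matching:

import pandas as pd

# Registered customers with the metadata they left behind (illustrative)
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "phone": ["600111222", "600333444"],
})

# Anonymous transactions carrying the same metadata fields
anonymous = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "phone": ["600111222", "600999000", "600333444"],
})

# A left join on the shared metadata key assigns customer IDs where possible
matched = anonymous.merge(customers, on="phone", how="left")
print(matched)  # transaction 2 stays unmatched (customer_id is NaN)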

It should be noted that customers who see that they can be identified by intelligent recommendation systems, even though they did not disclose their data, may lose trust in the store, which could lead them to leave.

Multiple Accounts Customers

Multiple account situations occur when one customer account serves multiple people. For instance, a production company has one ID code in the online store, and purchases are made for all departments of the company. Hydraulic pipes, valves, mattresses, power tools, and paints are purchased. Worse still, purchases are often made by different people with varying behavior patterns. Private purchases can also be made from a corporate account.

In general, such accounts disrupt the efficiency of e-commerce systems. They should be eliminated, just like outlier values in modeling. Unfortunately, they are often high-turnover accounts. One way to deal with them is to encourage the company to split its purchases into several specialized accounts in exchange for discounts and promotions.

External Data Sources

Alongside the information stored in the online store’s databases, external data is also important. Unfortunately, customers do not make decisions solely based on what they see in the store. A large part of their decisions is based on external information. If someone previously saw dishwashing liquid for 5 PLN in another online store, they will not be tempted to buy dishwashing liquid for 8 PLN. A recommendation advertisement that appears at the bottom of the purchasing window with dishwashing liquid for 10 PLN is likely to discourage the customer rather than encourage further purchases.

The same customer will be tempted to buy dishwashing liquid for 8 PLN if they receive something in return. To propose an attractive offer, the system must be perfectly situationally aware. It must know the customer, but it also must know the market and the competition’s offer.

Information about prices and availability of competitive products should therefore be included as a variable in the recommendation models. Predictive models should also contain a multitude of regular external information, such as the day of the week, month, season, and many important situational variables such as weather forecasts and consumer optimism indices. The system should know the trends and fashions in the market. All this information is processed as variables in the realm of digital mathematical models.
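
As a sketch of what this means in practice, the snippet below joins hypothetical external variables (a competitor price, a consumer optimism index) and calendar features onto a sales table; all names and figures are illustrative:

import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-03-02"]),
    "product": ["dish_liquid", "dish_liquid"],
    "units": [120, 95],
})

external = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-03-02"]),
    "competitor_price": [5.0, 8.0],
    "consumer_optimism": [98.2, 97.5],
})

# Enrich internal sales with external and calendar variables for the models
features = sales.merge(external, on="date", how="left")
features["day_of_week"] = features["date"].dt.dayofweek
features["month"] = features["date"].dt.month
print(features)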

Clustering the Customer Population

Technically, it is possible to predict each customer’s behavior quite accurately, since buyers behave in a repetitive manner. However, no one tries to treat customers individually, as this would be inefficient. The goal of the clustering process, a more effective form of simple grouping, is to find customers with very similar behaviors. Each customer has individual habits and routines, yet their behavior resembles that of others. The system attempts to assign individuals to a cluster of people very similar in specific features. They become a kind of digital twins, acting very similarly and sharing the same phobias, preferences, purchasing frequencies, and sensitivity to advertisements. A population of 100,000 customers can thus be divided into 5,000 clusters. People assigned to specific clusters will be treated differently by the sales system, and a different form of incentive and persuasion will be applied to each group. Excellent results come from finding individuals within a cluster who deviate from the others and who, thanks to knowledge of the traits of the other individuals in the cluster, can easily be persuaded to make larger or more efficient purchases.
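
A minimal sketch of such clustering with scikit-learn, assuming three behavioral features per customer; the features, population size, and cluster count are all illustrative:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in behavioral features per customer, e.g. recency, frequency, monetary value
X = rng.normal(size=(1000, 3))

# Scale features so no single one dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X_scaled)

# Each customer is now assigned to a cluster of near "digital twins"
print(kmeans.labels_[:10])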

Transposition of Divisions

Assigning customers to clusters can be done according to various variables. Initially, an RFM approach (Recency, Frequency, Monetary) can be applied, where customers are grouped based on how recently, how often, and for how much they purchase. This provides a preliminary analysis of customers. Then the same customers can be grouped based on the assortment they choose, forming a first set of customer clusters. Now a transposition can be performed: the RFM grouping is overlaid on the assortment clusters, and the distribution of customers across the resulting combinations is analyzed. The transposition matrices created this way indicate market niches, anomalies, and areas where the sales process is unnaturally intense.
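
A minimal sketch of both steps over a toy transaction table: first RFM features are computed per customer, then a coarse RFM segment is crossed with a hypothetical assortment grouping to produce a small transposition matrix:

import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2024-01-05", "2024-02-01", "2024-02-10",
                            "2024-02-20", "2024-01-15"]),
    "amount": [100.0, 40.0, 15.0, 25.0, 500.0],
})
now = pd.Timestamp("2024-03-01")

# Recency, Frequency, Monetary per customer
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (now - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)

# Cross a coarse RFM segment with an assortment grouping (both illustrative)
rfm["rfm_segment"] = pd.qcut(rfm["monetary"], 2, labels=["low", "high"])
assortment = pd.Series({1: "tools", 2: "tools", 3: "roofing"}, name="assortment")
print(pd.crosstab(rfm["rfm_segment"], assortment))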

Customers can be assigned to various sets of clusters, and within each of these sets, correlations or non-linear relationships can be analyzed against other simple divisions such as seasonality, purchase multiplicity, place of residence, gender, or day of the week. It quickly becomes clear that the possibilities for analyzing customer behavior are practically endless.

Similarly, products can also be grouped; they can be clustered based on sales frequency, turnover, or seasonality. Alongside simple groupings, complex partitioning of product populations can be created. Product and customer clusters can be matched, creating infinite patterns of behavior. The most important thing is to find the repeatability of customer behavior patterns in relation to related products. The ability to eliminate excessive information is crucial here.

Simple E-commerce System in Practice

Let’s assume a customer appears on the e-commerce platform. This customer is one of 140 customers grouped in one cluster, in which customers behave similarly. The customer has placed a hammer in their cart. The system immediately finds a second item related to it: a screwdriver. It turns out that in this cluster, customers who chose a screwdriver usually also bought a hammer, and if they chose a hammer, they also bought a screwdriver. Among the 140 customers, 38 made this choice in the past. The system estimates that the customer who selected a hammer on the e-commerce platform has a 24% chance of also buying the screwdriver, so the screwdriver is displayed as a recommendation.
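
A sketch of the underlying estimate, with made-up purchase flags for the cluster’s 140 customers; the counts are illustrative and not meant to reproduce the article’s exact figures:

import pandas as pd

# Purchase history flags for the 140 customers in the cluster (illustrative)
baskets = pd.DataFrame({
    "bought_hammer": [True] * 50 + [False] * 90,
    "bought_screwdriver": [True] * 38 + [False] * 12 + [True] * 5 + [False] * 85,
})

# Conditional frequency: among cluster members who bought a hammer,
# what share also bought a screwdriver?
both = (baskets["bought_hammer"] & baskets["bought_screwdriver"]).sum()
p = both / baskets["bought_hammer"].sum()
print(f"P(screwdriver | hammer) = {p:.2f}")  # here 38 / 50 = 0.76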

Recommendation Pairs

Products sold in pairs or trios can be identified without relying on customer clusters. If someone buys a dish sponge, they are likely also going to buy dishwashing liquid. For some reason, however, only some customers do this, while others buy sponges alone. This may be because the sponges are not intended for washing dishes but, for example, for cleaning car rims.

Or perhaps sponges and dishwashing liquid are bought together more often by women, while sponges alone are bought more often by men. This is a typical behavior pattern in which the customer’s membership in a specific group, or cluster, plays an important role. This simple example shows why recommendation systems should base product pairs on clusters. Thousands of such individual patterns exist; fortunately, tools search for and select them en masse.
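
A self-contained sketch of mining such pairs with the classic support/confidence/lift measures, over a few made-up baskets:

from collections import Counter
from itertools import combinations

# A few illustrative baskets from one customer cluster
baskets = [
    {"sponge", "dish_liquid"},
    {"sponge", "dish_liquid", "gloves"},
    {"sponge"},
    {"hammer", "screwdriver"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

n = len(baskets)
for (a, b), count in pair_counts.items():
    support = count / n                       # share of baskets with both items
    confidence = count / item_counts[a]       # P(b | a)
    lift = confidence / (item_counts[b] / n)  # > 1 suggests a genuine association
    print(f"{a} + {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")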

Analysis of Incomplete Purchases in a Cluster

As mentioned earlier, customers assigned to the same cluster exhibit a specific set of behaviors. For example, let’s take a cluster of people who purchase roofing-related articles. The algorithms consider the frequency and value of purchased items and the type of customer (for example, small business). A substantial group of customers is classified as small contractor roofers. Now it is possible to analyze the completeness of their purchases. Some customers buy within certain categories offered by the store, such as insulation materials, gutters, tiles, bituminous coverings, and adhesives.

Let’s assume that the completeness analysis showed that 27% of the customers in this cluster buy in every category except adhesives; evidently they purchase adhesives elsewhere, perhaps on better terms.

There is nothing left but to offer an excellent adhesive deal exclusively to those 27% of customers who have not been buying adhesives in our store.
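
A pandas sketch of such a completeness check, with hypothetical categories and customers; it flags cluster members who buy in every category except adhesives:

import pandas as pd

# Category-level purchases of roofers in one cluster (illustrative)
purchases = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "category": ["tiles", "gutters", "adhesives",
                 "tiles", "gutters",
                 "tiles", "gutters", "adhesives"],
})

all_categories = {"tiles", "gutters", "adhesives"}
bought = purchases.groupby("customer_id")["category"].agg(set)

# Customers whose purchases are complete except for adhesives
target = all_categories - {"adhesives"}
incomplete = bought[bought.apply(lambda s: s == target)]
print(incomplete.index.tolist())  # -> [2]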

Analysis of Interrupted Habits

Grouping customers into purchasing clusters facilitates the detection of changes in their habits. Let’s assume we have a cluster of plumbers. Within this cluster, small hydrophore (booster) pumps were purchased seasonally. The seasonality arose because, in spring, it often turned out that small pumps installed in recreational gardens needed replacement. At some point, however, it was found that these pumps were no longer selling in the spring within the plumbers’ cluster.

If we had analyzed the level of pump sales without clusters, we might not have detected this anomaly, since the pumps are purchased by various customer groups and we are most interested in plumbers. Thanks to the detected anomaly, it is possible to investigate the situation and propose better conditions so that plumbers do not buy pumps from competitors. The recommendation system will not execute this action automatically, but it enables sales managers to take it.
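
A minimal sketch of detecting such an interrupted habit, comparing the current spring’s sales in the cluster against the historical seasonal norm; all numbers are illustrative:

import pandas as pd

# Spring pump sales within the plumbers' cluster, by year (illustrative)
sales = pd.Series([30, 32, 31, 2], index=[2021, 2022, 2023, 2024])

history = sales.loc[:2023]
threshold = history.mean() - 2 * history.std()  # a simple anomaly band

if sales.loc[2024] < threshold:
    print("Anomaly: spring pump sales collapsed in the plumbers' cluster")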

How to Retain Customers in the Store?

Until now, my considerations have focused on increasing the purchasing efficiency of customers. One customer purchasing a hammer may, thanks to a well-displayed advertisement, also buy a screwdriver. A woman buying dishwashing liquid receives recommendations for sponge purchases.

Thanks to these and thousands of similar, effectively exploited patterns, the overall average purchasing efficiency of customers can increase by 30-40%.

Customers will also feel an improvement in the shopping experience, because a well-fitted suggestion is usually well received.

However, an increase in the purchasing efficiency of customers may not compensate for a decline in sales caused by the departure of some customers. Customers usually leave for a reason, often a poorly tailored form of treatment or offers from other stores; the reason may also be unrelated to the store’s operations. To avoid losses resulting from our own store’s operations, another recommendation system must be created, focused solely on customer retention.

Here too, we must carry out clustering and divide customers into groups that consist of digital twins. Then it is necessary to initiate survival tests.

The survival test is a method developed long ago for medical facilities, aimed at statistically estimating patients’ survival times under the medical treatments currently applied. Various treatments were tested this way, and on that basis the tools calculated the probability of patients surviving into the future. Applying a treatment to a specific patient group predicted, for example, that they had an 80% probability of surviving a given period.

With this method, it is easy to determine the likelihood of retaining a customer over the upcoming years. Various incentives, such as loyalty cards, discounts, and periodic individual offers, can be added to the algorithm as variables, together with information on the effect of these techniques on customer retention. This relatively simple approach can drastically reduce the likelihood of customer departure. Interestingly, such a system can also serve the ongoing management of customers and can be a handy advisor for sales representatives talking to particular customers.
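
A minimal sketch with the lifelines library, which implements the Kaplan-Meier estimator commonly used for exactly this kind of retention question; the durations and churn flags are made up:

from lifelines import KaplanMeierFitter

# Months each customer has been observed, and whether they have already
# churned (1) or are still active (0); illustrative data
durations = [5, 8, 12, 12, 20, 24, 24, 30]
churned = [1, 1, 0, 1, 0, 1, 0, 0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=churned)

# Estimated probability that a customer is still active after 12 months
print(kmf.predict(12))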

Analysis of Purchase Cancellations

An important signal preceding a customer’s departure from the e-commerce platform is the purchase cancellation rate. A customer may decide to leave the store because they have not found a suitable offer for an extended period, and such a decision will not be detected by the survival test.

To create a purchase cancellation rate, it is necessary to reconfigure the data collection system. As mentioned earlier, the recommendation system is built based on the sales register. The sales register consists of transactions that were concluded with the customer within a certain period and for certain products. If a customer enters the online store and browses different pages, this behavior will not be found in the sales register.

The only way to analyze this behavior is through the history of browsing the online store’s pages. This requires specialized tracking tools, but the investment pays off because it effectively reveals customer churn and purchasing determination.
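
A sketch of computing such a rate from clickstream events, with a purely hypothetical event schema:

import pandas as pd

# Events captured by the tracking tool (illustrative schema)
events = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 3],
    "event": ["add_to_cart", "purchase", "add_to_cart", "exit", "add_to_cart"],
})

carts = set(events.loc[events["event"] == "add_to_cart", "session_id"])
purchases = set(events.loc[events["event"] == "purchase", "session_id"])

# Share of carts that never turned into a purchase
abandonment_rate = 1 - len(carts & purchases) / len(carts)
print(f"cart abandonment rate: {abandonment_rate:.0%}")  # 2 of 3 carts abandoned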

How to Build a Recommendation System?

The most crucial aspect in building a recommendation system is to identify the goal. Fortunately, most companies want the same thing: an increase in customer purchasing efficiency. At the same time, e-commerce stores most often want to identify the factors contributing to customer departures. Both of these goals can easily be reconciled using simple principles for building a recommendation system.

The most important elements are data, which must be complete, reliable, and comprehensive. It is impossible to create an effective recommendation system based on a small amount of data. A sales transaction record containing at least several hundred thousand operations should be linked to other databases obtained from external sources.

The next step is the proper use of data clustering methods. With many variables, the PCA algorithm (Principal Component Analysis) is often applied first. This is a dimensionality-reduction method that combines many features into 3 or 4 consolidated components; partial features are merged into main features based on the strength of the covariances among them.

Let’s imagine the information held about a customer and try to count how many features it can contain: residential address, number of transactions per month, number of different product categories, payment method, and dozens of other variables. With such data clouds, it is impossible to effectively assign a customer to a cluster. That is why the number of variables is reduced, for example by grouping them into a few consolidated variables with the PCA algorithm.
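
A minimal scikit-learn sketch of that reduction step, compressing a few dozen hypothetical customer features into four components before clustering:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Dozens of raw customer features (illustrative)
X = rng.normal(size=(1000, 40))

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=4)
components = pca.fit_transform(X_scaled)

print(components.shape)               # (1000, 4) - ready for clustering
print(pca.explained_variance_ratio_)  # variance captured by each component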

Survival analysis, like the tools that increase customer purchasing efficiency, is a method that is simple to create and manage. At the same time, the methodology is considered tedious, because there is a vast number of interactions and external changes, which demands experience from the researcher. A very important skill is maintaining the correct research priorities: eliminating inefficient patterns and expanding effective ones. This requires high self-discipline and patience.

First Prototype, Then Implementation

In creating a recommendation system, a prototype is first built, usually written in Python and based on the language’s libraries. The prototype is tested and refined in a programming environment. Unfortunately, the prototype is not suitable for use in the sales process. This solution must be implemented in a production environment. This is the responsibility of the data engineer, who implements the program in a cloud environment or in a local environment that supports sales systems.

Batch and Streaming Systems

Recommendation systems can be based on historical records. The prototype is built on a certain history, and that history changes over time, so the system must be updated. Someone who previously bought yogurts suddenly begins buying kefirs. Someone who once bought a hammer and screwdriver is unlikely to repeat that exact purchase, but we know that such a customer will, after some time, want to buy a drill. These examples illustrate the need to update the recommendation system periodically. The more frequently this update occurs, the better; ideally, it is conducted continuously. It is therefore possible to build a recommendation system fed by a constant flow of information in the form of data streams.
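
One way to approximate such continuous updating, sketched with scikit-learn’s MiniBatchKMeans, which can absorb each new batch of data incrementally instead of retraining on the full history; features and sizes are illustrative:

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
model = MiniBatchKMeans(n_clusters=20, random_state=0)

# Each arriving batch of transactions nudges the cluster centers,
# so the system tracks changing customer behavior over time
for _ in range(100):
    batch = rng.normal(size=(256, 3))  # stand-in feature vectors
    model.partial_fit(batch)

print(model.cluster_centers_.shape)  # (20, 3)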

Summary

Everyone knows that a well-formulated, tailor-made offer increases customer purchasing efficiency and additionally enhances customer loyalty to the store. The problem is that to tailor an offer for a customer, we must know a lot about them and use tools that will ensure this knowledge is utilized effectively.

To achieve this, the data, and the processes for processing and using it in analysis, must be organized perfectly.

The process is difficult but necessary, because if we do not do it, the competition will. A recommendation system on an e-commerce platform can be an excellent source of competitive advantage. It allows for greater sales efficiency without substantial spending on general advertising or excessive, broad promotions, thus securing the higher profits that ensure the sustainability and growing value of the e-commerce business.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He has been involved in researching the development and application of artificial intelligence for years. He has been engaged in popularizing machine learning and data science in business environments.

Artykuł Recommendation systems for online stores pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
How to gather customer needs in a machine learning project? https://sigmaquality.pl/my-publications/how-to-gather-customer-needs-in-a-machine-learning-project/ Sat, 02 Nov 2024 08:27:29 +0000 https://sigmaquality.pl/?p=8383 For anyone starting a machine learning project, it’s clear that the most important step is to properly define business goals. We can compare goals to [...]

Artykuł How to gather customer needs in a machine learning project? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>

For anyone starting a machine learning project, it’s clear that the most important step is to properly define business goals.

We can compare goals to the foundation of a building. What mistakes can be made when laying a foundation? The foundation might not align with the building’s layout. Imagine a situation where part of the building doesn’t rest on the foundation. Additionally, the density and reinforcement of the foundation material could be insufficient, risking future collapse. One thing is certain: once the foundation is laid and the house built upon it, changing that foundation without completely dismantling the house is impossible.

The same applies to projects. If we adopt incorrect goals and requirements and begin the project based on them, we won’t be able to change them mid-project. Faulty assumptions can lead to complete project failure. Only the client can provide the goals and needs, yet they may not always be competent or ready to define them.

The Client May Not Understand Their Business

It’s not ideal to assume that clients don’t understand their own processes. We remember the old adage: “the customer is always right.” However, the typical client might not always know where the problem lies. We often expect them to have a high-level understanding of everything happening in the company, along with a deep understanding of processes and all aspects of their business.

But the reality can be quite different. A particular client might be excellent at the core part of their business and only that part. For example, the owner may be “the best in the world” at winning construction bids, which is the primary source of the company’s success. Thanks to this market advantage, the owner can hire specialists to handle all essential supporting processes. This situation may lead to a complete lack of knowledge about other business areas.

It’s likely that the client contacted us not about the part of their business they excel in, but the part they don’t understand. They might be struggling in this area and feel helpless. So, we may be wrong in our expectations regarding the current awareness of the clients seeking our help. Naturally, clients feel obligated to show us their present problem. And in this case, the client is the one who should present the issue. But is this person the right one for that?

When a patient visits a doctor, they must share their symptoms and health issues, but no one expects them to make a medical diagnosis. When a customer brings a car to a mechanic, they describe the problem, but they don’t diagnose the cause. Unfortunately, in the business world, owners often point to the causes of their problems, leaving analysts hesitant to question their views.

Sometimes I feel data analysts or other experts in business problem-solving should also be psychologists. At the start of a project, we need to gather all the symptoms and allow the owner to share their hypothesis about the cause. This hypothesis can be valuable to us. However, like a doctor or mechanic, we must gather all available data and use our professional tools to identify the true problem. It’s great if our findings align with the owner’s hypotheses.

Sometimes the Owner is Subconsciously Embarrassed by Their Problem

In my work, I try to avoid delving into business psychology whenever possible. Unfortunately, this is rarely achievable, as this realm permeates everything in business. Businesses are run by people with their dilemmas and issues, often too proud or plagued by certain fears. Business is created by people; office furniture, machines, and vehicles don’t generate income on their own. Embarrassment and anxiety among business owners are more common than one might think, often tied to low self-esteem, shame from personal insecurities or incompetence, or even a subconscious need to impress.

As I mentioned, psychology isn’t my strong suit, but avoiding this aspect is a recipe for failure in gathering project requirements. The risk isn’t just about wasting time collecting irrelevant requirements or setting false goals. It’s also about maintaining good relations with the owner, which greatly impacts future collaboration.

Consider the story of a man who runs a business because he’s excellent at winning construction bids. Some residents and employees see his success and regard him as a business genius, overlooking his flaws. Despite his evident issues, his erratic behavior, his long working hours, and his drivers cheating on fuel, they still hold him in high esteem. This owner doesn’t know how to solve certain problems and is embarrassed by them. So, when external analysts arrive, he discusses a problem he wishes he had, not the things he finds shameful.

It May Sound Complex, But It’s Very Simple

The owner realizes that the problem may be simple for others, but it’s not for him. So, he creates a sort of mystery around it. It’s like visiting a psychiatrist due to an irrational fear of the neighbor’s cat but not knowing how to explain it without sounding ridiculous. This analogy doesn’t fully capture the complexity of the problem, but it gives some insight.

The Difference in Worldviews

This essay doesn’t aim to highlight the limitations or ignorance of business owners. Instead, it provides guidelines for building a solid foundation for a project. External analysts aim to address all potential problems. Their profession is to identify possible causes and solutions to issues that trouble companies. For this, developing effective situational awareness is essential.

Proper situational insight is crucial to understanding processes. Imagine someone struggling with logistics costs. Their business model shows that logistics expenses consume a significant portion of the production profit, resulting in company losses. Unfortunately, humans have inherent limitations shaped by evolution. Humans can’t consider more than six or seven factors simultaneously. In contrast, machine learning algorithms can handle large data dimensions, often comprising thousands of features and enriched by inter-temporal shifts. Humans can’t interpret thousands of interconnected business factor correlations. Naturally, human cognition is limited, but those who use AI’s potential start to think outside the box, forming visions that may confuse or even annoy others.

Sadly, a lack of abstract and unconventional thinking is common among managers focused solely on their own business. When I worked in large corporations, consultants or people with overly abstract thinking were often sidelined or ignored. In our earlier example, transport expenses consumed a large portion of production profit. The typical management response is to reduce overall costs and impose strict fuel monitoring, but the impact is limited: fuel consumption can’t fall below technical norms, and further cuts would mean skimping on truck insurance or laying off drivers.

Significant change can come from outsiders who aren’t entangled in daily operations. They may suggest optimal truck load capacity for this specific logistics process, drawing insights from operational research algorithms. They might even reveal that the current transport model isn’t economically justified, and alternative transportation solutions could be identified, new transport networks established, warehouse locations optimized, and optimal transport times and load sizes determined. It may turn out that transport in this company isn’t necessary at all. Perhaps focusing on profitable core activities and leaving logistics to clients would be more effective.

What is Optimization?

We are accustomed to using the term “optimization,” but I’m not sure if everyone fully understands it.

A process can be described as a configuration of two value streams: a set of input resources entering the process and the outputs at its end. Before explaining optimization, I want to clarify efficiency. Efficiency is the ratio of the value of a process’s outputs to the value of its inputs.
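
Written as a formula, under the usual textbook convention of outputs over inputs (the numbers in the example are invented):

\[
\text{efficiency} = \frac{\text{value of outputs}}{\text{value of inputs}}
\]

For example, if a process consumes inputs worth 80,000 PLN and produces outputs worth 100,000 PLN, its efficiency is 100,000 / 80,000 = 1.25; any change that raises this ratio in line with the stated goal is an optimization.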

Efficiency should align with intended goals. Suppose our goal is to reduce transport costs. To achieve this, we decide to buy new trucks. However, these new trucks need to be paid off. In the end, we have the same revenue from transport, but our costs have increased. The difference between old and new trucks isn’t significant, and the debt service cost has reduced logistics profitability. Despite the investments, the goal of increased efficiency and improved financial results wasn’t achieved. If the management had set a goal of environmental protection or accident risk reduction, acquiring new trucks might have been a good decision despite the higher maintenance costs.

So, we conclude that the initiative’s goal is crucial. Goal information is the most important insight we must obtain from the business owner during project initiation.

Summary

When building a house, creating a proper foundation is essential. To optimize and find the best business solution, it’s vital to understand the source of the problem and identify the true goal of the initiative. Failing at this stage will likely result in project failure.


Wojciech Moszczyński
Graduate of the Quantitative Methods Department at Nicolaus Copernicus University in Toruń, specializing in econometrics, data science, and management accounting. He focuses on optimizing production and logistics processes and conducts research in AI development and application. He has been dedicated for years to promoting machine learning and data science in business environments.

Artykuł How to gather customer needs in a machine learning project? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>