Practical notes on applying OCEAN/HEXACO psychological profiling in recommender systems for selling luxury and investment goods (Part 2)

November 2025 | Volume 79

This article follows up on my earlier publication, in which I presented the assumptions for using the Big Five (OCEAN) and HEXACO models to profile customers and tailor sales communication.

I focus on two practical data sources: (1) call-center/voice-to-text transcripts from the advisory channel and (2) application logs from the web channel (clicks, searches, scroll time, navigation paths). I provide a scientific justification that both types of digital traces contain sufficiently rich linguistic and behavioral signals to estimate personality dimensions, as confirmed by research on predicting OCEAN from language and digital behavior, as well as meta-analyses of digital footprints. I state requirements for sample size, the validation scheme, and ethical-legal issues relevant to commercial applications. Prior findings show that matching the message to the personality profile increases the effectiveness of persuasive and sales activities—provided that the models are properly calibrated on appropriate data and respect privacy standards.

Modern computational psychometrics has repeatedly shown that spontaneous language (open vocabulary) and online behaviors predict questionnaire OCEAN scores at a level useful in practice. A classic study on more than 75,000 Facebook users demonstrated clear and stable mappings of the five OCEAN dimensions in linguistic features. Language from transcripts and “mouse” behavior patterns on a website carry information about personality dispositions that can be used to appropriately tailor commercial communication.

Transcripts in advisory channels

We obtain transcripts from ASR (automatic speech recognition) systems. After normalization (removing fillers, lemmatization, segmentation into speaker utterances) we obtain textual material that can be analyzed along two complementary paths:

(a) Interpretable features — input variables that are easy to understand and explain; it is clear what they measure and why they affect the outcome. For example, the number of expressive adverbs (“super,” “mega”) and invitations to make contact “live” indicate higher Extraversion (E).

(b) Semantic embeddings of sentences/segments compared with OCEAN/HEXACO “seed sentences.” The reliability of a text sample increases with text length.
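The two paths can be sketched as follows. This is a minimal illustration under stated assumptions, not the production method: the marker list, the seed sentences, and the bag-of-words `embed` function are invented for the example, and a real system would replace `embed` with a trained sentence-embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical marker list for path (a); a real lexicon must be validated empirically.
EXPRESSIVE_ADVERBS = {"super", "mega"}

# Hypothetical seed sentences for path (b), one per trait pole.
SEED_SENTENCES = {
    "extraversion_high": "i love meeting people and talking live",
    "conscientiousness_high": "i compare every detail and plan carefully",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def expressive_adverb_rate(text: str) -> float:
    """Path (a): share of tokens that are expressive adverbs."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in EXPRESSIVE_ADVERBS for t in tokens) / len(tokens)


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def seed_similarities(utterance: str) -> dict:
    """Path (b): similarity of an utterance to each seed sentence."""
    vec = embed(utterance)
    return {name: cosine(vec, embed(seed)) for name, seed in SEED_SENTENCES.items()}


utterance = "Super, I love talking live, let's meet"
print(expressive_adverb_rate(utterance))
print(seed_similarities(utterance))
```

The interpretable rate can be explained to a salesperson directly, while the similarity scores depend entirely on the quality of the embedding model and of the seed sentences, which is why the two paths are treated as complementary.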

Sample size. The stability of linguistic features increases with the number of words; numerous publications emphasize that short samples are unstable and require interpretive caution. Practical guidelines suggest a few hundred words as the minimum for meaningful conclusions. In our applications we require at least 300 words of the customer’s speech, accumulated either within one conversation or across several.
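A simple gate on the cumulative word count can enforce this threshold before any profile is estimated. The sketch below assumes that each conversation is already reduced to the customer’s own utterances as a plain string; the function names are hypothetical.

```python
MIN_WORDS = 300  # threshold stated in the text


def cumulative_word_count(conversations: list) -> int:
    """Total number of whitespace-separated words across all conversations."""
    return sum(len(text.split()) for text in conversations)


def has_enough_text(conversations: list, min_words: int = MIN_WORDS) -> bool:
    """True when the customer's accumulated speech reaches the minimum length."""
    return cumulative_word_count(conversations) >= min_words


# Two shorter conversations can jointly satisfy the requirement.
print(has_enough_text(["word " * 150, "word " * 150]))
```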


Application logs (web channel): measuring OCEAN/HEXACO traits

The second channel is digital behavior on the site and in the app: order of page views (click-through paths), time on subpages and components (dwell time), intensity of filtering/comparisons, search queries, returns to cart, depth of specification exploration, changes of variants, etc. Research on digital footprints shows that even simple activity patterns are characteristic of personality traits, and meta-analyses confirm significant, repeatable links between digital behaviors and the Big Five dimensions. The strongest evidence concerns mobile data (passive sensing): OCEAN levels can be predicted from smartphone logs (calls, location, app habits) at a practically acceptable quality; reviews particularly confirm good predictability for Extraversion, with smaller but significant effects for the remaining dimensions. Methodologically, these results are also important for website logs because they concern the same class of phenomena: stable patterns of technology use.
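As an illustration of how raw log events can be turned into behavioral features, the sketch below aggregates events first per session and then per customer. The event schema (`customer_id`, `session_id`, `type`, `dwell_seconds`) and the sample events are assumptions made for the example, not the format of any particular analytics tool.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical event schema and made-up sample events.
events = [
    {"customer_id": "c1", "session_id": "s1", "type": "view", "dwell_seconds": 40.0},
    {"customer_id": "c1", "session_id": "s1", "type": "compare", "dwell_seconds": 90.0},
    {"customer_id": "c1", "session_id": "s2", "type": "compare", "dwell_seconds": 60.0},
    {"customer_id": "c1", "session_id": "s2", "type": "search", "dwell_seconds": 10.0},
]


def session_features(events):
    """Episodic features: one row per (customer, session)."""
    rows = defaultdict(lambda: {"n_events": 0, "dwell_seconds": 0.0, "n_compare": 0})
    for e in events:
        row = rows[(e["customer_id"], e["session_id"])]
        row["n_events"] += 1
        row["dwell_seconds"] += e["dwell_seconds"]
        row["n_compare"] += int(e["type"] == "compare")
    return dict(rows)


def customer_features(per_session):
    """Long-term features: mean of each episodic feature over a customer's sessions."""
    by_customer = defaultdict(list)
    for (customer_id, _session_id), row in per_session.items():
        by_customer[customer_id].append(row)
    return {
        cid: {name: fmean(r[name] for r in rows) for name in rows[0]}
        for cid, rows in by_customer.items()
    }


per_session = session_features(events)
print(customer_features(per_session))
```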

Example mappability of signals

Increased depth of comparisons, frequent switching of specifications, and use of calculators may indicate higher Conscientiousness (C).
Quick “live” decisions, willingness to make contact, and checking social proof may indicate higher Extraversion (E in OCEAN, X in HEXACO).
Preference for “what’s new” sections and beta/limited versions may indicate higher Openness (O).
Persistent checking of return policies, risk FAQs, and hidden costs may indicate higher Neuroticism (N in OCEAN) or Emotionality (E in HEXACO).
Preference for “About the brand,” certifications, and proofs of authenticity may indicate higher Honesty–Humility (H).

Such assignments of simple, recurring behaviors must always be verified empirically—by building models on stable behavioral features and checking their validity against reference traits.
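The simplest form of such a check is to correlate a behavioral feature with a reference trait score obtained from a questionnaire. The sketch below assumes a small set of customers for whom both values are known; the numbers are made up for illustration and are not study results.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Made-up illustrative values: mean comparison depth per session and
# questionnaire Conscientiousness for the same five customers.
comparison_depth = [2.0, 5.0, 3.0, 8.0, 6.0]
conscientiousness = [2.8, 3.9, 3.1, 4.6, 4.0]

r = pearson(comparison_depth, conscientiousness)
print(round(r, 3))
```

A mapping such as "deep comparisons indicate high C" is kept only if the correlation holds up on a sample of realistic size and on customers who were not used to propose the rule.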

Study design and good (operational) practices

In a project aimed at learning customer traits, we combine both sources (transcripts + logs) into one pipeline. On the transcript side, we use cleaning, speaker segmentation, and extraction of features and embeddings; on the logs side, we aggregate event sequences into episodic (per-session) and long-term features. We primarily analyze how consistently characteristic behaviors repeat from session to session, and from this we build a typical behavior pattern for each customer. For that reason we do not pool the logs from many sessions into one undifferentiated dataset: a behavior counts as a stable signal only if it recurs across sessions, and the replicability of behavior patterns is itself evidence of high-quality customer observation. We build models as regressions/classifiers for five (OCEAN) or six (HEXACO) dimensions, with validation splits grouped by person, so that sessions of the same customer never leak between the training and test sets. For transcript analysis, for clarity, I recommend a two-layer architecture: an interpretive layer (traditional counts of words and phrases) and a semantic/sequential layer (embeddings, vector models that capture sentence meaning).
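The leakage-free validation can be sketched as a grouped k-fold split in which every session row of a given customer lands in the same fold. This is a minimal pure-Python illustration; the function name and the sample identifiers are hypothetical.

```python
def group_k_fold(customer_ids, k):
    """Yield (train_idx, test_idx) pairs; all rows of a customer share one fold."""
    unique = sorted(set(customer_ids))
    fold_of = {cid: i % k for i, cid in enumerate(unique)}
    for fold in range(k):
        test = [i for i, cid in enumerate(customer_ids) if fold_of[cid] == fold]
        train = [i for i, cid in enumerate(customer_ids) if fold_of[cid] != fold]
        yield train, test


# One entry per session row; c1 and c3 have several sessions each.
session_owner = ["c1", "c1", "c2", "c3", "c3", "c3"]

for train_idx, test_idx in group_k_fold(session_owner, k=3):
    print(train_idx, test_idx)
```

Libraries such as scikit-learn offer an equivalent splitter (`GroupKFold`); the point of the sketch is only that the split is made over customers, not over individual sessions.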

Sample size is critical: on the text side, at least a few hundred words per customer, accumulated across conversations; on the logs side, dozens of events per user, aggregated within a time window. A commonly recommended practice in analyses of online language stability is to plot reliability curves (prediction quality vs. text length or number of events). New AI methods provide many metrics to assess sample quality.
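A reliability curve can be approximated even without a trained model. The sketch below is a simplification: instead of prediction quality, it tracks how far a single linguistic feature computed on the first n words deviates from its value on the full text. The marker set and the sample text are assumptions made for the example.

```python
def feature_rate(tokens, markers):
    """Share of tokens that belong to the marker set."""
    return sum(t in markers for t in tokens) / len(tokens) if tokens else 0.0


def reliability_curve(tokens, markers, lengths):
    """For each prefix length n, the absolute deviation from the full-text rate."""
    full = feature_rate(tokens, markers)
    return [
        (n, abs(feature_rate(tokens[:n], markers) - full))
        for n in lengths
        if n <= len(tokens)
    ]


# Made-up text: the markers are dense at the start and sparse later.
tokens = ("super mega great " * 5 + "we should compare the full specification " * 20).split()
markers = {"super", "mega"}

curve = reliability_curve(tokens, markers, lengths=[15, 50, 100, len(tokens)])
for n, deviation in curve:
    print(n, round(deviation, 3))
```

Plotting the deviation against n shows the length at which the estimate stops moving; with a real model, the same loop would report prediction error instead of feature deviation.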

Research demonstrates the power of predicting traits from digital traces, which creates an obligation for transparency and proportionality of processing. Models should be designed according to the “benefit to the customer” principle (better matching, lower information friction), with data minimization, anonymization, access controls, and regular risk assessment. Reporting should include not only accuracy but also stability and calibration to avoid over-interpretation. Prior studies provide empirical background confirming the sensitivity of profiling from digital traces.

Both transcripts and application logs are valuable, complementary sources for estimating OCEAN/HEXACO profiles in the context of selling luxury and investment goods. Language captures the content and style of the customer’s thinking, while web-channel behaviors reveal habits of exploring risk, data, and novelty in a real purchasing process. Combining both sources—while meeting sample-length requirements, validation rigor, and ethical rules—enables operationalizing the principle “first learn, then tailor communication.” The results should be shorter decision cycles, higher satisfaction, and a higher probability of closing the transaction, consistent with experimental evidence on the effectiveness of psychological message matching.

Why salespeople should not be given scripts on how to talk to customers

Classical sales approaches emphasize aligning content and style to the customer type, based on situational knowledge and a repertoire of strategies. At the same time, we know that salespeople, like customers, have their own relatively stable OCEAN/HEXACO profiles, which reveal themselves after a dozen or so sentences of interaction. A person cannot change their style or psychological structure in a short time, nor can they change their vocabulary so that the words they use in a sales conversation match the customer’s psychological structure. They may pretend to be someone else for a short time, but over the longer term such pretense is not durable and becomes ineffective. Moreover, a customer who realizes that the salesperson is pretending may react unpredictably, so it is fair to ask whether “playing to the customer” is profitable at all and what risks it carries. Therefore, instead of trying to change the salesperson’s personality, one should optimize the pairing of salesperson and customer.

Matching a salesperson to a customer according to the OCEAN/HEXACO methodology is not simple and requires longer empirical research. It cannot rest on the simple assumption that similarities attract and differences repel. Similarity–attraction theory predicts greater interaction comfort between similar profiles (which may be advantageous in luxury and long-term relationships), but the literature also points to cases in which complementarity (e.g., high C in the salesperson with low C in the customer) is more effective. Matching rules should therefore be learned from data and continuously updated, rather than assuming the universal primacy of similarity.

“Operating manuals” for customers and for salespeople can be embedded in the CRM and generated automatically based on OCEAN/HEXACO profiles.
Instead of expecting a salesperson to change personality several times a day, match salespeople to customers based on OCEAN/HEXACO profiles.
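Learning pairing rules from data can start with something as simple as win rates per combination of salesperson and customer profile. The sketch below assumes that profiles are already bucketed into coarse levels and that each closed opportunity is recorded as a tuple; the records are made up and merely illustrate a case where complementarity wins.

```python
from collections import defaultdict

# Made-up history: (salesperson bucket, customer bucket, deal won?).
records = [
    ("high_C", "low_C", True),
    ("high_C", "low_C", True),
    ("low_C", "low_C", True),
    ("low_C", "low_C", False),
    ("high_C", "high_C", True),
]


def win_rates(records, min_n=2):
    """Win rate per (salesperson, customer) pair, ignoring pairs with too few cases."""
    counts = defaultdict(lambda: [0, 0])  # pair -> [wins, total]
    for seller, customer, won in records:
        counts[(seller, customer)][0] += int(won)
        counts[(seller, customer)][1] += 1
    return {pair: wins / total for pair, (wins, total) in counts.items() if total >= min_n}


def best_seller_for(customer, rates):
    """Salesperson bucket with the highest observed win rate for this customer bucket."""
    candidates = {s: r for (s, c), r in rates.items() if c == customer}
    return max(candidates, key=candidates.get) if candidates else None


rates = win_rates(records)
print(best_seller_for("low_C", rates))
```

The `min_n` guard matters in practice: with few observations per pair the rates are noisy, so the rules should be re-estimated as new outcomes arrive.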

If only selected salespeople can work with specific customers, should those same salespeople also write the contracts, offers, and general correspondence for those customers? And why should a human not be the one to write communications tailored to the customer’s psychological profile?

Why, from the perspective of using OCEAN/HEXACO, offers for customers should be generated by AI rather than manually by a human

If every offer and every paragraph of communication is to be precisely matched to the customer’s personality profile according to OCEAN/HEXACO, generating them by people is inefficient and error-prone: it scales poorly, does not guarantee consistency or objectivity, and is difficult to test and improve quickly. AI systems solve these limitations: they can reliably infer a profile from language and digital traces, select content for traits (psychological targeting), and then optimize the offer in an experimental cycle (A/B, multi-armed bandits).

A typical salesperson has limited memory and limited time. They cannot maintain linguistic rules and the right keywords and “golden phrases” for a thousand unique profiles and, on the fly, compose offers with different variants of tone, structure, and contractual safeguards. AI, however, can apply consistent rules at the level of words, phrases, and entire sections, taking into account both hard traits (e.g., for Conscientiousness—emphasis on SLAs) and soft traits (e.g., high N/Emotionality—reducing uncertainty in phrasing and the order of arguments). The empirical foundation that matching content to traits changes recipient behavior (clicks, purchases) has been provided by large-scale field experiments.

Humans are easily influenced by mood, fatigue, and the most recent contact. Algorithms, based on data traces, weigh information evenly and—importantly—do not prefer socially desirable profiles. Comparative results show that computer-based personality assessments were sometimes more accurate than those by friends or spouses of the same individuals. This is an argument that language-selection rules should follow from the model rather than an individual’s intuition.

Generative text systems can produce hundreds of offer variants (different tone, paragraph order, degree of formalization, choice of proofs) and test them experimentally (A/B/MAB), updating the generation policy based on real outcomes (conversion, time to decision, response to follow-ups). In the human world such a loop would be too slow and inconsistent.
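The multi-armed bandit part of this loop can be sketched with Thompson sampling over offer variants. This is one common bandit strategy, chosen here for brevity and not necessarily the one used in a given deployment; the variant names and the simulated conversion rates are invented for the demonstration.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of offer variants."""

    def __init__(self, variants, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per variant, stored as [alpha, beta].
        self.stats = {v: [1, 1] for v in variants}

    def choose(self):
        """Sample a plausible conversion rate per variant and pick the best."""
        samples = {v: self.rng.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(samples, key=samples.get)

    def update(self, variant, converted):
        """Record one observed outcome for the chosen variant."""
        self.stats[variant][0 if converted else 1] += 1


# Simulated conversion rates, invented for the demonstration only.
TRUE_RATES = {"formal_evidence_first": 0.05, "warm_story_first": 0.60}

bandit = ThompsonBandit(list(TRUE_RATES))
world = random.Random(1)  # stands in for real customer responses
pulls = {v: 0 for v in TRUE_RATES}

for _ in range(500):
    variant = bandit.choose()
    pulls[variant] += 1
    bandit.update(variant, world.random() < TRUE_RATES[variant])

print(pulls)
```

In a personality-aware setting one would keep a separate bandit per profile segment, so that the policy learns which variant works for which profile rather than one global winner.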

Why this translates into sales: the “psychological matching” mechanism

In psychological targeting experiments, matching the message to Extraversion or Openness yielded large increases in clicks and purchases versus non-matched versions. This is strong evidence that micro-features of language and persuasive emphases must be retuned to the profile—otherwise the same offer can provoke resistance or boredom. We extend the same principle to the full OCEAN/HEXACO set, including Honesty–Humility. This is very important for luxury and investment goods due to perceptions of ethics, authenticity, and contractual risk.

What exactly AI does that a human will not do well—or at all

Infers the profile from transcripts and application logs.
Selects content and tone at the micro level (words/phrases), meso level (paragraphs: order, saturation with evidence/risk), and macro level (offer architecture).
Controls “risk words” (exclusion lists) and substitutes for given profiles to avoid defensive reactions.
Optimizes experimentally (A/B/MAB) on live data: rapidly detects “profile → variant → outcome” patterns and updates the generation policy.
Maintains consistency and metadata: every offer carries metadata, and effective editing of that metadata and of the subtext with respect to OCEAN/HEXACO traits is not achievable for a human at the scale of hundreds of customers.
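The control of “risk words” from the list above can be illustrated with a per-profile substitution table applied to a draft text. The profile label and the word pairs below are hypothetical examples; real exclusion lists would have to come from testing, not intuition.

```python
import re

# Hypothetical substitution table: profile label -> {risk word: substitute}.
RISK_SUBSTITUTES = {
    "high_emotionality": {"penalty": "adjustment", "deadline": "agreed date"},
}


def apply_exclusions(text: str, profile: str) -> str:
    """Replace risk words with their substitutes for the given profile."""
    substitutes = RISK_SUBSTITUTES.get(profile, {})
    if not substitutes:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, substitutes)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: substitutes[m.group(0).lower()], text)


draft = "The penalty applies after the deadline."
print(apply_exclusions(draft, "high_emotionality"))
print(apply_exclusions(draft, "high_openness"))
```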

Integration with sales practice in the CRM

In practice, a recommendation engine that automatically builds offers is a module in the CRM that takes a previously defined customer OCEAN/HEXACO profile and a business configuration (price list, SLAs, policies) and returns a composite document: tone variant, order of sections, selection of proofs, and what to avoid. Per-profile versions can be maintained as dynamic templates. Every output undergoes compliance checks and can be edited by a human before it is sent. This human involvement is important and necessary; in data-science pipelines it is called human-in-the-loop. The AI, however, generates the first version and controls the optimization experiments.
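The interface of such a module can be sketched as a function from profile and business configuration to an offer plan. Everything specific in this sketch is an assumption: the trait keys, the 0.6 threshold, the section names, and the rules themselves are placeholders for rules that would be learned and tested on real outcomes.

```python
from dataclasses import dataclass


@dataclass
class OfferPlan:
    tone: str
    section_order: list
    proofs: list
    avoid: list
    needs_human_review: bool = True  # compliance check and human edit before sending


def build_offer_plan(profile: dict, config: dict, threshold: float = 0.6) -> OfferPlan:
    """Map trait scores (0..1) and business configuration to an offer plan."""
    sections = ["summary", "product", "price", "terms"]
    tone, proofs, avoid = "neutral", [], []

    if profile.get("C", 0.0) >= threshold:
        # Conscientious customers: contractual detail first, SLA as proof.
        sections = ["summary", "terms", "product", "price"]
        proofs.append(f"SLA: {config['sla']}")
    if profile.get("N", 0.0) >= threshold:
        # High N/Emotionality: reduce uncertainty in phrasing.
        tone = "reassuring"
        avoid.append("uncertain phrasing")
        proofs.append(f"Return policy: {config['return_policy']}")
    if profile.get("H", 0.0) >= threshold:
        proofs.append("authenticity certificate")

    return OfferPlan(tone, sections, proofs, avoid)


config = {"sla": "48h response", "return_policy": "30 days"}
plan = build_offer_plan({"C": 0.8, "N": 0.7, "H": 0.2}, config)
print(plan)
```

The `needs_human_review` flag defaults to true so that the human-in-the-loop step is part of the data structure rather than an optional convention.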

OCEAN/HEXACO in the context of configuring the subject of individual offers for luxury goods

There are strong scientific and tooling foundations for recommender systems to select specific product attributes (e.g., yacht drive variant, series limitation, finishing materials, financing mode) based on the OCEAN/HEXACO personality profile and on the behaviors and choices of “psychologically similar” customers, using matrix factorization, embeddings, and vector databases.

Empirical foundations

Psychological targeting works. As stated earlier, in experiments, matching the message to Extraversion or Openness increased clicks and purchases compared with non-matched communication. This shows that personality traits can be operationalized in marketing and sales, and that their impact is quantitatively significant.
Personality-aware recommendations. Literature reviews on personality-aware recommender systems show that incorporating personality features improves cold-start performance and recommendation stability by leveraging psychological similarity, not only behavioral similarity.
Luxury contains symbolic values. HEXACO traits (e.g., Honesty–Humility) are linked to attitudes toward authenticity, exclusivity, and brand ethics—factors of particular importance in luxury and investment goods.
Mechanism 1 (reduction of cognitive cost). OCEAN/HEXACO organize the style of information processing; tailoring tone and types of proof reduces cognitive cost, leading to higher conversion.
Mechanism 2 (transfer of preferences via “psychological near-neighbors”). Collaborative filtering exploits user similarity; incorporating psychological similarity improves the cold start and stabilizes recommendations.
Mechanism 3 (alignment with symbolic/self-expressive motives). Personality traits relate to preferences for limited editions, authenticity certificates, brand history, or risk-mitigating contractual clauses—dimensions that can be configured per profile.
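Mechanism 2 can be illustrated with a cold-start recommendation based only on trait similarity. The sketch assumes centered trait vectors (deviations from the population mean, in the order O, C, E, A, N) and a small made-up customer base; a production system would use a vector database and learned embeddings instead of this brute-force search.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up customer base: centered trait vectors and previously chosen attributes.
customers = [
    {"profile": [1.0, 0.5, -0.5, 0.0, 0.0], "chosen": ["limited_edition"]},
    {"profile": [0.9, 0.4, -0.4, 0.1, 0.0], "chosen": ["limited_edition", "hybrid_drive"]},
    {"profile": [-1.0, 0.8, 0.0, 0.0, 0.5], "chosen": ["extended_warranty"]},
]


def recommend_attributes(new_profile, customers, k=2):
    """Rank attributes by similarity-weighted votes of the k nearest profiles."""
    scored = sorted(
        ((cosine(new_profile, c["profile"]), c) for c in customers),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, customer in scored:
        for attribute in customer["chosen"]:
            votes[attribute] += similarity
    return [attribute for attribute, _ in votes.most_common()]


# A new customer with a profile but no purchase history.
print(recommend_attributes([1.0, 0.5, -0.4, 0.0, 0.0], customers))
```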

Limitations and ethics

Domain validation — effects must be tested on real indicators (win rate, cycle length, CSAT), not only on RMSE/CTR.
Privacy and transparency — psychological profiles are sensitive; consent, data minimization, and bias audits are required (especially when the profile affects price/terms).
Stability of inference — models must report uncertainty (e.g., short linguistic sample length).
Boundaries of influence — personalization should help the decision, not manipulate beyond the client’s interest.
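Reporting uncertainty can be as simple as attaching an interval to every trait estimate. The sketch below uses a percentile bootstrap over per-segment scores, which is one possible technique among several; the scores are made up, and the 90% level and the number of resamples are arbitrary choices for the example.

```python
import random
import statistics


def bootstrap_interval(segment_scores, n_boot=1000, alpha=0.1, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(segment_scores, k=len(segment_scores)))
        for _ in range(n_boot)
    )
    lower = means[int(alpha / 2 * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(segment_scores), lower, upper


# Made-up per-segment trait scores: a short sample and a longer one.
short_sample = [0.2, 0.8, 0.5]
long_sample = short_sample * 20

print(bootstrap_interval(short_sample))
print(bootstrap_interval(long_sample))
```

The interval for the short sample is much wider, which is exactly the information a salesperson or a downstream model needs before acting on the profile.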

Summary

Combining the OCEAN/HEXACO profile with matrix factorization + embeddings + a vector database creates a coherent, scientifically documented foundation for automatically selecting offer attributes in luxury markets (yachts, residences, cars). We have evidence for the effectiveness of psychological matching at the level of purchasing behaviors, mature techniques for tailoring actions to personality, and a body of literature linking traits with decisions concerning luxury goods and authenticity.

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński