A Recommender System in the Mill (Part 1): Building on Azure Synapse

We are building a recommendation system for an online food wholesaler offering organic flours made from exotic and rare grains. The store serves approximately 20,000 customers across Europe, and the system’s goal is to suggest products that customers are likely to enjoy and ultimately purchase.

A key assumption is that recommendations do not need to operate in real time—the model will be refreshed periodically (e.g., once a week) based on accumulated data.

Data Sources

We draw on three primary data sources:

Customer profile database – containing static client information: gender, age, country and place of residence, type of business activity (e.g., whether the customer is a dietitian, baker, restaurateur, or an instructor running bread-baking workshops).
Transaction database (sales register) – containing purchase history: which products were bought, when transactions occurred, etc. Each transaction is linked to a specific customer, allowing tracking of preferences and shopping habits.
Application log (behavioral) database – processed data describing customer behavior in the store (e.g., site interactions, reactions to promotions). This includes emotional and behavioral attributes such as: tendency to respond to promotions, speed of purchase decisions, likelihood of abandoning the cart, susceptibility to recommendations, or propensity to click banners. These features are extracted from logs and stored in a structured form for each customer.

Such data enables the recommender system to tailor offers more effectively. For example, a “promotion-oriented” customer might respond better to discounts, while a dietitian might be more interested in gluten-free products. Based on the sales register, we can identify customer preferences for specific products.

Azure-Based Recommendation System

(Synapse + Azure ML with Microsoft Recommenders)

Azure does not provide a single, fully managed “out-of-the-box” recommendation service. Instead, we assemble a solution using several components:

Azure Synapse Analytics (or Azure Databricks) for data processing and model training.
Azure Machine Learning for experiment/model management, and optionally databases or services for serving results (e.g., Azure Cosmos DB, Azure SQL, or Azure Kubernetes Service + API).
Microsoft Solution Accelerators, such as the Moyo Azure Synapse Retail Recommender Solution, which provides an end-to-end retail product recommendation pipeline using Synapse (Spark), model training, and deployment as a service.

Concept: transactional data is processed in Synapse (Spark), the model is trained on Synapse Spark or in Azure ML, and the recommendations are stored in a database or made available via an API. The entire pipeline can run in batch mode (e.g., weekly).

Data Preparation

We collect customer, product, and interaction data.

Customer database: gender, age, country, etc.
Application logs: behavioral metrics.
Sales transactions: the key link between users and purchased products, forming the core training data.

These datasets are loaded into Azure Data Lake and processed/joined in Synapse (Apache Spark) or Azure Data Factory to create model-ready tables. Product attributes (e.g., flour categories, grain type, region of origin) can also be included as item features for more advanced approaches.
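The joining step can be illustrated with a minimal plain-Python sketch. In the real pipeline this would be a Spark job in Synapse; the record layouts and field names used here (customer_id, product_id, quantity, business_type) are assumptions made for illustration, not the store's actual schema.

```python
from collections import defaultdict

# Hypothetical sample records; all field names and values are illustrative.
customers = {
    "c1": {"country": "PL", "business_type": "baker"},
    "c2": {"country": "DE", "business_type": "dietitian"},
}
transactions = [
    {"customer_id": "c1", "product_id": "spelt_flour", "quantity": 2},
    {"customer_id": "c1", "product_id": "spelt_flour", "quantity": 1},
    {"customer_id": "c1", "product_id": "teff_flour", "quantity": 1},
    {"customer_id": "c2", "product_id": "teff_flour", "quantity": 3},
    {"customer_id": "c9", "product_id": "rye_flour", "quantity": 1},  # no profile
]


def build_interactions(customers, transactions):
    """Aggregate purchases into one row per (customer, product) pair
    and attach the customer's profile attributes to each row."""
    totals = defaultdict(int)
    for t in transactions:
        if t["customer_id"] not in customers:
            continue  # skip transactions that cannot be linked to a profile
        totals[(t["customer_id"], t["product_id"])] += t["quantity"]

    rows = []
    for (customer_id, product_id), quantity in sorted(totals.items()):
        rows.append({
            "customer_id": customer_id,
            "product_id": product_id,
            "quantity": quantity,
            **customers[customer_id],
        })
    return rows


interactions = build_interactions(customers, transactions)
```

The resulting rows carry both the interaction strength (total quantity) and the profile attributes, so the same table can feed a collaborative model and a feature-based one.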

Algorithm Selection and Model Training

Azure leaves the choice of recommendation method open: collaborative filtering, content-based, or hybrid approaches can all be used. The open-source Microsoft Recommenders library provides implementations and examples of many algorithms (e.g., ALS, Bayesian Personalized Ranking, SAR co-occurrence, sequential models, neural networks, LightGBM).

Collaborative Filtering (ALS)
The Moyo accelerator uses a matrix factorization ALS model trained on user–product transactional data (implicit feedback). After the data is cleaned and the interaction matrix is built, ALS is trained in Spark MLlib. The resulting model predicts a preference score for each user–product pair as the dot product of the learned user and product latent vectors.

From this, we generate top-N recommendation lists per user (excluding already purchased products). These can be stored in Azure Cosmos DB and refreshed weekly by rerunning the pipeline. The same ALS model can also be used for item-to-item recommendations.
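How a top-N list follows from latent vectors can be sketched in a few lines of plain Python. The two-dimensional vectors and product names below are invented for illustration; in practice the vectors would come from the trained ALS model and have a higher rank.

```python
def top_n(user_vector, item_vectors, purchased, n=10):
    """Score every unpurchased item by its dot product with the user
    vector and return the ids of the n highest-scoring items."""
    scores = []
    for item_id, item_vector in item_vectors.items():
        if item_id in purchased:
            continue  # exclude products the customer has already bought
        score = sum(u * v for u, v in zip(user_vector, item_vector))
        scores.append((item_id, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scores[:n]]


# Hypothetical latent vectors (assumed values, not model output).
item_vectors = {
    "spelt_flour": [0.9, 0.1],
    "teff_flour": [0.2, 0.8],
    "einkorn_flour": [0.8, 0.3],
    "amaranth_flour": [0.1, 0.9],
}
user_vector = [1.0, 0.2]

recs = top_n(user_vector, item_vectors, purchased={"spelt_flour"}, n=2)
print(recs)  # ['einkorn_flour', 'teff_flour']
```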

Content-Based / Hybrid
Alternatively, we can use user and product features to predict purchase likelihood. One option, covered by the LightGBM examples in Microsoft Recommenders, is a gradient-boosted model trained on:

user features (demographics, behavioral log indicators),
product features (category, grain type, etc.),
aggregated interaction stats (e.g., purchase counts per category).

Positive examples are historical purchases; negative samples can be generated. This allows recommending new products based on profile similarity without relying solely on co-purchase patterns.
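One simple way to generate negative samples is to pair each customer with products they have never bought, drawn at random from the catalog. The sketch below assumes uniform sampling and one negative per positive; both are illustrative choices, and the function and variable names are hypothetical.

```python
import random


def sample_negatives(purchases, catalog, per_positive=1, seed=0):
    """Return (customer_id, product_id, label) rows: label 1 for each
    historical purchase, label 0 for randomly drawn unpurchased products."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    bought = {}
    for customer_id, product_id in purchases:
        bought.setdefault(customer_id, set()).add(product_id)

    rows = []
    for customer_id, product_id in purchases:
        rows.append((customer_id, product_id, 1))
        # Candidates are catalog products this customer never purchased.
        candidates = sorted(set(catalog) - bought[customer_id])
        k = min(per_positive, len(candidates))
        for negative in rng.sample(candidates, k):
            rows.append((customer_id, negative, 0))
    return rows


purchases = [("c1", "spelt_flour"), ("c1", "teff_flour"), ("c2", "teff_flour")]
catalog = ["spelt_flour", "teff_flour", "einkorn_flour", "amaranth_flour"]
rows = sample_negatives(purchases, catalog)
```

Because negatives are drawn independently for each positive, the same unpurchased product may appear more than once for a customer; deduplicating or weighting by popularity are possible refinements.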

Behavioral attributes from logs (e.g., “susceptibility to recommendations”) can also drive segmentation (e.g., k-means clustering) and influence how recommendations are ranked for different segments. In ALS, these features are not used during training but can be applied later for filtering or re-ranking.

In practice, one could train ALS on purchases and LightGBM on features, then combine results (ensemble) or select the better approach.
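A simple form of such an ensemble is a weighted blend of the two score sets after rescaling them to a common range. The sketch below assumes min-max normalization and treats a product missing from one model as scoring 0.0 there; the weight, the scores, and the function names are all illustrative assumptions.

```python
def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}


def blend(als_scores, gbm_scores, weight=0.5):
    """Rank items by weight * ALS score + (1 - weight) * LightGBM score,
    both normalized; ties are broken alphabetically for determinism."""
    als_n, gbm_n = min_max(als_scores), min_max(gbm_scores)
    items = set(als_n) | set(gbm_n)
    combined = {
        item: weight * als_n.get(item, 0.0) + (1 - weight) * gbm_n.get(item, 0.0)
        for item in items
    }
    return sorted(combined, key=lambda item: (-combined[item], item))


# Hypothetical per-customer scores from the two models.
als_scores = {"einkorn_flour": 0.86, "teff_flour": 0.36, "amaranth_flour": 0.28}
gbm_scores = {"einkorn_flour": 0.40, "teff_flour": 0.70, "rye_flour": 0.10}

ranking = blend(als_scores, gbm_scores, weight=0.5)
```

The weight would normally be tuned on held-out purchases rather than fixed at 0.5.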

Serving Recommendations

In weekly batch mode, it is often enough to store per-user top-10 lists in a database or warehouse (e.g., as JSON documents in Azure Cosmos DB or as rows in SQL tables) for the website to display in “Recommended for You” sections. Synapse integrates with Cosmos DB (Azure Synapse Link), which simplifies moving data between the two. Orchestration can be done via Azure ML Pipelines or Azure Data Factory/Synapse Pipelines.
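The stored per-user lists could take a shape like the one below. This is a sketch of one possible document layout, not a prescribed schema: apart from the "id" property that Cosmos DB items require, the field names, the run date, and the helper function are assumptions.

```python
import json
from datetime import date


def to_documents(recommendations, run_date, n=10):
    """Turn {customer_id: [product_id, ...]} into one JSON-serializable
    document per customer, keeping at most n ranked products."""
    docs = []
    for customer_id, items in sorted(recommendations.items()):
        docs.append({
            "id": customer_id,  # Cosmos DB items need an "id" property
            "generated_on": run_date.isoformat(),
            "items": [
                {"rank": rank, "product_id": product_id}
                for rank, product_id in enumerate(items[:n], start=1)
            ],
        })
    return docs


# Hypothetical output of the weekly batch run.
recommendations = {
    "c1": ["einkorn_flour", "teff_flour"],
    "c2": ["spelt_flour"],
}
docs = to_documents(recommendations, run_date=date(2024, 1, 1))
payload = json.dumps(docs[0])
```

Keeping one document per customer means the website needs only a single point read by customer id to render the section.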

For item-to-item recommendations (“Customers also bought…”), ALS similarity scores can be precomputed offline or exposed in real time via a REST API deployed on Azure Kubernetes Service (AKS) through Azure ML, integrated with Azure API Management.
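One common way to derive item-to-item similarity from a factorization model is the cosine similarity between item latent vectors. The sketch below assumes that approach; the vectors are invented two-dimensional examples rather than output of a trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(item_id, item_vectors, n=5):
    """Return the n items whose latent vectors are closest to item_id's."""
    target = item_vectors[item_id]
    scored = [
        (other, cosine(target, vector))
        for other, vector in item_vectors.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n]]


# Hypothetical item latent vectors (assumed values).
item_vectors = {
    "spelt_flour": [0.9, 0.1],
    "teff_flour": [0.2, 0.8],
    "einkorn_flour": [0.8, 0.3],
    "amaranth_flour": [0.1, 0.9],
}

also_bought = similar_items("spelt_flour", item_vectors, n=2)
print(also_bought)  # ['einkorn_flour', 'teff_flour']
```

With a small catalog, the full similarity table can be precomputed in the weekly batch, which avoids the need for a real-time endpoint.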

Summary

Azure’s approach is highly flexible: algorithms can be tailored, and non-standard data (e.g., emotional attributes) can be incorporated as model features, segmentation inputs, or re-ranking signals. It does, however, require more engineering work: preparing data, choosing and deploying algorithms, and setting up training and serving infrastructure (Synapse Spark or Azure ML).

Microsoft helps by providing sample code (Solution Accelerators, the Recommenders repo) and tight service integration (Synapse↔Cosmos DB, Azure ML→AKS).

Wojciech Moszczyński
Graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń. Specialist in econometrics, finance, data science, and management accounting. Focused on optimizing production and logistics processes. Conducts research in AI development and applications. Actively promotes machine learning and data science in business environments.