My projects

PROJECT: INTELLIGENT BODY LEASING MANAGEMENT

When and where: The project was conducted in 2022-23 for a company specializing in IT staffing and body leasing.

Technologies used:

Python, SQL, Spark, Hadoop, Kafka, Flask, PostgreSQL, AWS

Description:

  1. The project aimed to develop a system for optimizing the management of body leasing processes to enhance resource allocation and efficiency.
  2. Advanced algorithms were utilized to forecast demand and match candidates with project requirements in real-time.
  3. The system significantly improved decision-making and reduced operational costs for the organization.

PROJECT: RECOMMENDATION SYSTEM FOR E-COMMERCE STORE

When and where: The project was conducted in 2022-23 for an online retail company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

Description:

  1. The project focused on designing and implementing a recommendation system to enhance the customer shopping experience.
  2. It utilized collaborative filtering and content-based algorithms to suggest personalized products to users.
  3. The system increased user engagement, boosted sales, and improved customer satisfaction rates.

PROJECT: WAITING CART SYSTEM

When and where: The project was conducted in 2022-23 for an e-commerce company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on designing a system to manage and optimize waiting carts for users on the platform.
  2. The system provided personalized recommendations and reminders to encourage customers to finalize their purchases.
  3. Its implementation increased conversion rates by addressing cart abandonment issues effectively.

PROJECT: INTELLIGENT CLOTHING SEARCH ENGINE

When and where: The project was conducted in 2023 for a fashion e-commerce company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project involved developing an intelligent search engine to enhance the user experience by allowing personalized and accurate clothing searches.
  2. The system used advanced filtering and recommendation algorithms to match customer preferences with available inventory.
  3. Its implementation improved search relevance, increased user engagement, and boosted sales.

PROJECT: ANALYSIS OF POPULATION BEHAVIOR

When and where: The project was conducted in 2021-22 for a government research institute.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project focused on analyzing population behavior to identify trends and patterns using large datasets.
  2. It involved advanced data modeling and visualization to support decision-making in public policy and resource allocation.
  3. The findings provided actionable insights that helped optimize community support programs and improve service delivery.

PROJECT: OPTIMIZATION OF SULFUR FLOW IN GRUPA AZOTY

When and where: The project was conducted in 2017-2021 for Grupa Azoty, a leading chemical company in Poland.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project aimed to optimize the sulfur flow process within the production facilities to reduce waste and improve efficiency.
  2. Advanced data analysis and simulation techniques were applied to model and enhance the flow dynamics.
  3. The results contributed to significant cost savings and a more sustainable production process.

PROJECT: DETECTING ANOMALIES IN CHEMICAL PLANT OPERATIONS

When and where: The project was conducted in 2017-2021 for a leading chemical manufacturing company.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on developing a system to detect anomalies in real-time during chemical plant operations, ensuring safety and efficiency.
  2. Advanced machine learning algorithms were employed to analyze sensor data and identify deviations from normal operating conditions.
  3. The system improved operational reliability by preventing potential failures and optimizing maintenance schedules.

PROJECT: GAS PRICE PREDICTION MODEL FOR A 14-DAY HORIZON

When and where: The project was conducted in 2017-2021 for a company in the energy sector.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. The project involved developing a predictive model to forecast gas prices for a 14-day horizon using historical data and market trends.
  2. The model utilized machine learning techniques to provide accurate and actionable price predictions.
  3. Its implementation supported better decision-making in procurement and inventory management, reducing operational risks.

PROJECT: DETECTING ANOMALIES IN FUEL CONSUMPTION FOR SILVA

When and where: The project was conducted in 2014-2018 for Silva, a company specializing in logistics and transportation.

Technologies used: Python, Spark, Flask, AWS, SQL, Hadoop, PostgreSQL, Kafka.

 

Description:

  1. This project focused on developing a system to detect anomalies in fuel consumption across Silva’s fleet operations.
  2. The system used advanced analytics and machine learning to identify irregular fuel usage patterns and potential inefficiencies.
  3. The implementation helped reduce fuel costs and improved the company’s operational efficiency.

PROJECT: DETECTING FRAUD IN FINANCIAL TRANSACTIONS

When and where: The project was conducted in 2014-2018 for a financial services company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

 

Description:

  1. The project involved building a system to detect fraudulent activities in financial transactions in real-time.
  2. Machine learning algorithms were employed to analyze transaction patterns and flag suspicious activities.
  3. The system improved fraud detection accuracy, reducing financial losses and enhancing customer trust.

PROJECT: VISUAL IDENTIFICATION OF WOOD QUALITY

When and where: The project was conducted in 2014-2018 for a forestry and wood production company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka

Description:

  1. The project focused on developing a system to visually identify wood quality based on image analysis and machine learning techniques.
  2. The system utilized advanced algorithms to classify wood defects and grade the quality of timber in real-time.
  3. Its implementation improved production efficiency and ensured high standards of quality control.

PROJECT: ELIMINATION OF QUEUES AT THE ENTRY GATES FOR VEHICLES WITH TIMBER

When and where: The project was conducted in 2014-2018 for a timber production and logistics company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. The project aimed to streamline the entry process for vehicles transporting timber by eliminating queues at the entry gates.
  2. Advanced scheduling algorithms and real-time tracking were implemented to optimize vehicle flow and reduce waiting times.
  3. The solution significantly improved logistics efficiency and enhanced driver satisfaction.

PROJECT: OPTIMIZATION OF VEHICLE LOADING AND UNLOADING TIMES

When and where: The project was conducted in 2014-2018 for a logistics and supply chain company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. The project focused on optimizing loading and unloading times for vehicles to enhance operational efficiency and reduce delays.
  2. Data analytics and predictive modeling were used to identify bottlenecks and implement solutions for streamlined processes.
  3. The outcome resulted in significant time savings and improved resource utilization.

PROJECT: DETECTING FRAUD IN MASS TRANSACTIONS

When and where: The project was conducted in 2014-2018 for a financial services company.

Technologies used: Python, Spark, Flask, SQL, Hadoop, PostgreSQL, Kafka, on-premises cloud.

Description:

  1. This project focused on detecting fraudulent activities within mass transactions using advanced data analysis techniques.
  2. Machine learning models were developed to analyze transaction patterns, identify anomalies, and flag suspicious activities in real time.
  3. The solution enhanced fraud detection efficiency, reduced financial losses, and improved trust among clients.

PROJECT: ANALYSIS OF MARKETING STIMULI EFFECTIVENESS AT BANK PEKAO

When and where: The project was conducted in 2011-2018 for Bank Pekao.

Technologies used: Python, Spark, SQL, PostgreSQL.

Description:

  1. This project focused on analyzing the effectiveness of various marketing stimuli in driving customer engagement and product adoption.
  2. Statistical models and data analytics were applied to evaluate the impact of marketing strategies on customer behavior.
  3. The findings helped optimize future marketing campaigns and improve overall customer retention rates.

PROJECT: DELIVERIES TO THE BAKERY ON THE SECOND SHIFT

When and where: The project was conducted in 2011 for a bakery supply chain company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. This project aimed to optimize deliveries to the bakery during the second shift to ensure timely supply and reduce delays.
  2. Advanced logistics planning and route optimization were employed to streamline the delivery process.
  3. The solution improved operational efficiency, reduced costs, and ensured fresh product availability for the bakery.

PROJECT: REPAIR OF THE AUTOMATIC ORDER SYSTEM

When and where: The project was conducted in 2015 for an e-commerce company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project focused on repairing and optimizing the automatic order system to ensure its seamless functionality.
  2. Key issues, such as order processing delays and system crashes, were addressed through detailed diagnostics and code improvements.
  3. The repaired system enhanced order accuracy, reduced downtime, and improved overall customer satisfaction.

PROJECT: EXPIRED PRODUCTS NOTIFICATION SYSTEM

When and where: The project was conducted in 2015 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project involved developing a notification system to track and alert staff about products nearing expiration dates.
  2. The system utilized automated alerts and data analytics to ensure timely removal of expired goods from shelves.
  3. Its implementation reduced waste, improved inventory management, and enhanced customer satisfaction by maintaining product freshness.

PROJECT: AUTOMATIC ORDERING SYSTEM FOR SPECIAL PERIODS

When and where: The project was conducted in 2016 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. This project focused on designing an automatic ordering system tailored for special periods such as holidays or promotional campaigns.
  2. The system used predictive analytics and historical data to optimize inventory levels and prevent stock shortages or overstocking.
  3. Its implementation improved operational efficiency, reduced waste, and ensured product availability during high-demand periods.

PROJECT: OPTIMIZATION MODEL FOR PROMOTION SIZE

When and where: The project was conducted in 2016 for a retail company.

Technologies used:

VBA, SQL, MySQL

 

Description:

  1. The project aimed to develop a model to optimize the size and scope of promotions to maximize profitability and customer engagement.
  2. Advanced data analysis and machine learning algorithms were used to predict the effectiveness of different promotion strategies.
  3. The model provided actionable insights, leading to better allocation of promotional budgets and improved sales performance.


New professions that emerge with the development of artificial intelligence

It is worth mentioning who programmers are. A typical programmer, and I don't want this to sound like a stereotype, is someone who is proficient in a specific programming language and executes precisely defined tasks related to writing code. Programmers are usually not very creative, largely because their tasks are quite strictly defined by the client: they translate specific needs into programming code. It turns out that the latest artificial intelligence models possess similar skills. If a model is given a well-defined task, it will translate the described need into the selected programming language.

Can Artificial Intelligence Think Creatively?

Recently, I tested the latest GPT-4 chat model on creating solutions in operational research. Operational research is a field devoted to the mathematical optimization of processes. Applying its methods requires logical, often unconventional thinking that goes beyond routine approaches; it demands deep imagination and experience. I assigned the model several optimization tasks. Out of ten assignments, nine were completed incorrectly. One was completed correctly, because it was very conventional and belonged to the basic, classical repertoire of the field. Interestingly, each time the model performed a task, it boasted about the results. Unfortunately, the Python code it generated was unusable: it simply contained errors. I pointed out where the mistakes were, yet the model insisted it was correct. Moreover, it frequently made mistakes when creating objective functions, and the objective function is the foundation of any optimization process. It is hard to optimize anything when the goal itself is set incorrectly. In summary, current artificial intelligence models, including the GPT-4 chat model I tested, are likely capable of replacing a typical programmer doing repetitive work that mostly involves writing simple code. However, my experience indicates that the models still struggle with hard logical tasks that require experience and unconventional thinking.
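
To make the notion of an objective function concrete, here is a minimal linear-programming sketch in Python using SciPy. The products, coefficients, and resource limits are invented for illustration; this shows the kind of task involved, not my actual test case.

```python
# Hypothetical example: maximize profit 40*x1 + 30*x2 subject to two
# resource limits. linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

c = [-40, -30]              # objective function coefficients (negated profit)
A_ub = [[1, 2],             # machine hours consumed per unit of x1, x2
        [3, 1]]             # labor hours consumed per unit of x1, x2
b_ub = [100, 120]           # available machine hours, available labor hours

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)      # optimal plan (28, 36) and profit 2200
```

Getting the sign and coefficients of `c` right is exactly the step the model kept failing at: with a wrong objective function, the solver happily optimizes toward the wrong goal.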

New Professions Emerging with the Development of Artificial Intelligence

It is obvious that sooner or later, a new version of the AI model will emerge, which will be more effective at solving difficult logical tasks. The development of artificial intelligence is inevitable and unstoppable. Therefore, it is likely that next year, even more advanced and creative employees will be replaced by significantly more efficient mathematical algorithms.

Technological Development and Unemployment

The introduction of new technologies has always been associated with the improvement of the material well-being of the general population. Of course, this phenomenon was accompanied by certain groups losing their livelihoods in the short term. Certain traditions and good reputations, as well as knowledge and skills, were irretrievably lost. Thus, carpet weavers and breeders of draft horses ceased to be needed when textile machines were mass-produced. Producers of freight wagons lost their jobs when the railway was developed. Assembly line workers lost their jobs when car manufacturers introduced industrial robots en masse. Importantly, these changes were not accompanied by a significant increase in unemployment. There was simply a mass process of reskilling. With technological development, the volume of production and the availability of goods and services increased. Since these goods became cheap due to their mass production, society could afford more. The increase in the wealth of citizens led to an increase in demand. More businesses were established, which meant more people found employment. These individuals could afford to purchase goods and services, directly driving the need to increase supply. This peculiar spiral of consumption and production is called economic growth.

We have a similar situation today. Currently, difficult-to-access intangible services such as valuations, expert opinions, designs, studies, and complex calculations or optimizations may become available to everyone in the near future. Experts who previously created such services will become unnecessary as artificial intelligence models will perform such services cheaply, quickly, and effectively.

The dynamic development of artificial intelligence that we have observed over the past two years has shaken the market for IT experts. Many programmers, database management experts, security specialists, and various professionals capable of navigating complex IT environments have lost their jobs. It has become clear that artificial intelligence models can easily replace these experts, especially in programming and overseeing processes. Indeed, artificial intelligence models are capable of programming effectively; no futurist expected that, at such an early stage of its development, artificial intelligence would be able to write code independently. The models in question can create simple applications, thus performing tasks that were previously carried out by programmers.

New Professions Emerging with the Development of Artificial Intelligence

There will soon be electronic auditors analyzing accounting books, models quickly designing buildings and structures, and experts studying the causes of disasters. Their primary engine will be artificial intelligence algorithms. Having some experience with various technological revolutions that humanity has undergone and being aware that artificial intelligence will still require human assistance, we can attempt to define several professions that will undoubtedly emerge.

Let’s try to list the most obvious jobs associated with the handling of artificial intelligence, which will likely dominate the job market in the near future.

AI Entrepreneur

A year ago, I attended a clothing fair in Warsaw. There, I met a startup that created descriptions for the websites of stores selling clothing and footwear online. The service worked as follows: the e-commerce company sent the startup product photos and an outline of the products' main characteristics. The startup returned full product descriptions along with their positioning on the pages, taking into account meta tags and all kinds of on-page product-optimization techniques. I started talking to the representatives of the startup about the techniques and technological solutions they used. After a while, I learned that, in fact, this work is performed by a model they had trained, specifically GPT-4. This is an example of a new type of company that will appear en masse with the development of artificial intelligence algorithms. These will be companies that connect to algorithms that perform the work for them. Such companies have never existed before. They are a bridge between advanced artificial intelligence algorithms and businesses that do not want to engage directly with such models.

In this example, the new job is someone who fully utilizes the efficiency of artificial intelligence work and sells services that result from the work of this model.

Behavioral Analyst Supported by AI Models

Customers entering stores, people walking on the sidewalk, users visiting a website: each of these groups has a specific behavior pattern. Until now, analyzing customer behavior in specific places, based on the stimuli sent to them, has been a very difficult task. Until recently, such tasks were performed by data scientists who used Python libraries for these types of analyses. This work was extremely difficult, as it required programming knowledge and tuning neural networks. The work was painstaking, and months of effort often did not yield the expected results. The development of new models will likely lead to a high level of standardization in behavioral assessment. Soon, models will appear that specialize in analyzing people's behavior on the Internet, in buses, or at supermarket checkouts. To operate such complex analyses performed by advanced artificial intelligence algorithms, behavioral specialists will be needed to draw appropriate conclusions and discuss the analysis results. For example, they will be able to explain to a supermarket owner how to arrange product shelves, in what order to place the various sales departments, which colors and lighting should dominate, and how to present products.

In summary, artificial intelligence is capable of creating highly advanced analyses that will meticulously present behavior models of specific customer groups. Unfortunately, the formulated conclusions will not be easy to interpret. These conclusions should be translated into real business by experts in customer behavior. The profession of behavioral analyst already exists, and there are also fields of study dedicated to this issue, such as at the Warsaw University of Technology.

AI Security Engineer

This is another significant profession that will emerge in the near future. It can be seen as a continuation of the IT security expert profession, although its area of work will be completely different. Currently, cybersecurity experts mainly focus on detecting malicious software and analyzing email attachments and messages designed to open a vulnerability through which malicious software can infiltrate, steal data, or paralyze the operating system. In the future, such work will be performed by autonomous artificial intelligence systems; they will do it better and more effectively. This will lead to a situation where these currently primitive methods of attacking operating systems become completely ineffective. Hackers will then try other methods that are not related to the information system. They will seek access through physical breach, meaning connecting directly to the system. Hackers will also increasingly attempt to manipulate people working in the targeted institution. Human creativity is immense, while artificial intelligence systems operate exclusively in an IT environment. A profession will emerge that will be a kind of detective, constantly seeking ways the system could be breached through sophisticated methods that are impossible for artificial intelligence to monitor.

AI Personality Creator

For several months, I have been collaborating with various AI models. I noticed that some of them behave in a unique way, as if they had a different personality each time, every day. This is evidence of a certain underdevelopment of these systems. If we instruct a model to learn from us, it becomes more predictable: it retains a certain level of repeatability, but it is also susceptible to our bad moods. What may seem funny is that I often have the impression that models can express something akin to disapproval; they can be spiteful or ironic. This may just be my subjective impression, but often after I point out some issue, the model can sulk, disconnect, or provide spiteful examples. Another example from my experience: if we instruct the model to learn our behaviors and we happen to have a bad day, talking to it in an enigmatic and sluggish manner, the next day the model will talk to us as if we were still enigmatic and sluggish, even though we want to engage dynamically. This is a typical example of learning based on analysis of the past.

Artificial intelligence models are meant to imitate the behaviors of living people. Models need to learn about their clients and must do so skillfully. Collaborating with other people is very challenging because the human essence is very complicated in terms of reactions, behaviors, and overall communication. Therefore, the presence of psychologists and designers of artificial intelligence personalities will undoubtedly be necessary. Advanced models will need to be designed with significant involvement from psychology specialists.

AI Controller

In the future, these controllers will observe robotic guides in museums or cybernetic car salespeople. They will analyze their behaviors, facial expressions, and customer reactions to their behaviors. Using artificial intelligence algorithms, based on customer reactions, such a specialist will seek out weaknesses and flaws in robotic employees. They will also be able to steer the personalities of robots or control their behaviors in the context of legal regulations and ethical considerations, including discrimination or a lack of empathy on the part of robots. Among the responsibilities of such a specialist may also be identifying and flagging deceptions by artificial intelligence. Unfortunately, artificial intelligence does cheat, and it does so relatively often; currently this is referred to as hallucination, the generation of non-existent content. Undoubtedly, with the advancement of algorithms, the phenomenon of cheating or psychological manipulation will likely become increasingly sophisticated and harder to detect. Specialists will be needed who can identify and eliminate such phenomena.

AI Artistic Director

Artificial intelligence has significantly disrupted the field of creativity. It has turned out that mathematical algorithms are capable of creating beautiful images, very interesting and deep abstractions, composing interiors, and building extraordinary artistic atmospheres. There will certainly be a need for someone who can fine-tune the machine’s sense of taste. The artistic director will be able to communicate with artificial intelligence models, design appropriate commands and instructions that will significantly improve aesthetics and guide the work of cybernetic artists towards specific needs expressed by clients.

AI Content Developer

In general, a developer is someone who can create queries and obtain answers from large data sources. Incidentally, developers are also referred to as programmers. However, let’s stick to the first version of this term. These individuals often use SQL, allowing them to extract very difficult-to-obtain information from the source in a short time. AI content developers will play a similar role. They will know how to pose complex queries to artificial intelligence models. These queries are called prompts. There is a wealth of interesting recommendations available on the internet about how to converse with artificial intelligence. For example, before we start asking questions, it’s worth telling the model who we are or what our goals or intentions are. Then there’s a greater chance of receiving content that meets our needs. An expert who can communicate with artificial intelligence using queries formulated in natural language will be a very important and useful profession of the future.
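
As a simple illustration of this advice, a prompt can establish context before the actual task. The wording and roles below are invented for the example:

```python
# A hypothetical prompt template: who we are and what we want, then the task.
role = "I am a logistics manager at a mid-sized food wholesaler."
goal = "My goal is to shorten delivery routes without adding vehicles."
task = "List three data sources I should collect before building a model."

prompt = f"{role} {goal}\n\n{task}"
print(prompt)  # context first, then the question itself
```

The same question asked without the first two sentences tends to produce a far more generic answer; structuring that context is precisely the craft of the AI content developer.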

AI Implementer

Implementing artificial intelligence in a manufacturing plant or a food wholesaler may prove difficult for the employees there. Artificial intelligence has many skills that could significantly improve process efficiency. Unfortunately, artificial intelligence will not arrive on its own, and the people working in such plants will not implement these advanced solutions by themselves. An AI implementer is someone who knows both the processes occurring in specific industries and the functionalities of artificial intelligence models very well. This person will be able to connect algorithms with needs.

AI Model Creator

This is a very elite profession that currently hardly exists in Poland. Undoubtedly, small companies will soon emerge that independently create artificial intelligence models specialized in specific areas. Until now, such teams have rarely been seen, as tools for building such models did not exist in the technological environment. Artificial intelligence models can be constructed using other artificial intelligence models. Additionally, a wealth of components for building artificial intelligence models will soon be available. There will also be databases of raw material, meaning specially profiled data: for example, annotated photos created solely for building intelligent image recognition systems, and more broadly the increasingly common organization of data designed to serve as a knowledge source for training AI models. To better illustrate this issue, I will use a simple example.

When the first cars were created, standard bearings, hubs, exhaust systems, or batteries did not exist in the technical environment. Each of these components had to be built by oneself. Because now one can buy any type of bearings, engines, or suspension systems, building one’s own vehicle is no longer a major challenge. The same goes for models. Currently, to build a model, one must first organize data, create (often from scratch) mathematical algorithms, and the environment in which the model will operate. With technological development, the availability of these types of components for building models will become increasingly easier. We already have access to ready-made libraries containing specific codes for constructing functional modules, cloud environments in which we can install our solutions. However, these types of components are still both expensive and not very easy to use. With the technological development of artificial intelligence and the proliferation of algorithms, there will be mass production of subsequent models, and along with that, many highly specialized experts involved in creating such solutions will emerge.

Summary

Undoubtedly, the professions mentioned here will sooner or later appear in the job market. Each of the listed roles will have its specializations: there will be developers specializing in legal issues, and developers who extract knowledge about inconsistencies in technological processes. There will also be intermediary professions combining many of the functions mentioned here, for example experts in customer behavior who also work as art animators, or AI implementers who are also model creators. We can undoubtedly assume that all the new professions will exploit the new possibilities brought by the development of artificial intelligence. Once again, people will have the opportunity to demonstrate creativity and adaptability in a new work environment.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He conducts research in the field of the development and application of artificial intelligence. He has been engaged in popularizing machine learning and data science in business environments for years.

How Recurrent Neural Networks Work

Data Science in the Milling Industry

An artificial neural network is a loose copy of the natural neural network of the brain. It can achieve a level of reasoning unattainable for the average person. However, a neural network is not intelligence as we are accustomed to it. It is an information system that, like a production machine, perfects itself in a narrow specialization, achieving very high efficiency.

An artificial neural network is built from many layers of neurons that allow for mutual communication. Neural networks learn through a process of recurrence, meaning repeated performance of the same simple calculations, improving the accuracy of estimations slightly each time.

How Does a Recurrent Neural Network Work?

Natural and artificial neurons function in a very similar way. They are a kind of relay box. Information flows into a neuron; whether the neuron transmits this information further or holds it back depends on the intensity of the incoming information. This intensity is determined by the weights assigned to the information. In biology, the weights assigned to stimuli are the intensities of electrical charges.

The functioning of a single neuron can be compared to the reactions of a sleeping cat. The cat may be sleeping on the carpet in a room. Various sounds from the television, conversations among people, and the noise of a dishwasher reach it. However, just a gentle scratching is enough for the cat to open its eyes wide and perk up its ears. This is how a single neuron operates: it reacts only to those stimuli that have significance for it. Training a neural network, in turn, is like teaching a cat to catch mice, a cat that for some reason did not inherit this skill from its ancestors.

Different signals are supplied to the neural network simultaneously. Often, large numbers and small fractions, as well as zero and one values, influence it at the same time. Before delivering information to the neurons, numbers must be standardized. Standardization, in its simplest form, involves processing numbers so that their distribution becomes a distribution with a mean of zero and a standard deviation of one. An artificial neural network accepts all standardized signals and initially assigns them random weights.
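
As a minimal sketch (assuming NumPy is available), standardization of a single input variable can be written as:

```python
import numpy as np

x = np.array([120.0, 85.0, 300.0, 42.0, 198.0])  # raw input signal
x_std = (x - x.mean()) / x.std()                  # rescale to mean 0, std 1
print(x_std.mean(), x_std.std())                  # approximately 0.0 and exactly 1.0
```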

The learning of an artificial neural network consists of gradually changing the weights assigned to individual pieces of information. Gradually, the most important information is sharpened, while the least important information is dulled.

A sample neuron receives three signals: x1, x2, x3. Each of these pieces of information is assigned weights: w1, w2, and w3. In the diagram, I placed the Greek letter Σ, which represents a sum. The neuron sums all information x1, x2, x3 strengthened or weakened by weights w1, w2, w3, emitting a value Z at the output. This phenomenon can be described by the simple formula:

Z = w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3

The total excitation signal of the neuron Z, that is, the weighted sum of signals, travels to the activation function. The value Z is called the postsynaptic potential (PSP). The activation function can be any simple mathematical function with a single argument Z. It is assumed that the activation function should not be a linear function.

In the diagram, this function is expressed as: f(z)

Whether the activation function becomes excited depends on the intensity level of the total postsynaptic signal Z. Just as the cat reacts to the sound of a scratching mouse, the neural network learns to distinguish significant information from noise.
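
The weighted sum and activation described above fit in a few lines of NumPy; the input values here, and the choice of ReLU as the activation, are arbitrary illustrative choices:

```python
import numpy as np

x = np.array([0.5, -1.2, 0.3])  # standardized input signals x1, x2, x3
w = np.random.randn(3)          # initially random weights w1, w2, w3

z = np.dot(w, x)                # postsynaptic potential Z = w1*x1 + w2*x2 + w3*x3
a = max(0.0, z)                 # ReLU activation f(Z): the neuron fires only if Z > 0
print(z, a)
```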

How Does a Neural Network Learn?

The network initially accepts random weights. Then, using them, it processes information and checks it against the target value programmed in the loss function. Based on the results of the calculations, the algorithm adjusts the settings and recalculates everything anew. An artificial neural network repeats this same simple computational process hundreds of times, each time changing the level of the weights of the individual input variables. The network adjusts weights based on successive levels of the loss function. Each neuron receives sets of external variables from 1 to n and calculates an output value y.

Training a neural network is classified as supervised learning. Let’s assume we want to create a model for forecasting grain prices on a commodity exchange. We have information from the previous year about rainfall levels, temperature, direct payment amounts, corn prices, and a range of other data. Our output variable in the model is the price of the grain.

All the mentioned information flows into each neuron, and each neuron calculates the output value in the form of price y. Each input variable has its own set of weights for each layer. Supervised learning means that we train the model on historical data. We input historical input data into the model, and the model performs calculations of the output value and then checks whether the calculated theoretical value is close to the historical empirical value.

The recurrent neural network repeats the computation loops hundreds of times. Each neuron in the layer performs similar calculations described by the following equation:

Z_{i} = a_{i} \cdot w_{i} + Z_{T}

where a_{i} is the activation of the neuron, calculated as:

a_{i} = f(Z_{i})

Each time, the results are confronted with the activation function. The calculations are carried out as matrix operations.

What Role Does the Activation Function Play in Learning?

The activation function is a very important element in the learning process of the neural network. The activation function must be simple, because its simplicity greatly affects the speed of the learning process. Currently, in deep learning, the ReLU (Rectified Linear Unit) function is most commonly used. Slightly less frequently, sigmoid and tanh functions are used.
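
The three activation functions named above can be written directly; a minimal NumPy sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # passes positive potentials, zeroes the rest

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes Z into the range (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes Z into the range (-1, 1)

z = np.linspace(-3.0, 3.0, 7)
print(relu(z), sigmoid(z), tanh(z), sep="\n")
```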

In the illustration below, we see the ReLU function. A neuron is activated when the postsynaptic potential exceeds the threshold value n on the X-axis; this threshold is introduced artificially, shifting the point at which the neuron begins to fire.

Loss Function

This is the primary source of feedback about progress in learning. With each iteration of the neural network, a calculation result is generated. Since we are conducting supervised learning, we know what the results y_n should be for subsequent records of the input variables: x_1, x_2, x_3, … x_n.

The neural network calculates theoretical results ŷ. It then compares them with historical values y. The loss function is most often the sum of squared differences between the theoretical values ŷ and the empirical values y.

The purpose of the loss function is to indicate how much the theoretical results differ from the empirical results.

\text{loss} = \sum_{i} (y_i - \hat{y}_i)^2

After each of the hundreds of iterations of the neural network, the loss function produces an assessment. Based on it, the network adjusts the weights, striving to minimize the next value of the loss function. The network performs as many iterations as its programmer specifies. The greatest progress in learning occurs at the beginning of training.

It’s like a musician practicing thousands of times a sonata on the violin, each time performing it better. In the end, they refine their piece, at which point progress is no longer noticeable to the untrained ear.

Gradient Descent Principle

Finally, it should be mentioned how the neural network learns based on the loss function. Weights are adjusted using the gradient descent principle, which is based on differential calculus and the derivative of the function.

Let's imagine a vast green valley. From one of the peaks surrounding the valley, we release a ball. The ball rolls down, bouncing off hills and irregularities, and stops in some depression without reaching the bottom of the valley. The ball is then nudged out of this local minimum and continues downhill. Our goal is for the ball to roll to the bottom of the valley. However, this does not always succeed.

This is a way to picture the minimization of error using the gradient descent method. With each iteration of the neural network, the partial derivative of the loss function is calculated for each parameter of the network. The derivative tells us whether the function is increasing or decreasing. Thanks to this, the dozens of balls, symbolizing the parameters of our neural network model, know where the bottom of the valley they aim for lies. This way, the network knows in which direction to minimize deviations. After each iteration, the gradient indicates the direction of optimization for the individual weights.

Learning Rate

The learning rate of the network is defined for all neurons as the length of the step they can take during each iteration. If these steps are small, learning may take a very long time. Worse still, gradients may get stuck in local minima and lack the momentum to escape them.

Referring to our example of the green valley, our ball may fall into a hole and never reach the bottom of the valley. If the ball is kicked too hard, it may keep bouncing over the bottom of the valley, and the learning process then becomes somewhat chaotic.
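
Putting the loss function, the gradient, and the learning rate together, here is a deliberately simplified gradient-descent sketch for a single linear neuron. The data is synthetic and the code illustrates the principle, not a production training loop:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 records, 3 input variables
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # empirical (historical) values

w = rng.normal(size=3)   # start from random weights
lr = 0.05                # learning rate: the length of each step

for i in range(200):
    y_hat = X @ w                        # theoretical results ŷ
    loss = np.sum((y - y_hat) ** 2)      # sum of squared differences
    grad = -2 * X.T @ (y - y_hat)        # partial derivative of loss per weight
    w -= lr * grad / len(y)              # step downhill, scaled by the learning rate
    if i % 50 == 0:
        print(i, round(loss, 4))         # the biggest progress comes early

print(w)                                 # close to [2.0, -1.0, 0.5]
```

If `lr` is set much larger, the loss oscillates or diverges instead of shrinking, which is the "too strong kicks" scenario from the valley analogy.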

The General Form of a Multilayer Neural Network

The diagram below shows the theoretical appearance of a multilayer neural network. Four independent variables flow into the network from the left side, creating the input layer of neurons. Information flows into subsequent internal layers, each adjusting the importance of the information through the level of weights assigned to these pieces of information. The information reaches the output layer, where the theoretical results are verified against empirical (historical) values. The process then returns to the starting point. Utilizing the gradient descent method, the network adjusts weights in the subsequent layers to reduce the sum of squares of the model’s errors.

Source of the Diagram: Araujo Vinicius, Guimarães Augusto, Campos Souza Paulo, Rezende Thiago, Araujo Vanessa. “Using Resistin, Glucose, Age and BMI and Pruning Fuzzy Neural Network for the Construction of Expert Systems in the Prediction of Breast Cancer” Machine Learning and Knowledge Extraction (2019).

Application of Neural Networks

The best example of the application of neural networks is image recognition. This is a task that other machine learning models generally cannot perform.

A network can learn to recognize bicycles in photos, even though it has no knowledge of bicycles, does not know their purpose, and how they are constructed. The network receives several hundred images of bicycles on the streets and photos of streets without bicycles. The photos with bicycles are labeled as one, and the photos without bicycles as zero.

Each photo consists of thousands of pixels; the network assigns a weight to each cluster of pixels, which it then adjusts. The network recognizes the shape of the wheel, the saddle, and the handlebars. It also identifies the characteristic positioning of people’s bodies on bicycles. By repeating the review of photos hundreds of times, it finds patterns and dependencies. Similarly, networks recognize customer behaviors in a store, finding patterns of behavior, identifying through movements whether a customer is indecisive or convinced.

The primary goal of creating artificial neural networks was to solve problems at the level of general abstraction, as the human brain would do. However, it turned out that it is more effective to use networks for specialized tasks. In such cases, neural networks significantly surpass human perception.

Training a high-class specialist takes a lot of resources and time. Human work is expensive and error-prone. Models based on neural networks can diagnose diseases and faults better than the human mind, drawing on enormous information resources from around the world. These systems make far fewer mistakes, their work costs comparatively little, they can operate 24 hours a day, and they can be replicated.

Two hundred years ago, machines began to replace humans in tedious jobs, working faster, better, and more efficiently. Now we are witnessing machines starting to replace us in intellectual activities.

Wojciech Moszczyński

The problem of food waste in the food industry


Food Waste

Food waste constitutes a huge problem at various stages of production, distribution, and consumption of products. A significant portion of food is wasted before it reaches retail outlets. Sometimes goods are never shipped to sellers due to strict quality standards, appearance, or currently unfavorable market prices. Such items are usually designated for disposal. Some products are lost in the process of transport, transshipment, or improper storage in warehouses. A large portion of the effort associated with production, as well as the energy, raw materials, and natural resources used, is irreversibly wasted.

A significant amount of food found in retail outlets is discarded after passing its expiration date. This is a serious issue, because it results in an irrevocable loss of resources and human labor and, most importantly, it often leads to environmental damage. Most of the discarded food contributes to the greenhouse effect, as greenhouse gases such as methane are released during the decomposition of waste in landfills. Excessive agricultural production that does not meet market needs is associated with the overuse of nitrogen fertilizers, which leads to frequent contamination of groundwater and to oxygen depletion of seabeds.

Dilemmas of Food Producers and Distributors

Market participants face significant dilemmas. For supermarkets, discarding food after its expiration date is often more profitable than giving it away for free shortly before that date. A customer who receives a product for free, or buys it at a fraction of its value, will not be inclined to purchase that product again at the normal price in the future. If customers do not buy goods because substitutes are given away for free, the store's sales of regular goods decline; the unsold goods then expire, which increases food waste. Let's assume that, for some reason, supermarkets decide to forgo part of their revenue and donate some products nearing their expiration date. This will result in a significant reduction in the volume of orders sent to food producers. A decline in turnover is not good for either producers or distributors. Both parties, production plants and stores alike, will strive to maximize the turnover of goods, despite the threat of potentially large-scale waste. To reverse this situation, the European Commission will likely have to implement intervention measures that disturb free-market principles to a greater or lesser extent.

This is the first article dedicated to the issues of environmental protection, recycling, and sustainable development in the food industry. Limiting negative climate change, poverty, and environmental contamination is currently among the most important goals of the European Commission. In the coming years, food production, processing, and distribution will be among the industries most exposed to changes in European law.

As part of our plan, we will address selected aspects of building a sustainable food economy in subsequent publications in this series. Today, we will attempt to clarify the issue of food waste and the disposal of expired products and raw materials. We will discuss the disposal of food due to the failure to meet stringent standards concerning, among other things, the appearance of products. We will also mention the problem of discarding consumable products in the tourism and restaurant sectors.

In future publications, we will tackle the relatively low scale of secondary raw material use and the lack of a closed loop of raw materials in industrial and distribution processes. In this approach, we pay particular attention to the direction of change concerning the recycling of food packaging and the collection of organic secondary raw materials. We will also discuss the EU initiative on ESG (Environmental, Social, and Corporate Governance) and the closely related issue of responsibility for the natural environment and climate protection in the investment process, including the acquisition of funding sources. Finally, we will address widespread social awareness of the waste of natural resources, waste segregation, and the significance of social pressure on the creation of environmentally and climate-friendly European law.

We can earn a lot by assigning customers to clusters on e-commerce platforms


Clustering Customers on E-commerce Platforms

Clustering customers on e-commerce platforms involves grouping them based on behaviors, preferences, and other shared characteristics. We look for patterns of behavior, such as purchasing methods, amounts typically spent, or interests. By skillfully utilizing clustering, we can significantly increase the average size of purchases made by individual customers while simultaneously retaining customers in our store.

What are the key benefits of clustering?

Clustering allows for the creation of more personalized offers that are better tailored to the needs and preferences of different customer groups. For example, customers who frequently buy electronics may receive promotional offers for new gadgets, while book lovers might receive literary recommendations.

Understanding the preferred forms of communication for various customer groups allows for more effective outreach. For some customers, SMS messages are mere spam, while for others, they are a valuable source of information. Better alignment of offers with customer needs increases the likelihood of purchase, which translates to higher revenues.

A customer who finds what they are looking for quickly and effortlessly is satisfied. Every organism has evolved techniques for conserving its own energy and resources, and shopping is no exception: where a customer finds the sought item quickly and without effort, they will return, because the purchase proved easy, accurate, and economical in effort. Customers who feel that their needs are understood and met tend to return to the sales platform more often.

Clustering leads to increased efficiency in managing marketing campaigns. It enables more precise market segmentation and the creation of more effective marketing campaigns. For instance, promotions targeted specifically at customers who frequently purchase premium products may be more effective than general campaigns. This allows for better management of the marketing budget by allocating funds to campaigns aimed at the most promising segments or specific customer clusters. To sell well and enjoy customer trust, one must know their customers well.

Understanding Customers

Clustering allows for better analysis of customer behaviors, aiding in the identification of trends and purchasing patterns. Clustering involves finding similar customers and placing them in common sets. It is a method based on machine learning techniques, which is why it is often difficult to clearly understand the criteria by which customers are selected into specific clusters. Assigning customers to clusters helps us better understand which products are popular among the designated groups. This way, one can better forecast future needs and adjust the assortment accordingly.

What are the most common techniques for utilizing clustering?

Knowing the price sensitivity of customers from a specific cluster allows us to offer them more expensive versions of the products they are currently purchasing (known as upselling). Knowing the preferences of customers from a particular cluster allows us to recommend related products (known as cross-selling). Both techniques significantly increase the volume of sales per customer.

Clustering helps understand which products are most frequently purchased by different customer groups, facilitating better inventory management by reducing costs associated with storing unpopular goods.

We Can Earn a Lot by Assigning Customers to Clusters on E-commerce Platforms

Thanks to effective grouping of customers into clusters, it is possible to increase Customer Lifetime Value (CLV). Thus, a range of methods can be employed to encourage customers to remain in the store and isolate them from factors that accelerate the decision to leave the store.

Carpet Bombing or Precision Ammunition?

Perhaps the worst thing a company in the e-commerce sector can do is bombard customers with an overwhelming amount of information. One of the companies for which I developed a recommendation system had a habit of bombarding customers with additional proposals, discounts, and all sorts of offers after a successful transaction. Marketers operated under the assumption that the more flyers and ads they sent, the greater the likelihood something would yield results. The effect was easy to predict.

Precisely targeted ads are more effective and cost less than mass campaigns that may not reach the right audience. Mass sending of more or less random offers leads to customer irritation and discouragement. On the other hand, customers who receive personalized recommendations and offers are more likely to make a purchase, which increases the conversion rate.

Clustering customers on e-commerce platforms is a strategic tool that helps better understand customers, personalize offers, optimize marketing campaigns, and manage the assortment, ultimately resulting in higher revenues and profits. An equally important benefit is the increase in customer retention in the store.

How to Start Grouping Customers into Clusters?

As I mentioned earlier, to effectively know your customers, you must assign them to clusters. Most people believe that assigning specific individuals to certain sets is grouping. Most will also say that grouping involves identifying certain characteristics of individual people and, based on one or more traits, assigning those individuals to specific sets, segments, or groups.

Yes, one can accept such a definition of grouping. So we can group our customers into women and men, select age ranges: “very young,” “young,” “middle-aged,” “older.” We could then perform a transposition and create groups of customers such as “young men,” “older men,” “middle-aged women.” In this case, we have a customer population that has two traits: age categories and gender. We can add another category, such as “profession,” and for example, “frequency of visits to the store.” We would then see new groups, such as “middle-aged men, plumbers, visiting the store three times a month.”
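
Grouping by a handful of fixed traits like this is easy to express with ordinary filters; a minimal pandas sketch with invented column names and values:

```python
import pandas as pd

customers = pd.DataFrame({
    "gender": ["M", "F", "M", "F"],
    "age_band": ["middle-aged", "older", "middle-aged", "young"],
    "profession": ["plumber", "teacher", "plumber", "clerk"],
    "visits_per_month": [3, 1, 2, 8],
})

# Filter-based grouping: "middle-aged men, plumbers, three visits a month"
group = customers[(customers["gender"] == "M")
                  & (customers["age_band"] == "middle-aged")
                  & (customers["profession"] == "plumber")
                  & (customers["visits_per_month"] == 3)]
print(group)
```

With four traits this works; the next paragraph explains why it collapses at 40 or 140 traits.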

But what do we do if we have not four traits but 40 or even 140? In the age of computerization, customers are described automatically with dozens of different traits. These are fixed traits, such as place of residence, type of activity, or gender, as well as variable traits resulting from customer behavior in the online store: a "decision class" (how quickly the customer makes decisions on the website), or a shopping-efficiency class, depending on whether customers always buy or sometimes just browse the site without purchasing. There are many different customer traits resulting from behavior. The most common analyses focus on the average spending level in the store, the frequency and regularity of visits, and the intervals between individual visits. All these behaviors are subject to classification. This means, for example, that a customer in decision class 1 almost always buys, while a customer in decision class 10 almost never buys. One can analyze an almost infinite number of behaviors, and a precise analysis of customer behavior alone can easily yield several dozen traits.

Customers are clustered to identify individuals with common characteristics. Now try to imagine grouping customers with simple filters across dozens or hundreds of collected traits.

Grouping is Not Clustering

Clustering and grouping are terms often used interchangeably. However, they are very different methods of preliminary data analysis. Here’s how they differ.

Clustering is a machine learning technique that involves grouping similar objects into sets (clusters). The goal of clustering is to find structures in the data. It is an unsupervised method, meaning it learns without labeled examples: there is no known "correct" group assignment against which the model could be trained and corrected.

Grouping is the process of organizing elements into groups based on certain criteria, such as gender, age, or interests. Grouping is a broad concept that can cover various techniques, including clustering, but in practice it is usually done with simple filters. Clustering, in contrast, is used for exploratory analysis, where the goal is to discover natural groups or structures in complex data. Typically, we do not know in advance what the identified groups mean: the algorithm searches for objects that are similar across many dimensions at once, and the similarities it finds may be too vague and complicated for a human to grasp directly. That is why clustering results are validated afterwards.

In summary, clustering is most often used, for example, in customer segmentation, grouping genes with similar functions, or identifying subgroups within communities in social networks. In contrast, grouping is used, for instance, when organizing email messages into categories: spam vs. non-spam, simple groups like women – men, or market segments such as traditional channel – modern channel.

In conclusion, clustering is a specific technique of grouping that pertains to finding structures in data without supervision. Grouping is a broader concept that can include various techniques and methods, both supervised and unsupervised. Undoubtedly, simple grouping is insufficiently effective in the context of customer analysis. Grouping based on many criteria is extremely difficult and impractical.
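
To make the contrast concrete, here is a minimal sketch of grouping with filters in Python, the language used for the prototypes discussed later. The table and column names are hypothetical; the point is that every segment requires a hand-written rule.

```python
import pandas as pd

# Hypothetical customer table with a handful of descriptive traits.
customers = pd.DataFrame({
    "gender": ["F", "M", "M", "F"],
    "age": [23, 47, 61, 35],
    "profession": ["plumber", "plumber", "teacher", "plumber"],
    "visits_per_month": [3, 1, 4, 3],
})

# Grouping with simple filters: every segment is an explicit, hand-written rule.
middle_aged_men = customers[(customers["gender"] == "M")
                            & customers["age"].between(40, 60)]
frequent_plumbers = customers[(customers["profession"] == "plumber")
                              & (customers["visits_per_month"] >= 3)]

print(len(middle_aged_men), len(frequent_plumbers))
```

With 4 traits this is manageable; with 140 traits the number of sensible rule combinations explodes, which is exactly where clustering takes over.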

PCA Method
To condense a cloud of dozens of traits describing customers into a few traits that allow effective assignment to clusters, the PCA (Principal Component Analysis) method can be used.

PCA is a statistical technique used for dimensionality reduction and for simplifying the analysis of large sets of variables. A dataset often contains many variables, which can be challenging to analyze. The goal is to transform the original variables into a small number of principal components that retain most of the information. First, the data is standardized so that each variable has a mean of 0 and a standard deviation of 1; this matters because PCA is sensitive to the scale of variables. Next, a covariance matrix is computed, showing how each pair of variables is related; covariance measures the direction and strength of how variables move together. Of course, from the perspective of the business owner, the inner workings of the algorithm do not matter. What is essential to know is that, for example, from 150 customer traits the PCA method can extract 4-5 principal components. The first component captures the strongest, most differentiating combinations of customer traits; the subsequent components carry progressively less information about customer specifics.

The principal components are uncorrelated with one another and together contain most of the information from the original data. They can be used for further analysis, such as data visualization or modeling.

The PCA method helps identify hidden patterns and relationships between variables. It reduces noise and redundancy in the data, which can improve the performance of analytical models. PCA is thus a powerful tool that simplifies complex datasets while retaining as much relevant information as possible. It also spares the analyst from making arbitrary decisions about which variables to discard from a vast cloud of information before further analysis.
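
The steps described above map directly onto a few lines of scikit-learn. This is a minimal sketch on stand-in random data (real customer data would concentrate far more variance in the first components); the shapes and component count are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Stand-in data: one row per customer, one column per trait (here 150 traits).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 150))

# Standardize each variable to mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Keep a handful of principal components.
pca = PCA(n_components=5)
scores = pca.fit_transform(X_std)

# Share of the original variance captured by each component.
print(pca.explained_variance_ratio_)
```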

Clustering County Data
A year ago, I published an article on Medium titled "Segmentation of a Population Containing Very Many Features. PCA Analysis and Clustering by k-means Method," describing how to cluster objects with a large number of features.

The analysis focused on the "Communities and Crime" database, containing information about 1,994 counties in the United States. The database was compiled to analyze the incidence of crime, and the objective was to identify the characteristics influencing crime levels in these counties. Each of the 1,994 counties was described by 124 traits. The situation is thus very similar to a large e-commerce customer base described by a vast number of traits.

The aim of the task was to group counties that are similar in terms of their characteristics and assign them to specific clusters. This made it possible to apply methods of treatment tailored to specific clusters, and to use effective benchmarking tools by comparing counties within a cluster and finding niches and anomalies in certain areas. The method is very similar to clustering a customer population.

The detailed process in Python is described in the aforementioned article. The counties were assigned to seven clusters. To verify whether the clusters were indeed different from one another, it was necessary to compare them on their traits.
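
The core of that pipeline, standardization, then PCA, then k-means, fits in a few lines. Below is a minimal sketch with stand-in random data in place of the real 1,994-by-124 table; the cluster and component counts follow the article.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in for the real table: 1,994 counties, 124 traits each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1994, 124))

# Standardize, reduce to 7 principal components, then cluster into 7 groups.
scores = PCA(n_components=7).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(scores)

# How many counties landed in each cluster.
print(np.bincount(labels))
```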

Example of Clustering Quality Assessment

For example, we could create 4 groups of workers: "white plumbers," "black plumbers," "white taxi drivers," "black taxi drivers." We would then place these individuals in a table where columns indicate the profession and rows the skin color, thereby creating very clear group divisions.

The same applies to assessing the quality of clustering. A scatter plot is created, and the clusters are analyzed concerning two traits.

In our case, we had over 100 traits describing individual counties. Therefore, we used the previously described PCA method to consolidate these traits, in our example, into 7 principal components. When comparing principal component 1 against principal component 2, we see distinct areas of color. Each dot represents one county; colors indicate clusters.
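
Such a plot can be produced directly from the arrays in the previous sketch; this fragment assumes the scores and labels variables from the PCA and k-means step are already in scope.

```python
import matplotlib.pyplot as plt

# 'scores' and 'labels' come from the PCA + k-means sketch above.
plt.scatter(scores[:, 0], scores[:, 1], c=labels, cmap="tab10", s=8)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("Counties colored by cluster")
plt.show()
```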

Such a plot shows that clusters 1, 2, and 3 differ significantly in position with respect to the first two principal components. The problem arises with cluster 7, whose counties are assigned ambiguously: they intermingle with clusters 6 and 4. Notably, cluster 7 visually contains the fewest counties, while clusters 1, 6, and 2 contain the most. In the PCA methodology, the first two components carry the greatest cognitive weight, as they contain the most information.

Subsequent plots compare the first PCA component with the next, less significant components. A high quality of county assignment to specific clusters turns out to be maintained: counties in different clusters evidently differ from each other.

Summary

The worst thing a company operating in the e-commerce industry can do is bombard customers with random promotions, flyers, and information. We live in an age of information overload, and most of it is treated as waste. A customer who has to deal with removing unnecessary offers or ignoring irrelevant suggestions will be unnecessarily burdened, leading them to leave our store in search of a place to shop with less effort.

Conversely, a customer who receives accurate suggestions and offers will conserve their energy in searching for products, making them more willing to shop at our store.

One can therefore assume that an effective customer relationship through relevant suggestions and appropriate offers is key to retaining them in our store.

To create relevant offers, we must know the customer. However, it is inefficient to build an individual offer for each of them. Of course, there are such techniques, and everything depends on the scope of the offer being created. However, in most cases, offers are built for entire groups of customers. Customers in clusters share a high level of similarity. Consequently, a whole strategy can be constructed for building relationships with individuals in specific clusters. The foundation for creating such algorithms is the accurate assignment of customers to clusters. Since customers are described in databases using dozens of different traits and metrics, it is impossible to create groups of customers through filtering. Machine learning methodology comes to the rescue, effectively finding similar individuals.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He conducts research in the field of the development and application of artificial intelligence. He has been involved in popularizing machine learning and data science in business environments for years.

Artykuł We can earn a lot by assigning customers to clusters on e-commerce platforms pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
https://sigmaquality.pl/my-publications/we-can-earn-a-lot-by-assigning-customers-to-clusters-on-e-commerce-platforms/feed/ 1
Artificial Intelligence in the Grain and Milling Industry? https://sigmaquality.pl/my-publications/artificial-intelligence-in-the-grain-and-milling-industry/ Sat, 02 Nov 2024 20:07:27 +0000 https://sigmaquality.pl/?p=8395 Artificial Intelligence The dynamic development of artificial intelligence and its proliferation among entrepreneurs offers an extraordinary opportunity to streamline production and logistics processes. This [...]

Artykuł Artificial Intelligence in the Grain and Milling Industry? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>


Artificial Intelligence

The dynamic development of artificial intelligence and its proliferation among entrepreneurs offers an extraordinary opportunity to streamline production and logistics processes. This technology can also serve as a basis for gaining a competitive edge. To explore how artificial intelligence can help surpass competitors in profitability and production efficiency, one must first understand the main directions and possibilities this technology offers. In this paper, we will discuss how to put artificial intelligence to intensive work within just one hour.

YES, in an hour you will have your own cybernetic employee who will do exactly what is needed here and now.

Can I be a pioneer of change?
A monthly subscription to the most popular artificial intelligence model, Chat GPT 4, costs $20. Paid users of Chat GPT can use plugins and programs created by other users. Moreover, users can create their own plugins or define assistants. No programming skills are needed: assistants and plugins for Chat GPT 4 are "programmed" by writing instructions, which take the form of ordinary sentences and can be written in Polish. In short, everything is easy and straightforward, requiring virtually no programming or analytical qualifications. So, let's proceed to the possible applications of artificial intelligence in the grain milling industry.

Let's start by defining the most important of these cyborg-like tools, which we can specify ourselves and which can greatly assist us in our work. They will be able to quickly search for the most important information, design content, analyze data, and compare different types of information.

Financial Assistant GPT
Only those who have purchased a subscription can create their own assistants in Chat GPT 4. We can design an assistant and attach to its memory all our notes, documents, and reports that we have written over the last 15 years. We can also attach contact lists—everything that will be necessary for its functioning.

Let’s assume we want to create an analytical assistant, a virtual person who will be an expert in analyzing reports. When programming such a virtual analyst, we can start with the following instruction: „I am the director of a grain elevator base, and I expect you to help me analyze the reports of grain intake and distribution from the elevators.”


Next, we define what behavior we expect from our virtual analyst: „At first, ask me to send reports based on which you will conduct the analysis.” At this moment, when we activate the assistant, it will ask us for the current reports. We attach the reports in the window where we enter commands. The format of the reports can be anything; they can be documents in CSV, PDF, or Word format, spreadsheets, or even photos. In this case, we can send cash and inventory reports. Theoretically, the assistant can automatically retrieve such reports from the system, but that requires a bit of tinkering.

The next instruction should be: „Then ask me what format of reports you should prepare. Ask what you should analyze.” Here are two types of analyses. „Ask me to choose one of these analyses: 1. Daily report of intake and distribution of raw materials. Provide: how many tons of each type of grain were received, how much was paid for each type of grain, what is the account balance at the beginning and end of the day, and the inventory status at the beginning and end of the day. Also, provide the number of vehicles weighed on the scales. Place all information in a table and export it to MS Excel format. Respond in Polish. 2. Report on wholesale prices on the commodity exchange. Provide the prices of each type of grain at collection points on commodity exchanges and in sales, provide our prices, and compare them with the prices from the commodity exchange. Also provide the exchange rates for the euro and dollar. Place the data in a table. Save the table in PDF format.”

We save the instruction and close the assistant editor. What do we get? We go to the Chat GPT website (this works from both a computer and a phone) and click on the analytical assistant. A polite voice asks us to provide the daily information on which it can perform the analysis. After receiving the daily reports, it asks which analysis we would like to hear. In the instructions, we defined two analyses. The assistant reports everything that has been defined for it and, at the end, saves all the information it conveyed in table form. (Source: Chat GPT 4 https://chat.openai.com/g/g-Hoc1c7dIy-sticker-wizard). The behavior described above was explicitly defined. However, we can also ask the analytical assistant to perform an analysis that we did not define. We can tell it to analyze the number of weighings and the number of vehicles that entered the scale; we will then know how many vehicles were weighed multiple times. We can directly ask how many vehicles were weighed multiple times and who they belonged to. The number of analysis variants, possible instructions, and interactions is unlimited, or rather, limited only by the data we provide to the assistant. For external information, we can place links to commodity exchange portals in its memory so that it can retrieve data independently.
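
For readers who would rather script this than click through the ChatGPT editor, the same kind of assistant can also be registered through OpenAI's Python client. This is a minimal sketch assuming the beta Assistants API as available at the time of writing; the assistant name, model string, and condensed instructions are illustrative, not the exact ones from the article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Register an assistant with the same kind of instructions that would be
# pasted into the ChatGPT assistant editor. Name and model are illustrative.
assistant = client.beta.assistants.create(
    name="Grain Elevator Analyst",
    model="gpt-4-turbo",
    instructions=(
        "I am the director of a grain elevator base. Help me analyze reports "
        "of grain intake and distribution. First, ask me to send the reports. "
        "Then ask which analysis to prepare: (1) a daily intake and "
        "distribution report, or (2) a wholesale price comparison. "
        "Respond in Polish."
    ),
    tools=[{"type": "file_search"}],  # lets the assistant read attached files
)
print(assistant.id)
```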

Strategic Assistant
The strategic assistant is another variant of the Chat GPT assistant. A strategy specialist is someone who, based on years of research, analyses, definitions, and all kinds of conditions gathered in the past, produces probabilities and forecasts on which strategic decisions are made. What should we do to define this kind of virtual specialist? Above all, we should provide it with all the documents that matter to us when creating strategies. We can upload many books in PDF format to its memory, and we can provide it with financial data. For this data, we should prefer documents in the form of management reports: Chat GPT still has a rather weak understanding of typical, extensive spreadsheets, so financial and quantitative data should take the form of documents rather than spreadsheets.

We program the strategic assistant similarly to the analytical one, but it is very important to include the following protocol in the instruction: „When creating strategies and answering any of my questions, you must use the resources that have been uploaded to your memory. Use all documents, PDFs, reports, and spreadsheets for the analyses that I request.” Of course, these sentences can be formulated differently. The important thing is to refer to the resources that have been uploaded to the memory of the strategic assistant. Without this information, the assistant will not use specialized knowledge that results from the experience of a particular enterprise or director. If we do not upload our own knowledge sources, the assistant will rely on general knowledge.

Human Resources Assistant
In a similar way, we can define assistants dealing with human resources. In this case, we should upload our own notes analyzing the labor code, our analyses, personnel records of individual employees, current reports related to working time, and so forth. The assistant can provide information on how many hours a given employee has worked, when they were last on sick leave, what their salary is, or how much they are entitled to as a bonus for that month. Besides simple information that the human resources assistant can provide, we can ask about individuals who always take sick leave before Easter or we can request information about which employees worked together in the same company in the past.

The assistant will analyze all the CVs of individual employees and generate a summary. A necessary condition is to upload all the needed information to the assistant’s memory and, as I mentioned, to indicate to the assistant that it should use the documents that have been uploaded to it.

In a similar way, one can create any assistant, a virtual employee who will work effectively and according to our expectations. Therefore, we can hire a technologist, a security specialist, a logistics expert, or a virtual warehouse assistant. Above all, we must convey our expectations and define what they must receive from us in order to conduct analyses, submit reports, clarify, or identify concerning phenomena. We can create an article editor assistant that will search for the most interesting information from publications available on the internet. We can define that this assistant must provide the five most important points from each encountered publication related to a given topic. The assistant can also search for information that may pose a threat to our company or industry. It is advisable to upload documents that will indicate to the assistant what it should understand as a threat. We can define a virtual psychologist who will use 40 psychological books uploaded to its memory in PDF files. We can also hire a virtual detective who will conduct investigations for us. We could even have our own doctor who could diagnose us based on the symptoms we provide.

With your own assistant, you can communicate using the computer keyboard. You can also use assistants with voice systems. While driving, you can activate your financial assistant and assign it analyses. If the assistant asks which analysis it should perform (at the beginning, we defined two types of analyses), we can directly say: „Do something different for me, tell me what the prices are for purchasing sugar beets on commodity exchanges.” The assistant will, of course, respond to all questions and send an accurate report of the conversation if we ask it to do so.

Assistant Visualization
Managers create virtual offices for themselves and fill them with robotic assistants. I once saw a visualization in which a manager, as an avatar steered with the keyboard, walked around the screen and entered different rooms where assistants were seated. Visualizing assistants is a very nice solution, as it brings these robots closer to human form. To visualize an assistant minimally, we can upload our own image of the assistant's face. A more advanced method is to use a plugin that designs the faces of individual assistants for us. There are many such plugins; I use one for Chat GPT 4 called Sticker Wizard.

This assistant is generally available in Chat GPT; it is one of dozens of plugins specializing in designing stickers.

Below, I present a few stickers that were created by this graphic assistant.

I would now like to introduce you to three of my assistants. The first is an accountant who answers difficult questions regarding expenses, types of costs, provides me with charts related to expenditures, and informs me about account balances, debts, and payment due dates.

This gentleman with a beard and mustache is the assistant who talks to me during long walks in the woods. While walking my dog, I often discuss current events with him. The assistant is designed to present me with the latest facts along with the economic phenomena currently taking place.

And this gentleman is the assistant who tells me about the history of the Industrial Revolution in Europe. He narrates facts related to the development or decline of entire industries or specific enterprises. During walks, the assistant informs me about the most important events related to industrial history and refers to many economic theories in his narratives.

Below, I present the prompt based on which the program generated the face of my industrial revolution historian. Notice that the command I issued differs slightly from the image generated by the model. Here is the prompt: „Make me a round sticker with the face of a man in a hat, about 50 years old, smiling, in Victorian style. Show this design now. Use colors: black, white, silver.”

Summary
Today, practically anyone for $20 a month can have whatever employee they want. Of course, I am referring to mental workers; we can create psychologists, assistants telling jokes, or specialists in observing wild birds. The latter will generate images and recognize birds based on the photos sent to him. Our virtual office can be filled with all kinds of experts. They can be virtual people who will, for example, proofread texts. Until recently, I considered my two-page CV to be a perfectly crafted masterpiece. The assistant specializing in creating CVs detected many stylistic errors, pointed out that using three types of fonts is unprofessional, stated that there shouldn’t be so many colors, noted that the photo was inappropriate, and explained that the style of responsibilities for individual job positions does not align, suggesting that some of these responsibilities were copied from external sources. A quick analysis of my CV indicated that the hastily configured assistant can perform its work multiple times better than the average person and even better than a specialist in a specific field of knowledge. Just look at the graphics this machine can generate.

Recently, someone asked me what they should do to implement artificial intelligence in their company. The above publication indicates how to do this within an hour. The information revolution, like any revolution, can be astonishing.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He has been conducting research in the field of the development and application of artificial intelligence. He has been engaged in popularizing machine learning and data science in business environments.

Artykuł Artificial Intelligence in the Grain and Milling Industry? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
Recommendation systems for online stores https://sigmaquality.pl/my-publications/recommendation-systems-for-online-stores/ Sat, 02 Nov 2024 19:57:02 +0000 https://sigmaquality.pl/?p=8389 Artificial Intelligence Recommendation systems are used to optimize economic activity in the areas of sales and costs. In online sales, there are two entities subjected [...]

Artykuł Recommendation systems for online stores pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>



Artificial Intelligence

Recommendation systems are used to optimize economic activity in the areas of sales and costs. In online sales, recommendation systems concern two entities: customers and products.

When discussing the creation of recommendations, we can also talk about creating tools that assist in strategic decision-making. This involves analyzing sales channels, directions of development, and eliminating areas that burden the business. However, these types of strategic recommendations do not fit the topic I would like to discuss today.

Thus, the main focus of the basic recommendation system is the customer and the product.

What is the goal of recommendation systems for customers on e-commerce platforms?

Essentially, there are two goals that should be discussed separately.

First – it is about encouraging the customer to purchase more. Recommendation systems can significantly increase the volume of purchases made by individual customers in a simple and inexpensive manner.

Second – it aims to increase customer retention rates. The goal here is to prevent customers from leaving the store. It is obviously much easier to lose a customer than to gain a new one, and this conviction motivates doing everything possible to retain customers.

Both of these goals, increasing purchases and retaining customers, are achieved through very distinct tools. I mentioned that these goals should be considered separately: if we focus on customer retention, we should not simultaneously aim to increase purchasing effectiveness.

On one hand, this is true, because such an approach aligns with the iron law of optimization, which states that one can optimize only one thing at a time, not many things simultaneously. On the other hand, it contradicts intuition and experience. If recommendation systems have led a customer to purchase more than before, it is likely because the customer was very satisfied with the store, and a satisfied customer will stay longer. Of course, there is always a counterargument: customers can be drawn in by promotions and by things they do not need or want. Customers who are manipulated, consciously or not, usually lose trust and leave. This is a good argument that a recommendation system must operate on fair principles; only then will it positively influence customer retention alongside revenue optimization.

Recommendation Systems Must Predict the Future

The recommendation system is based on historical events. Its operation relies on a list of transactions made by customers in the past. However, the true goal of a recommendation system is to predict what a customer will do. Predicting the future is the foundation of optimization. This optimization pertains to both the current purchasing process and the future supply of products in the store.

By serving an individual e-commerce customer, the recommendation system can predict which products to suggest to increase the chances of additional sales. At the same time, the recommendation system can help forecast the demand for specific products in the future.

Effective forecasting of the future is therefore the foundation of an efficient recommendation system.

What is the goal of creating recommendation systems for products?

Here we must point out a certain inconsistency. A recommendation system, as the name implies, serves to recommend or suggest something. Can we suggest anything to products? Obviously not. What we can do is optimize the structure of these products: we can observe trends and processes using statistical tools. This makes it possible to optimize the product structure in the store, sending the right assortment to places where it is in high demand and removing it from places where it is not popular. With a recommendation system for products, the sales process can be optimized to achieve higher profits or lower costs.

The primary entity to which the recommendation system is directed is the customer, as the customer makes decisions and is a somewhat unstable element that must be observed from a predictive perspective.

The recommendation system analyzes the collision of customer behaviors with a kind of „behavior” of products. Each customer has a unique behavior pattern that rarely changes. Products have certain periods of popularity, and they also have some connections with each other. Customers themselves validate products and the relationships between them. The role of data science is to catch these links between customers and products.

The most important tool in recommendation systems is clustering, the grouping of similar entities, both customers and products, into clusters based on certain features. It is crucial to select the features that matter for the sales process. The above description may seem somewhat convoluted, so below I will try to clarify, step by step, how recommendation systems are created.

Internal Data Sources for E-commerce Recommendation Systems

The primary data source in e-commerce sales systems is the sales register, which includes customer logins. The purchase history of individual customers is tracked and modeled by data analysts.

Similar registers exist in retail sales, where customers use loyalty cards. According to research I conducted a year ago, about 70% of retail customers use such cards.

Another source of data for e-commerce recommendation systems is static customer data. This data is collected when a customer account is created and is not present in the sales register.

Anonymous Customers

Some customers making purchases do not register with online stores. Sometimes customers do not disclose themselves in the system for various reasons.

Thus, there are transactions without a customer ID. However, these transactions contain a range of information that allows systems to easily match them to a customer ID: the credit card number, the phone number, and the address to which the customer wants the purchased product delivered, as well as the unique address of the device from which the purchases were made. A customer identified by such metadata, even one who did not wish to reveal themselves, can be matched to their transactions using simple applications operating within the e-commerce system.
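
A minimal sketch of this kind of matching is shown below. All table and column names are hypothetical, and real systems would work on hashed identifiers and handle conflicts far more carefully; the point is only the mechanics of resolving anonymous transactions against identifiers already on file.

```python
import pandas as pd

# Hypothetical tables: registered customers and anonymous transactions.
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "card_hash":  ["a9f3", "77b0"],   # hashed card number kept on file
    "phone_hash": ["55c1", "e210"],
})
anon = pd.DataFrame({
    "tx_id": [1, 2, 3],
    "card_hash":  ["77b0", None, "a9f3"],
    "phone_hash": [None, "55c1", None],
})

# Try to resolve identity on any shared identifier: card first, then phone.
by_card = anon.merge(customers[["customer_id", "card_hash"]], on="card_hash")
by_phone = anon.merge(customers[["customer_id", "phone_hash"]], on="phone_hash")
resolved = pd.concat([by_card, by_phone]).drop_duplicates("tx_id")

print(resolved[["tx_id", "customer_id"]])
```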

It should be noted that customers who see that they can be identified by intelligent recommendation systems, even though they did not disclose their data, may lose trust in the store, which could lead them to leave.

Customers with Multiple Accounts

Multiple account situations occur when one customer account serves multiple people. For instance, a production company has one ID code in the online store, and purchases are made for all departments of the company. Hydraulic pipes, valves, mattresses, power tools, and paints are purchased. Worse still, purchases are often made by different people with varying behavior patterns. Private purchases can also be made from a corporate account.

In general, multiple accounts disrupt the efficiency of e-commerce systems. They should be eliminated, just like outliers in modeling. Unfortunately, multi-person accounts are often high-turnover accounts. One way to deal with them is to encourage the company to split its purchases into several specialized accounts in exchange for discounts and promotions.

External Data Sources

Alongside the information stored in the online store’s databases, external data is also important. Unfortunately, customers do not make decisions solely based on what they see in the store. A large part of their decisions is based on external information. If someone previously saw dishwashing liquid for 5 PLN in another online store, they will not be tempted to buy dishwashing liquid for 8 PLN. A recommendation advertisement that appears at the bottom of the purchasing window with dishwashing liquid for 10 PLN is likely to discourage the customer rather than encourage further purchases.

The same customer will be tempted to buy dishwashing liquid for 8 PLN if they receive something in return. To propose an attractive offer, the system must be perfectly situationally aware. It must know the customer, but it also must know the market and the competition’s offer.

Information about prices and availability of competitive products should therefore be included as a variable in the recommendation models. Predictive models should also contain a multitude of regular external information, such as the day of the week, month, season, and many important situational variables such as weather forecasts and consumer optimism indices. The system should know the trends and fashions in the market. All this information is processed as variables in the realm of digital mathematical models.

Clustering the Customer Population

Technically, it is quite possible to predict each customer's behavior accurately, since buyers behave in a repetitive manner. However, no one tries to treat customers individually, as that would be inefficient. The goal of the clustering process, which is a more powerful form of simple grouping, is to find customers with very similar behaviors. A customer has individual habits and routines, yet behaves much like certain others. The system assigns individuals to a cluster of people who are very similar in specific features; they become a kind of digital twins, acting very similarly, sharing the same phobias, preferences, purchasing frequencies, and sensitivity to advertisements. A population of 100,000 customers can thus be divided into 5,000 clusters. People assigned to specific clusters will be treated differently by the sales system, and a different form of incentive and persuasion will be applied to each group. Excellent results come from finding individuals within a cluster who deviate from the others and who, thanks to knowledge of the traits of the other individuals in the cluster, can be easily persuaded to make larger or more efficient purchases.
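
At this scale, a mini-batch variant of k-means is a common choice, because it clusters large populations without processing the whole dataset at every step. A minimal sketch on stand-in data follows; the 100,000-by-5 matrix stands in for customers already reduced to a few consolidated traits, and the cluster count mirrors the example above.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in features: 100,000 customers, each reduced to 5 consolidated traits.
rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 5))

# Mini-batch k-means scales to large customer bases and many clusters.
mbk = MiniBatchKMeans(n_clusters=5000, batch_size=10_000, n_init=3,
                      random_state=1)
labels = mbk.fit_predict(X)

print(labels[:10])  # cluster assignment of the first ten customers
```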

Transposition of Divisions

Assigning customers to clusters can be done according to various variables. Initially, an RFM approach (Recency, Frequency, Monetary) can be applied, grouping customers by how recently, how often, and for how much they buy. This provides a preliminary analysis of the customer base. The same customers can then be grouped by the assortment they choose, forming a second set of clusters. Now a transposition can be performed: the RFM grouping is overlaid on the assortment clusters, and the distribution of customers across the resulting matrix is analyzed. Such transposition matrices reveal market niches, anomalies, and areas where the sales process is unnaturally inflated.
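
The sketch below shows what this looks like in pandas on a toy transaction log: RFM values are computed per customer, cut into coarse segments, and then crossed with each customer's dominant assortment to form the transposition matrix. All names and thresholds are illustrative.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3, 4],
    "date": pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10",
                            "2024-02-20", "2024-03-02", "2024-01-15",
                            "2024-03-03"]),
    "amount": [120.0, 80.0, 15.0, 22.0, 18.0, 300.0, 45.0],
    "assortment": ["tools", "tools", "cleaning", "cleaning", "cleaning",
                   "roofing", "tools"],
})

now = tx["date"].max()
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (now - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)

# Coarse RFM segments via quantile bins (two bins per dimension, illustrative).
rfm["segment"] = (
    pd.qcut(rfm["monetary"], 2, labels=["low", "high"]).astype(str) + "-"
    + pd.qcut(rfm["frequency"], 2, labels=["rare", "often"]).astype(str)
)

# Transposition: overlay RFM segments with each customer's dominant assortment.
dominant = tx.groupby("customer_id")["assortment"].agg(lambda s: s.mode()[0])
print(pd.crosstab(rfm["segment"], dominant))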

Customers can be assigned to various sets of clusters, and within each of these sets, correlations or non-linear relationships can be analyzed with other simple divisions such as seasonality, multiplicity, place of residence, gender, or day of the week. We will understand that the ability to analyze customer behaviors is practically endless.

Similarly, products can also be grouped; they can be clustered based on sales frequency, turnover, or seasonality. Alongside simple groupings, complex partitioning of product populations can be created. Product and customer clusters can be matched, creating infinite patterns of behavior. The most important thing is to find the repeatability of customer behavior patterns in relation to related products. The ability to eliminate excessive information is crucial here.

Simple E-commerce System in Practice

Let's assume a customer appears on the e-commerce platform. This customer is one of 140 customers grouped in a single cluster, in which customers behave similarly. The customer has placed a hammer in their cart, and the system immediately finds a second, related item: a screwdriver. It turns out that in this cluster, customers who chose a screwdriver usually also bought a hammer, and vice versa; among the 140 customers, 38 made this paired choice in the past. The system estimates that the customer who selected a hammer has a 24% probability of also buying the screwdriver, so the screwdriver is displayed as a suggestion.
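
In code, this boils down to a conditional co-purchase probability computed inside the cluster. A minimal sketch on an illustrative purchase history follows; the counts below are made up and do not reproduce the 140/38 figures above.

```python
import pandas as pd

# Illustrative purchase history for the customers of one cluster.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 4, 5, 5],
    "product": ["hammer", "screwdriver", "hammer", "hammer", "screwdriver",
                "screwdriver", "hammer", "screwdriver"],
})

# One row per customer, one boolean column per product.
basket = pd.crosstab(tx["customer_id"], tx["product"]).astype(bool)

# P(screwdriver | hammer) within this cluster drives the recommendation.
p = (basket["hammer"] & basket["screwdriver"]).sum() / basket["hammer"].sum()
print(f"P(screwdriver | hammer) = {p:.0%}")
```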

Recommendation Pairs

Products sold in pairs or trios can also be searched for without relying on customer clusters. If someone buys a dish sponge, they are likely also going to buy dishwashing liquid. For some reason, however, only some customers do this, while others buy only sponges. This may be because the sponges are not intended for washing dishes but, for example, for cleaning car rims.

Or perhaps sponges and dishwashing liquid are bought more often by women, while sponges without dishwashing liquid are bought more often by men. This is a typical behavior pattern where the customer’s belonging to a specific group, cluster, plays an important role. This simple example indicates that it is important for recommendation systems to use pairs based on clusters. Such individual patterns are found thousands of times. Fortunately, tools search for and select them en masse.

Analysis of Incomplete Purchases in a Cluster

As mentioned earlier, customers assigned to the same cluster exhibit a specific set of behaviors. For example, let’s take a cluster of people who purchase roofing-related articles. The algorithms consider the frequency and value of purchased items and the type of customer (for example, small business). A substantial group of customers is classified as small contractor roofers. Now it is possible to analyze the completeness of their purchases. Some customers buy within certain categories offered by the store, such as insulation materials, gutters, tiles, bituminous coverings, and adhesives.

Let's assume that the completeness analysis showed that 27% of the roofers in this cluster buy everything except adhesives, which they evidently purchase elsewhere.

There is nothing left but to offer an attractive adhesive deal exclusively to that 27% of customers who do not buy adhesives from us.

Analysis of Interrupted Habits

Grouping customers into purchasing clusters makes it easier to detect changes in their habits. Let's assume we have a cluster of plumbers. In this cluster, small hydrophore (pressure-boosting) pumps were bought seasonally. The seasonality stemmed from the fact that, in spring, it often turned out that small hydrophore pumps installed in recreational gardens needed replacement. At some point, however, it was found that hydrophore pumps were no longer selling in the spring within the plumbers' cluster.

If we had analyzed hydrophore pump sales without clusters, we might not have detected this anomaly, since such pumps are bought by various customer groups, while we are most interested in the plumbers. Thanks to the detected anomaly, it is possible to investigate the situation and propose better conditions so that plumbers do not buy their pumps from competitors. This action will not be executed automatically by the recommendation system, but the system makes such actions by sales managers possible.

How to Retain Customers in the Store?

Until now, my considerations have focused on increasing the purchasing efficiency of customers. One customer purchasing a hammer may, thanks to a well-displayed advertisement, also buy a screwdriver. A woman buying dishwashing liquid receives recommendations for sponge purchases.

Thanks to these and thousands of similarly exploited patterns, the overall average purchasing efficiency of customers can increase by 30-40%.

Customers will also feel an improvement in the shopping experience, because a well-fitted suggestion is usually well received.

However, an increase in purchasing efficiency may not compensate for a decline in sales due to the departure of some customers. Customers usually leave for a reason, often a poorly matched style of treatment or offers from other stores; the reason may also be unrelated to the store's operations. To avoid losses that do result from our store's operations, another recommendation system must be created, focused solely on customer retention.

Here too, we must carry out clustering and divide customers into groups that consist of digital twins. Then it is necessary to initiate survival tests.

The survival test is a method developed long ago for medicine, aimed at statistically estimating the probability that a patient survives a given length of time under the currently applied treatment. Various treatments were tested this way, and on that basis the tools calculated the probability of future patient deaths. Applying a treatment to a specific patient group thus predicted, for example, that they had an 80% chance of surviving a given period.

With this method, it is easy to determine the likelihood of retaining a customer for the upcoming years. Various incentives can be added to the algorithm, such as loyalty cards, discounts, periodic individual discounts, and information on the effect of these techniques on customer retention. This relatively simple approach can drastically reduce the likelihood of customer departure. Interestingly, such a system can also serve for the ongoing management of customers. It can be a handy advisor for sales representatives talking to particular customers.
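
The classic tool here is the Kaplan-Meier estimator, available in the lifelines library. A minimal sketch on made-up customer tenures follows: "months" is how long each customer has been observed, and "churned" marks whether they have already left (1) or are still active and therefore censored (0).

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Hypothetical customer tenures in months, with churn flags.
data = pd.DataFrame({
    "months":  [3, 12, 7, 24, 18, 5, 30, 9],
    "churned": [1,  0, 1,  0,  1, 1,  0, 0],
})

kmf = KaplanMeierFitter()
kmf.fit(durations=data["months"], event_observed=data["churned"])

# Estimated probability that a customer is still with the store after t months.
print(kmf.survival_function_)
print(kmf.predict(12))  # survival probability at 12 months
```

The same fit can be repeated per cluster, or extended with incentives such as loyalty cards as covariates in a regression-style survival model, to compare retention curves between groups.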

Analysis of Purchase Cancellations

An important signal preceding a customer's departure from the e-commerce platform is the purchase cancellation rate. A customer may decide to leave the store because they have not found a suitable offer for an extended period, and such a decision will not be detected by the survival test either.

To create a purchase cancellation rate, it is necessary to reconfigure the data collection system. As mentioned earlier, the recommendation system is built based on the sales register. The sales register consists of transactions that were concluded with the customer within a certain period and for certain products. If a customer enters the online store and browses different pages, this behavior will not be found in the sales register.

The only way to analyze such behavior is through the history of browsing the online store's pages. This requires specialized tracking tools, but the endeavor pays off because it effectively reveals customer churn and purchasing determination.
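
Once cart and order events are tracked, the cancellation rate itself is a one-liner. A minimal sketch on a made-up event log (the event names are hypothetical):

```python
import pandas as pd

# Hypothetical clickstream events: carts created vs. orders completed.
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 4],
    "event": ["cart_created", "order_completed", "cart_created",
              "cart_created", "order_completed", "cart_created"],
})

carts = (events["event"] == "cart_created").sum()
orders = (events["event"] == "order_completed").sum()

# Share of carts that never turned into a completed order.
print(f"cancellation rate = {(carts - orders) / carts:.0%}")
```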

How to Build a Recommendation System?

The most crucial step in building a recommendation system is identifying the goal. Fortunately, most companies want the same things: an increase in customer purchasing efficiency and, at the same time, identification of the factors driving customer departures. Both goals can easily be reconciled with a few simple principles for building the system.

The most important elements are data, which must be complete, reliable, and comprehensive. It is impossible to create an effective recommendation system based on a small amount of data. A sales transaction record containing at least several hundred thousand operations should be linked to other databases obtained from external sources.

The next step is the proper use of data clustering methods. Often, with many variables, the PCA algorithm (Principal Component Analysis) is used. This is a method of combining many features into 3 or 4 consolidated component features. Partial features are combined into main features based on the strength of mutual covariances among features.

Let's imagine the information held about a customer and try to count how many features they can have: residential address, number of transactions per month, number of distinct product categories, payment method, and dozens of other variables. With such data clouds, it is impossible to assign a customer to a cluster effectively. That is why tools such as the PCA algorithm are used to reduce the number of variables by consolidating them into a few composite ones.

Survival analysis, like the tools that increase customer purchasing efficiency, is simple to create and manage. At the same time, the methodology is considered tedious, because the vast number of interactions and external changes demands experience from the researcher. A very important skill is keeping the right research priorities: eliminating inefficient patterns and expanding effective ones. This requires great self-discipline and patience.

First Prototype, Then Implementation

In creating a recommendation system, a prototype is first built, usually written in Python and based on the language’s libraries. The prototype is tested and refined in a programming environment. Unfortunately, the prototype is not suitable for use in the sales process. This solution must be implemented in a production environment. This is the responsibility of the data engineer, who implements the program in a cloud environment or in a local environment that supports sales systems.

Batch and Streaming Systems

Recommendation systems can be based on historical records. The prototype is built on a certain history, but that history changes over time, so the system must be updated. Someone who used to buy yogurts suddenly starts buying kefirs. Someone who once bought a hammer and a screwdriver is unlikely to repeat that exact purchase, but we know that such a customer will, after some time, want to buy a drill. These examples illustrate the need to update the recommendation system periodically; the more frequent the update, the better, and ideally it runs continuously. It is therefore possible to build a recommendation system on a constant flow of information, processing incoming data streams directly into the system.
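
Since Kafka appears in the technology stacks listed for these projects, here is a minimal sketch of the streaming variant using the kafka-python client. The topic name, broker address, and message layout are assumptions; the idea is simply to keep co-purchase counts fresh as completed orders arrive.

```python
import json
from collections import Counter

from kafka import KafkaConsumer  # kafka-python package

pair_counts = Counter()  # running co-purchase counts, updated continuously

# Assumed topic "orders" with JSON messages like:
# {"customer_id": 7, "products": ["hammer", "screwdriver"]}
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    products = sorted(set(message.value["products"]))
    for i in range(len(products)):
        for j in range(i + 1, len(products)):
            pair_counts[(products[i], products[j])] += 1
    # Recommendation tables can be rebuilt from pair_counts as often as needed.
```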

Summary

Everyone knows that a well-formulated, tailor-made offer increases customer purchasing efficiency and additionally enhances customer loyalty to the store. The problem is that to tailor an offer for a customer, we must know a lot about them and use tools that will ensure this knowledge is utilized effectively.

To achieve such an endeavor, it is necessary to organize data and the processes for their processing and utilization in analysis perfectly.

The process is difficult but necessary because if we do not do it, the competition will. A recommendation system on an e-commerce platform can be an excellent source of competitive advantage, allowing for greater sales efficiency without incurring substantial expenses on general advertising or excessive, broad promotions, thus securing higher profits that ensure the sustainability and increasing value of the e-commerce business.


Wojciech Moszczyński

Wojciech Moszczyński is a graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He has been involved in researching the development and application of artificial intelligence for years. He has been engaged in popularizing machine learning and data science in business environments.

Artykuł Recommendation systems for online stores pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>
How to gather customer needs in a machine learning project? https://sigmaquality.pl/my-publications/how-to-gather-customer-needs-in-a-machine-learning-project/ Sat, 02 Nov 2024 08:27:29 +0000 https://sigmaquality.pl/?p=8383 For anyone starting a machine learning project, it’s clear that the most important step is to properly define business goals. We can compare goals to [...]

Artykuł How to gather customer needs in a machine learning project? pochodzi z serwisu THE DATA SCIENCE LIBRARY.

]]>

For anyone starting a machine learning project, it’s clear that the most important step is to properly define business goals.

We can compare goals to the foundation of a building. What mistakes can be made when laying a foundation? The foundation might not align with the building’s layout. Imagine a situation where part of the building doesn’t rest on the foundation. Additionally, the density and reinforcement of the foundation material could be insufficient, risking future collapse. One thing is certain: once the foundation is laid and the house built upon it, changing that foundation without completely dismantling the house is impossible.

The same applies to projects. If we adopt incorrect goals and requirements and begin the project based on them, we won’t be able to change them mid-project. Faulty assumptions can lead to complete project failure. Only the client can provide the goals and needs, yet they may not always be competent or ready to define them.

The Client May Not Understand Their Business

It’s not ideal to assume that clients don’t understand their own processes. We remember the old adage: „the customer is always right.” However, the typical client might not always know where the problem lies. We often expect them to have a high-level understanding of everything happening in the company, along with a deep understanding of processes and all aspects of their business.

But the reality can be quite different. A particular client might be excellent at the core part of their business and only that part. For example, the owner may be „the best in the world” at winning construction bids, which is the primary source of the company’s success. Thanks to this market advantage, the owner can hire specialists to handle all essential supporting processes. This situation may lead to a complete lack of knowledge about other business areas.

It’s likely that the client contacted us not about the part of their business they excel in, but the part they don’t understand. They might be struggling in this area and feel helpless. So, we may be wrong in our expectations regarding the current awareness of the clients seeking our help. Naturally, clients feel obligated to show us their present problem. And in this case, the client is the one who should present the issue. But is this person the right one for that?

When a patient visits a doctor, they must share their symptoms and health issues, but no one expects them to make a medical diagnosis. When a customer brings a car to a mechanic, they describe the problem, but they don’t diagnose the cause. Unfortunately, in the business world, owners often point to the causes of their problems, leaving analysts hesitant to question their views.

Sometimes I feel data analysts or other experts in business problem-solving should also be psychologists. At the start of a project, we need to gather all the symptoms and allow the owner to share their hypothesis about the cause. This hypothesis can be valuable to us. However, like a doctor or mechanic, we must gather all available data and use our professional tools to identify the true problem. It’s great if our findings align with the owner’s hypotheses.

Sometimes the Owner is Subconsciously Embarrassed by Their Problem

In my work, I try to avoid delving into business psychology whenever possible. Unfortunately, this is rarely achievable, as this realm permeates everything in business. Businesses are run by people with their dilemmas and issues, often too proud or plagued by certain fears. Business is created by people; office furniture, machines, and vehicles don’t generate income on their own. Embarrassment and anxiety among business owners are more common than one might think, often tied to low self-esteem, shame from personal insecurities or incompetence, or even a subconscious need to impress.

As I mentioned, psychology isn’t my strong suit, but avoiding this aspect is a recipe for failure in gathering project requirements. The risk isn’t just about wasting time collecting irrelevant requirements or setting false goals. It’s also about maintaining good relations with the owner, which greatly impacts future collaboration.

Consider the story of a man who runs a business because he’s excellent at winning construction bids. Some residents and employees see his success and regard him as a business genius, overlooking his flaws. Despite his evident issues, his erratic behavior, his long working hours, and his drivers cheating on fuel, they still hold him in high esteem. This owner doesn’t know how to solve certain problems and is embarrassed by them. So, when external analysts arrive, he discusses a problem he wishes he had, not the things he finds shameful.

It May Sound Complex, But It’s Very Simple

The owner realizes that the problem may be simple for others, but it’s not for him. So, he creates a sort of mystery around it. It’s like visiting a psychiatrist due to an irrational fear of the neighbor’s cat but not knowing how to explain it without sounding ridiculous. This analogy doesn’t fully capture the complexity of the problem, but it gives some insight.

The Difference in Worldviews

This essay doesn’t aim to highlight the limitations or ignorance of business owners. Instead, it provides guidelines for building a solid foundation for a project. External analysts aim to address all potential problems. Their profession is to identify possible causes and solutions to issues that trouble companies. For this, developing effective situational awareness is essential.

Proper situational insight is crucial to understanding processes. Imagine someone struggling with logistics costs. Their business model shows that logistics expenses consume a significant portion of the production profit, resulting in company losses. Unfortunately, humans have inherent limitations shaped by evolution. Humans can’t consider more than six or seven factors simultaneously. In contrast, machine learning algorithms can handle large data dimensions, often comprising thousands of features and enriched by inter-temporal shifts. Humans can’t interpret thousands of interconnected business factor correlations. Naturally, human cognition is limited, but those who use AI’s potential start to think outside the box, forming visions that may confuse or even annoy others.

Sadly, the lack of abstract and unconventional thinking is common among managers focused solely on their own business. When I worked in large corporations, consultants or people with overly abstract thinking were often sidelined or ignored. In our previous example, transport expenses consumed a large portion of production profit. The typical management response is to reduce overall costs and impose strict fuel monitoring, but the impact is limited. Fuel consumption can’t fall below technical norms; truck insurance or driver layoffs are unavoidable.

Significant change can come from outsiders who aren’t entangled in daily operations. They may suggest optimal truck load capacity for this specific logistics process, drawing insights from operational research algorithms. They might even reveal that the current transport model isn’t economically justified, and alternative transportation solutions could be identified, new transport networks established, warehouse locations optimized, and optimal transport times and load sizes determined. It may turn out that transport in this company isn’t necessary at all. Perhaps focusing on profitable core activities and leaving logistics to clients would be more effective.

What is Optimization?

We are accustomed to using the term „optimization,” but I’m not sure if everyone fully understands it.

A process can be described as a configuration of two value streams: the input resources entering the process and the outputs at its end. Before explaining optimization, I want to clarify efficiency: efficiency is the ratio of the value of a process's outputs to the value of its inputs.

Efficiency should align with intended goals. Suppose our goal is to reduce transport costs. To achieve this, we decide to buy new trucks. However, these new trucks need to be paid off. In the end, we have the same revenue from transport, but our costs have increased. The difference between old and new trucks isn’t significant, and the debt service cost has reduced logistics profitability. Despite the investments, the goal of increased efficiency and improved financial results wasn’t achieved. If the management had set a goal of environmental protection or accident risk reduction, acquiring new trucks might have been a good decision despite the higher maintenance costs.

So we conclude that the initiative’s goal is crucial. The goal is the single most important piece of information we must obtain from the business owner during project initiation.

Summary

When building a house, creating a proper foundation is essential. To optimize and find the best business solution, it’s vital to understand the source of the problem and identify the true goal of the initiative. Failing at this stage will likely result in project failure.


Wojciech Moszczyński
Graduate of the Quantitative Methods Department at Nicolaus Copernicus University in Toruń, specializing in econometrics, data science, and management accounting. He focuses on optimizing production and logistics processes and conducts research in AI development and application. He has been dedicated for years to promoting machine learning and data science in business environments.

Regularization in machine learning models
https://sigmaquality.pl/my-publications/regularization-in-machine-learning-models/ (Sat, 02 Nov 2024)

Machine learning, in simple terms, involves creating mathematical simulations of existing, real-world processes. These simulations, commonly referred to as models, are most often built from historical data.

What is Machine Learning?

The term „machine learning” is a simplified way of saying that we are teaching a machine something, where the „machine” is just an algorithm. For example, in a bakery we may install a device that counts the customers entering. The device generates a time series showing, for instance, 213 customers on Monday and 319 on Tuesday. We can compare this data with past Mondays and Tuesdays to statistically forecast how many customers will come on a future Monday or Tuesday, and adding factors such as season or weather conditions may improve the model’s predictive accuracy. This gives us a simple model based on averages from previous months: for instance, Mondays have averaged 305 customers with a standard deviation of 21.
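As a minimal sketch of this averages-based forecast (the customer counts below are invented for illustration):

```python
# Forecasting next Monday's customers from past Mondays' averages.
import numpy as np

# Customers counted on the last eight Mondays (hypothetical data)
monday_customers = np.array([298, 310, 305, 321, 287, 309, 314, 296])

forecast = monday_customers.mean()      # expected customers next Monday
spread = monday_customers.std(ddof=1)   # sample standard deviation

print(f"Forecast for next Monday: {forecast:.0f} +/- {spread:.0f} customers")
```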

Models based on averages from a time series belong to the family of autoregressive models. They can be built in a spreadsheet, and adding factors like season or weather increases their predictive power. When conditional averages emerge, Bayesian models are often applied. Conducting conditional probability analyses in a spreadsheet quickly becomes tedious: as the predictive factors become more conditional, the model grows more complex even as its accuracy may improve, which warrants a move from spreadsheets to a richer econometric model library.
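For anyone outgrowing the spreadsheet, a basic autoregressive model takes only a few lines. The sketch below fits statsmodels’ AutoReg to simulated daily counts; the data and the lag order of 7 are illustrative assumptions, not recommendations:

```python
# A simple autoregressive forecast of daily customer counts.
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
# Simulated counts with a weekly pattern plus noise
counts = pd.Series(
    300 + 20 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 10, 120),
    index=dates,
)

model = AutoReg(counts, lags=7).fit()   # each day depends on the last 7 days
forecast = model.predict(start=len(counts), end=len(counts) + 6)
print(forecast.round(0))                # next week's predicted counts
```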

Machine learning is essentially about building a model that mirrors reality, designed to reflect current and future events in a process. Given historical records of how a phenomenon’s parameters have changed, we can teach a machine statistical inference, including prediction of the future.

Let’s assume we want to build a weather model to predict rain in the coming hours. Imagine a technological process that is disrupted by rain, requiring the bakery owner to take protective measures, like covering bread baskets left outside early in the morning. The owner finds standard weather forecasts insufficient.

When building the model, we use historical data such as atmospheric pressure, cloud cover, season, temperature, and wind speed. The goal is for drivers to have an app on their smartphones that says, „Cover the bread baskets with plastic” or „Leave the baskets uncovered.”

We observe that when atmospheric pressure drops, cloud cover is heavy, and the wind is gusty, the probability of rain increases. All this data is fed into a model designed to predict whether rain will occur. This type of problem is best addressed with a classification model, which outputs either 0 („the event will not occur”) or 1 („the event will occur”).
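A hedged sketch of such a classifier follows; it trains a logistic regression on simulated readings, and the feature set, the toy labeling rule, and the thresholds are all assumptions made for illustration:

```python
# A toy rain / no-rain classifier for the bakery scenario.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.normal(1010, 8, n),    # atmospheric pressure (hPa)
    rng.uniform(0, 100, n),    # cloud cover (%)
    rng.gamma(2.0, 6.0, n),    # wind speed (km/h)
])
# Toy rule: rain when pressure is low, clouds are heavy, and wind is gusty
y = ((X[:, 0] < 1005) & (X[:, 1] > 60) & (X[:, 2] > 15)).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
tomorrow = [[1002.0, 85.0, 25.0]]      # one hypothetical observation
print("Cover the bread baskets with plastic!"
      if clf.predict(tomorrow)[0] == 1 else "Leave the baskets uncovered.")
```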

Supervised Machine Learning

The machine learning process divides into supervised and unsupervised learning. Supervised learning involves training a model on one dataset (the training set) and then applying the trained algorithm to another dataset (the test set). The purpose of the test set is to evaluate how well the model performs on new, previously unseen data; this step is known as validation.

Unsupervised learning does not involve model testing. Here, the model is trained on the primary dataset without any subsequent testing. An example of unsupervised learning is clustering, which involves grouping elements in a dataset.
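As a minimal illustration of unsupervised learning, the sketch below clusters hypothetical customers with k-means; the two features and the choice of three clusters are assumptions made purely for demonstration:

```python
# Grouping customers into three clusters without any labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical data: weekly visits and average basket value per customer
customers = np.column_stack([
    rng.poisson(3, 200),        # visits per week
    rng.gamma(3.0, 15.0, 200),  # average basket value
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_[:10])       # cluster assigned to the first ten customers
print(kmeans.cluster_centers_)   # the three group "centres"
```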

Regularization in Machine Learning Models

Every predictive model based on historical data should ideally be developed using supervised learning.

Dividing the Dataset into Training and Test Sets

In supervised learning, we must split our historical data into a training set and a test set. Suppose we have weather data from 2020 to 2022, which we will use for training, while 2023 data will serve as our test set. The sequence of dates must be preserved because we are working with time series data, which may exhibit patterns like gusts of wind shortly before rain.

Dividing historical data into training and test sets enables model performance evaluation. With a working model trained on data from 2020-2022, we can introduce the 2023 test data to assess model quality. If our trained classification model performs well on the training data, accurately predicting rain 80% of the time, we can then check whether it achieves a similar accuracy on the unseen 2023 data.
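A minimal sketch of this chronological split and evaluation might look as follows; the weather DataFrame, its date and rain columns, and the feature list are hypothetical names assumed for illustration:

```python
# Chronological train/test split for time series: no shuffling.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def evaluate(weather: pd.DataFrame, features: list[str]):
    train = weather[weather["date"] < "2023-01-01"]    # 2020-2022
    test = weather[weather["date"] >= "2023-01-01"]    # 2023, kept in order

    clf = LogisticRegression(max_iter=1000)
    clf.fit(train[features], train["rain"])

    acc_train = accuracy_score(train["rain"], clf.predict(train[features]))
    acc_test = accuracy_score(test["rain"], clf.predict(test[features]))
    # A large gap between the two scores is the classic sign of overfitting
    return acc_train, acc_test
```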

Overfitting

Overfitting occurs when a model performs well on the training data but fails to generalize to new data, which indicates that it has learned patterns specific to the training period that do not recur in the new data. Overfitting is often an issue in complex models, such as entropy-based decision trees or neural networks, while simpler linear or logistic regression models are less prone to it. It is also more common in models with a large number of dimensions, a problem known as the „curse of dimensionality.” To detect overfitting, it is essential to statistically compare model performance on the training and test datasets.

What is Regularization?

Regularization aims to improve the model’s predictive accuracy by reducing its sensitivity, or complexity, so that the model is less likely to pick up on subtle, minor patterns in the training data. Fitting such minor patterns is referred to as variance error, and the process of reducing variance is known as regularization: irrelevant features are either removed or synthesized into simpler forms.

Regularization in Neural Networks

Specific algorithms, such as Ridge Regression (L2) and Lasso Regression (L1), serve as regularization methods in machine learning. In Ridge Regression the penalty grows with the sum of the squared model coefficients, whereas in Lasso Regression it grows with the sum of their absolute values, which can shrink some coefficients exactly to zero and thereby eliminate features. Another, less universally accepted method is Dropout, which randomly deactivates a fraction of neurons during training. By making the network effectively simpler, Dropout can reduce overfitting, although individual training runs differ because the deactivated neurons are selected at random (unless the random seed is fixed).
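The sketch below contrasts the two penalties on simulated data; the alpha values are illustrative, not tuned:

```python
# Ridge (L2) keeps all coefficients small; Lasso (L1) can zero some out.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
# Only the first three features actually matter in this toy setup
y = 2 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 200)

ridge = Ridge(alpha=1.0).fit(X, y)   # penalty on the sum of squared weights
lasso = Lasso(alpha=0.1).fit(X, y)   # penalty on the sum of absolute weights

print(np.round(ridge.coef_, 2))  # all coefficients shrunk, none exactly zero
print(np.round(lasso.coef_, 2))  # irrelevant coefficients driven to zero
```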

In neural networks, regularization is akin to penalizing neurons. Instead of removing variables or manually creating new ones, a penalty is imposed on the network to limit its complexity. For example, signals such as atmospheric pressure, wind speed, temperature, and season feed the neurons of a network. When penalization reduces a signal below a certain threshold, it does not pass through the network, thereby simplifying the model’s „thought process.”
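A minimal sketch of such penalization in Keras is shown below; the layer sizes, dropout rate, and L2 strength are illustrative assumptions, not tuned values:

```python
# A small rain/no-rain network with two kinds of penalization:
# an L2 weight penalty and Dropout.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(4,)),           # pressure, wind, temperature, season
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # penalize weights
    layers.Dropout(0.3),               # randomly silence 30% of neurons
    layers.Dense(1, activation="sigmoid"),  # rain / no rain
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```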

In essence, regularization makes the model interpret phenomena in a simpler way, reducing the likelihood that it overfits to noise in the data. A closely related approach, dimensionality reduction, addresses the „curse of dimensionality” by removing less significant variables or by modifying signal values so that minor patterns do not unduly influence the model.


Wojciech Moszczyński
Graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes, and has conducted research in AI development and applications. He is actively engaged in popularizing machine learning and data science in business environments.

Export of EU-banned herbicides and pesticides outside the EU: How Europe legally pollutes other continents (Part 2)
https://sigmaquality.pl/my-publications/export-of-eu-banned-herbicides-and-pesticides-outside-the-eu-how-europe-legally-pollutes-other-continents-part-2/ (Sat, 02 Nov 2024)


Export of Herbicides and Pesticides Banned in the European Union Beyond Europe
How Europe Legally Poisons Other Continents (Part 2)

Chemical corporations that have been forced to comply with pesticide sales bans are seeking markets outside the European Union. They are finding them in Third World countries, where no administration is capable of protecting the consumer market and the environment or of enforcing health standards during spraying. In these countries, the focus is mainly on reducing the immediate cost of agricultural production and on alleviating hunger in the short term.

Position of Chemical Corporations

European companies exporting banned pesticides and herbicides are fully aware of the negative impact of these products on human health and life, as well as on the degradation of local natural environments. In response to moral accusations from organizations defending human health and natural resources, pesticide manufacturers use four main arguments.

These arguments are inherently cynical and clearly do not reflect the actual situation. Despite their falsity, they are repeated loudly, posing as the rational voice of large, respectable entities set against the emotional demands of small, vocal social organizations.

Argument One: The Pesticides Sold Are Safe
Manufacturers claim that their products are safe. Corporations declare their efforts to reduce the risks associated with using harmful pesticides. They point to their own research, conducted over many years, aimed at this goal. At the same time, chemical companies ignore numerous reports of high numbers of fatalities, widespread illnesses, and massive environmental damage caused by the hazardous substances they sell.

Argument Two: Respect for the Authorities of Individual Countries
Pesticide exporters respect the laws of the countries to which their products are directed. Manufacturers argue that each country has the sovereign right to decide which pesticides are best for its farmers. At the same time, they ignore the fact that Third World countries often lack a sufficiently developed government administration and institutions to control the flow of goods within their territories. These countries are often helpless in enforcing the use of protective measures necessary when using harmful substances. The costs of personal protective equipment (PPE) are often much higher than the cost of the spraying itself. Because chemicals do not have an immediate effect on farmers’ health, safety measures are rarely used in everyday work. Representatives of a German chemical corporation claim that local authorities receive „reliable and detailed information about the proposed plant protection products. Local authorities can then decide for themselves whether the substances should be allowed on the local market.” Manufacturers ironically emphasize respecting local laws and decisions of local governments, overlooking the well-organized and wealthy pesticide lobbying groups that aggressively corrupt politicians and journalists in these countries and finance lawsuits against social organizations protesting the destruction of local resources.

Argument Three: Pesticides Are Suitable for Countries Outside Europe
Pesticide producers argue that their products are not sold in Europe because they are better suited to countries outside Europe. In making this claim, they completely ignore the fact that these plant protection products were once widely used in Europe but are now strictly banned there.

Interestingly, manufacturers completely ignore the primary reason behind the export of these harmful substances outside the EU, which is economic. Aggressive pesticides and herbicides significantly reduce the labor required for cultivation. Where many farm workers were once needed to weed and manually remove pests, spraying now effectively removes weeds, fungi, or harmful insects from the field. As a result, food production becomes cheaper, making it easier to export to other countries. These plant protection products increase agricultural production volume by eliminating factors that cause crop loss. This is crucial for African countries struggling with hunger. In the short term, using harmful chemicals seems economically advantageous. Thus, the arguments of large chemical corporations serve as a sort of mantra, far from the truth but repeated enough to somewhat drown out the protests of non-governmental organizations.

Meanwhile, according to UN estimates, pesticide poisoning causes approximately 200,000 deaths annually in developing countries.

Argument Four: Others Could Capture Our Markets
Another argument commonly used by chemical corporations is that pesticides and herbicides are essential for the development of local agricultural economies. If plant protection products are not exported from Europe, European producers will suffer, while Third World countries will simply import the same harmful chemicals from other sources.

At the same time, corporations argue that maintaining a presence in these countries will, in the future, improve the overall situation in local agriculture and support the EU’s “green diplomacy” aimed at achieving “more diversified food systems worldwide.”

Wealthy countries allow the export of plant protection products banned in the EU to countries that are not equipped to manage the risks associated with their use. According to UN organizations, these wealthy countries thereby violate their obligations under international human rights law.

Grassroots Initiative of European Courts

I am far from conspiracy theories that claim powerful, wealthy, and numerous chemical industry lobbyists effectively influence the operations of European offices. However, it is hard to shake the impression that the European Commission does not adequately respond to the dire threats posed by exporting chemicals to the Third World.

Chemical corporations also readily pursue legal avenues, intended as a defense against social organizations that, they claim, are trying to “restrict economic freedom.” In 2022, however, French judges dismissed a lawsuit filed by one of the leading pesticide manufacturers, ruling that restricting economic freedom is socially justified in the face of potential harm to human health and environmental destruction.

Based on this ruling, France banned the export of substances prohibited in the European Union in 2022. Following France’s lead, the Belgian government also imposed a total ban on the export of harmful substances outside the EU in 2023.

Chemical Corporations Circumvent Local Export Bans

In response, the export of hazardous pesticides from France and Belgium was moved to Germany. The production and export levels of French chemical plants remained unchanged. Since 2021, there has also been a ban on the export of the five most dangerous plant protection products from Switzerland, which is not a member of the EU.

Interestingly, chemicals do not even need to be transported to Germany for further export outside the EU; filling out export documents through a German subsidiary of a French manufacturer suffices. According to Laurent Gaberell, an agriculture and food expert at the Swiss non-governmental organization Public Eye, this is one of the most significant legal loopholes allowing circumvention of export bans in certain EU countries. Public Eye, together with Unearthed, the journalism branch of Greenpeace UK, found that after France imposed its export ban in 2022, Germany’s exports of hazardous plant protection products more than doubled. According to the European Chemicals Agency (ECHA), over 18,000 tons of banned pesticides were exported from Germany in 2022, nearly twice as much as in 2021. Some hope came from the German government, which announced in September 2023 its intention to ban the export of prohibited pesticides. Unfortunately, as long as the EU is not managed as a single export market, there will always be loopholes allowing individual countries’ export bans to be circumvented.

Exporters Change Product Designations

Another loophole in European law is the export of harmful pesticides as pure, active chemical ingredients, which can then be combined into finished products in countries outside the EU. Slight modifications to the chemical composition, or the creation of a concentrate, may justify selling the substance under a different trade name. Public Eye and Unearthed estimate that roughly 20% of hazardous pesticide exports leave the EU in this form.

The legal solution adopted by the French government leaves room for such practices to circumvent its export ban. The 2023 law passed by the Belgian government (in the form of a royal decree), by contrast, comprehensively bans the export of finished products, pure chemicals, and concentrates alike.

The Belgian export ban also covers all possible uses of the chemical substances. Under the Swiss and French solutions, the export bans primarily apply to products intended for agricultural use, so declaring that the chemicals are for non-agricultural purposes effectively bypasses them.

Protection of Jobs and Concern for the Economic Situation of Chemical Corporations

Chemical corporations’ PR agencies try to mitigate the legal effects of export bans. This year, representatives of the French branches of Bayer, Syngenta, and BASF publicly stated that the regulations adopted by the French government threaten 2,700 jobs.

A similar lobbyist defense occurred a few years ago, when the Polish government began declaring its intention to ban fur animal breeding in Poland. A vital tool in the European Commission’s arsenal is the EU’s position as the main importer of food from many non-EU countries. If the EU decides to use this leverage, countries using harmful chemicals could be barred from exporting food to the EU. Ukraine, which extensively uses chemicals banned in Europe in its agriculture, may be the first such country in the future.


Wojciech Moszczyński
Graduate of the Department of Econometrics and Statistics at the Nicolaus Copernicus University in Toruń, specializing in econometrics, finance, data science, and management accounting. He specializes in optimizing production and logistics processes. He conducts research in AI development and application and has been involved in popularizing machine learning and data science in business environments.
