Comparing scores on different scales using the z-score test

MOSZCZYNSKI 8-22

In professional practice, we are often forced to compare values or results that are expressed on different scales. How should we interpret them? A value, outcome, or result makes sense only when it can be compared with the average for a given population. For example, knowing that a master pastry chef, Ken Takahashi from Tokyo, earns 5,540,000 yen annually tells us nothing on its own. The sum seems high, but without information about average earnings in Japan, we can’t determine whether it is truly substantial. What if we want to know whether our Tokyo-based pastry chef earns more than his colleague Bata Kumar from India, who earns 21,000 Indian rupees per month, while average monthly earnings in India are 17,000 rupees? We could convert their earnings to dollars and see that Ken earns more, but this doesn’t tell us much. In India, one can live decently on 10,000 rupees per month. To understand whether roughly 460,000 yen per month (5,540,000 yen divided by 12) allows for a decent lifestyle in Japan, we need to know the average salary there.

It turns out that the average salary in Japan is 4,420,000 yen per year, meaning our Japanese pastry chef earns above average. So, which pastry chef earns more relative to the average income in their country? We can easily calculate how far above the average each of them earns, as a fraction of that average:

Ken Takahashi: (5,540,000 – 4,420,000) / 4,420,000 = 0.253
Bata Kumar: (21,000 – 17,000) / 17,000 = 0.235

We wanted to know which pastry chef would be considered wealthier in their own country, but again, we learn little from this comparison. As we know, the average is a statistically limited measure. In India, there are both wealthy individuals and beggars who survive on the equivalent of five dollars a day. Meanwhile, in Japan, income varies significantly depending on the worker’s age. To truly determine who earns more, we must also consider the population’s standard deviation.

Intuitively, standard deviation indicates how spread out values within a category (e.g., age, inflation, fuel consumption) are around the average of that category. The smaller the deviation, the closer observations are to the mean value.
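The idea of spread can be illustrated with a minimal Python sketch. The two samples below are hypothetical figures invented purely for illustration (they do not come from any earnings survey); both have the same mean, but their standard deviations differ sharply.

```python
from statistics import mean, pstdev

# Hypothetical monthly earnings in thousands of rupees.
# Illustrative numbers only, not data from the article.
tight = [16, 17, 17, 17, 18]   # values clustered around the mean
wide = [5, 10, 17, 24, 29]     # values scattered far from the mean

for label, sample in (("tight", tight), ("wide", wide)):
    # pstdev treats the sample as the whole population
    print(f"{label}: mean = {mean(sample)}, standard deviation = {pstdev(sample):.2f}")
```

Both groups average 17, yet a salary of 21 would stand out in the first group and be unremarkable in the second.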

In India, the standard deviation of earnings is 6,000 rupees per month, while in Japan, it is 1,250,000 yen per year. Now, we can determine who truly earns more.

Carl Friedrich Gauss’s Probability Density Graph

A normal distribution graph shows how likely particular values of a variable are. Many natural phenomena follow a normal distribution, at least approximately. Processes shaped by human intervention often produce different distributions, which allows researchers to detect anomalies, external interference in processes, and fraud.

A normal distribution has a bell shape, is symmetrical, and on each side of the mean divides into four zones, each bounded by multiples of the standard deviation.

Suppose we want to know the height of the first customer entering our bakery. The average height in the town is 169 cm with a standard deviation of 16 cm. Assuming heights are normally distributed, there is a 68.3% chance that the customer’s height falls within 169 cm ± 16 cm, that is, between 153 cm and 185 cm. So, what’s the probability that someone as short as 143 cm will walk in? A height of 143 cm lies more than one standard deviation below the mean.

To find the boundary one standard deviation below the mean, we subtract the standard deviation (16 cm) from the average height (169 cm), which gives 153 cm. The graph shows that the probability of encountering someone shorter than 153 cm in our town is 13.6% + 2.1% + 0.1% = 15.8% (about 15.9% when the zone values are not rounded).

In every normal distribution, the density function is symmetric about the mean. The probability that a value (statistical feature) lies within one standard deviation of the mean is approximately 68.3%, and the probabilities for two and three standard deviations are about 95.5% and 99.7%, respectively (the three-sigma rule).
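These percentages can be checked with a short Python sketch. It assumes, as the example does, that heights in the town are normally distributed with a mean of 169 cm and a standard deviation of 16 cm; the helper name prob_within is introduced here only for illustration.

```python
from statistics import NormalDist

# Town heights from the example: mean 169 cm, standard deviation 16 cm.
heights = NormalDist(mu=169, sigma=16)

def prob_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# The three-sigma rule
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(heights, k):.1%}")

# Probability that the first customer is shorter than 153 cm (mean minus one sigma)
p_below_153 = heights.cdf(153)
print(f"shorter than 153 cm: {p_below_153:.1%}")
```

The same function works for any mean and standard deviation, because the rule depends only on distance from the mean measured in standard deviations.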

Who Earns More?

With knowledge of a population’s average and standard deviation, we can compare values expressed in different scales or measurement methods. This enables us to, for example, determine who performed better in three prestigious pastry contests, such as the International Chocolate Awards, The World Chocolate Masters, and NY Cake Show, even though each contest uses its own scoring methods. This information can be extremely useful when choosing a candidate for a pastry chef position in a prestigious restaurant.

To determine who held a better position in the contests, we use the z-score test. This test assumes that the population (or sample) being studied follows a normal distribution. The z-score formula is simple:

z = (x – μ) / σ

It’s the difference between the observed value x and the population mean μ, divided by the population’s standard deviation σ.

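Returning to our pastry chefs, we substitute the data into the z-score formula:

Ken Takahashi: (5,540,000 – 4,420,000) / 1,250,000 = 0.90
Bata Kumar: (21,000 – 17,000) / 6,000 = 0.67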

The z-score indicates how many standard deviations a value lies above or below the mean. Ken Takahashi earns 0.9 standard deviations above the mean for his population, almost reaching the boundary of one standard deviation above the mean, beyond which only about 16% of Japanese earners lie. Meanwhile, Bata Kumar earns 0.67 standard deviations above the average in India, so his advantage over his countrymen is smaller. Relative to his own country, Ken earns more.
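The comparison can be reproduced with a minimal Python sketch. It uses the means and standard deviations quoted in the article (4,420,000 yen is the Japanese mean used in the percentage calculation) and assumes earnings in both countries are normally distributed, which is a simplification. The function name z_score is chosen here for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Figures from the article: yearly yen for Japan, monthly rupees for India.
z_ken = z_score(5_540_000, 4_420_000, 1_250_000)
z_bata = z_score(21_000, 17_000, 6_000)

standard_normal = NormalDist()  # mean 0, standard deviation 1

for name, z in (("Ken Takahashi", z_ken), ("Bata Kumar", z_bata)):
    # Share of the population earning more, under the normality assumption
    share_above = 1 - standard_normal.cdf(z)
    print(f"{name}: z = {z:.2f}, share earning more = {share_above:.1%}")
```

Because both results are expressed in standard deviations, the yen and rupee figures become directly comparable without any currency conversion.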

How Universal Is the z-score Test?

It might seem that such considerations have limited use in a bakery owner’s daily life. But imagine the bakery owner suspects the drivers of colluding to siphon fuel from the delivery trucks. Since they do this in proportion to the kilometers driven, the theft is nearly undetectable.

After a while, the bakery owner buys three new delivery vehicles. The manufacturer specifies an average consumption of 12.5 l/100 km in city driving, with a standard deviation of 3 l/100 km.

The new vehicles start delivering bread to stores, and after some time, the owner checks the average fuel consumption, which turns out to be 14.1 l/100 km. The owner substitutes the data into the z-score formula:

(14.1 – 12.5) / 3 = 0.53

It shows that the vehicles consume about 0.5 standard deviations more than the manufacturer’s average. Does this mean the drivers are cheating?

To find out, we need to set a null hypothesis, which always states that the result is not statistically different from the population. Therefore, our null hypothesis will say that our drivers are not cheating.

We also need to establish a confidence level (or confidence coefficient). Let’s assume it’s 95%. This corresponds to a significance level of 1 – 0.95 = 0.05: we accept a 5% risk of accusing the drivers when the higher consumption is only due to chance. If the p-value, that is, the probability of seeing consumption at least this high when the null hypothesis is true, falls below 0.05, we reject the null hypothesis in favor of the alternative hypothesis that the drivers are cheating.

Now we need to find the z-value in statistical tables. Fortunately, we can use a z-score calculator and input our data.

The result tells us there’s about a 20% probability that, according to the manufacturer’s standards, fuel consumption falls between 12.5 and 14.1 l/100 km. The test also shows a probability of about 30% that consumption exceeds 14.1 l/100 km. This means the excess fuel consumption is statistically insignificant: a probability of 0.30 is far above the threshold of 1 – 0.95 = 0.05, so we cannot reject the null hypothesis and have no statistical grounds to accuse the drivers of cheating.
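Instead of statistical tables or an online calculator, the same figures can be obtained with a few lines of Python. This is a sketch under the article’s assumptions: consumption is normally distributed with the manufacturer’s mean and standard deviation, and the test is one-sided (we only care about consumption that is too high). The variable names are illustrative.

```python
from statistics import NormalDist

# Manufacturer's figures from the article, in l/100 km.
mu, sigma = 12.5, 3.0
observed = 14.1
alpha = 1 - 0.95  # significance level implied by a 95% confidence level

z = (observed - mu) / sigma
spec = NormalDist(mu, sigma)

# Probability of consumption between the mean and the observed value
p_between = spec.cdf(observed) - spec.cdf(mu)
# One-sided p-value: probability of consumption above the observed value
p_above = 1 - spec.cdf(observed)

print(f"z = {z:.2f}")
print(f"P(12.5 < consumption < 14.1) = {p_between:.1%}")
print(f"P(consumption > 14.1) = {p_above:.1%}")

if p_above < alpha:
    print("Reject the null hypothesis: consumption is suspiciously high.")
else:
    print("Cannot reject the null hypothesis: no evidence of cheating.")
```

Changing the observed value shows how high consumption would have to climb before the p-value drops below 0.05 and the result becomes statistically significant.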


Wojciech Moszczyński – A graduate of the Department of Econometrics and Statistics at Nicolaus Copernicus University in Toruń, specializing in econometrics, data science, and management accounting. He focuses on optimizing production and logistics processes and conducts research in artificial intelligence development and applications. He has long been involved in promoting econometrics and data science in business environments.