The difference between Recall and Precision, explained in plain English

Easy to understand?

The difference between Recall and Precision seems easy to understand and explain. It comes down to two simple equations that even a child can read.

Yet this seemingly simple measure of prediction quality is treacherous and misleading, especially for beginners.

This is because, at the start, nobody explains that we evaluate the model with respect to only one class. Not both classes, just one.

The most intuitive example of the difference between Recall and Precision

In our example we built a machine that recognizes apples. "Apples" is our one and only class for evaluation. Let's assume the machine is a robot arm that picks fruit and puts it into a blender.

The robot arm has just put 7 apples and 5 oranges into the box. Is that a good pick?

Confusion matrix

For this pick (the box holds 10 apples and 20 oranges in total, as we will see below), the confusion matrix looks like this:

                    picked (predicted apple)    left (predicted not apple)
 apple (positive)   TP = 7                      FN = 3
 orange (negative)  FP = 5                      TN = 15

Where:

TP (true positives) are apples correctly picked, FP (false positives) are oranges picked by mistake, FN (false negatives) are apples left behind, and TN (true negatives) are oranges correctly left behind.

 In this example the "Recall" gauge means: 

The program found 7 of all 10 apples in the box, so Recall is 7/10 = 0.70.

That's all there is to it.

 In this example the "Precision" gauge means: 

The program picked 7 apples and 5 oranges. We wanted to make apple juice with only a small share of oranges, but now the fruit is almost half-and-half. Remember, we evaluate the machine with respect to only one class. That means: of the 12 fruits in the blender, the robot arm picked only 7 apples. How do we calculate that? 7/(5+7) = 0.58.
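As a minimal sketch in pure Python (all the numbers come from the apple example above):

```python
# Apple example: 10 apples in the box, the robot picked
# 7 apples (true positives) and 5 oranges (false positives).
true_positives = 7    # apples correctly picked
false_negatives = 3   # apples left behind (10 - 7)
false_positives = 5   # oranges picked by mistake

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)

print(round(recall, 2))     # 0.7
print(round(precision, 2))  # 0.58
```

Note that the 20 oranges left in the box (the true negatives) do not appear in either formula; both gauges look only at the positive class.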

The apples-and-oranges example can be converted into one with cars and trams, or with good and bad transactions. Each time, we must remember that the evaluation applies to only one class. Say we are looking for women in photos, only women. The computer found 60 women, but we know there are 100 photos with women. That means Recall is 60/100 = 0.6. The computer sent us 120 photos, and as we remember, we only need the ones with women. It found 60 photos with women but sent us 120 photos in total. It does not work precisely! The level of Precision is 60/120 = 0.5.
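The same two formulas can be wrapped in small helper functions (the function names here are just illustrative, not from any library) and applied to the photo example:

```python
def recall(found_relevant, total_relevant):
    """Fraction of all relevant items the model actually found."""
    return found_relevant / total_relevant

def precision(found_relevant, total_returned):
    """Fraction of the returned items that are actually relevant."""
    return found_relevant / total_returned

# Photo example: 100 photos of women exist, the computer
# returned 120 photos, of which 60 really show women.
print(recall(60, 100))     # 0.6
print(precision(60, 120))  # 0.5
```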

We have been littered with photos of men because the computer mistook them for women. What do we need those photos for?

How do we reconcile Recall and Precision?

Let's go back to the apples and oranges. If the robot arm puts all the apples and all the oranges into the box, we get 10 apples and 20 oranges. Recall is at 100% (that is, 10/10) and Precision at 33% (that is, 10/(10+20)).

We could say the robot arm is very good because it picked all the apples: Recall is 100%. But with Precision at 33%, we cannot make apple juice!

Let's try another example. The robot picks 4 apples and 2 oranges and puts them into the box. We get Recall 4/10 = 0.4 and Precision 4/(4+2) = 0.67. Quite good Precision, so we can make apple juice, but we get very little of it. Most of the apples will be thrown out because, more often than not, the robot does not recognize apples as apples.
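The trade-off between the two scenarios can be sketched side by side (the scenario labels are mine; the counts are from the text):

```python
# tp = apples picked, fp = oranges picked, fn = apples left behind
scenarios = {
    "grab everything": {"tp": 10, "fp": 20, "fn": 0},  # all 10 apples + all 20 oranges
    "too cautious":    {"tp": 4,  "fp": 2,  "fn": 6},  # 4 apples + 2 oranges
}

for name, c in scenarios.items():
    rec = c["tp"] / (c["tp"] + c["fn"])
    prec = c["tp"] / (c["tp"] + c["fp"])
    print(f"{name}: recall={rec:.2f}, precision={prec:.2f}")
# grab everything: recall=1.00, precision=0.33
# too cautious: recall=0.40, precision=0.67
```

Raising one gauge tends to lower the other: grabbing more fruit boosts Recall at the cost of Precision, and vice versa.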

It is easy to see the difference between Recall and Precision.

How do we construct a valuable gauge to evaluate the robot arm?

Frankly speaking, to create a valuable gauge we need to know what we want. If we classify transactions to detect fraud, Recall will be the most important. The machine may flag plenty of ordinary, clean transactions as fraud; Precision will be low but Recall will be high. The bank can easily explain things to those customers, but if the machine lets some fraud through, the bank will bear high costs. In an ordinary situation, we look for a compromise between Precision and Recall.

Take the case where the robot put all the apples and all the oranges into the box: we get Recall = 1.00 and Precision = 0.33.

If we take the arithmetic mean we get (1 + 0.33)/2 = 0.67, a good score for a robot that we know is not good at all.

Instead, we need the harmonic mean: 2 * (1 * 0.33)/(1 + 0.33) ≈ 0.5. This harmonic mean of Precision and Recall is known as the F1 score.
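A quick sketch of both averages for this case (the function name `f1_score` mirrors the standard term; this is a hand-rolled version, not a library call):

```python
def f1_score(precision, recall):
    """Harmonic mean of Precision and Recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.33, 1.0

arithmetic = (precision + recall) / 2   # the misleadingly high 0.67 from the text
harmonic = f1_score(precision, recall)  # about 0.5

print(round(harmonic, 2))  # 0.5
```

The harmonic mean is dragged down toward the smaller of the two gauges, so one very weak gauge cannot hide behind one very strong one.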

We can see the robot is not good!

Setting thresholds on the ROC curve