The interpretations of sensitivity, specificity, precision, recall, AUC, ROC and AUC-PR


Posted by Chunfu Shawn on 2023/05/12
Last Updated by Chunfu Shawn on 8/1/2023

1. Confusion Matrix

In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix is a table that is used to describe the performance of a classification algorithm. For a binary classifier it has two rows and two columns that report the numbers of true positives, false negatives, false positives, and true negatives.

Actual condition     Predicted positive        Predicted negative
Positive             a (True Positive)         b (False Negative)
Negative             c (False Positive)        d (True Negative)

2. Sensitivity, specificity, precision, recall and accuracy

The table below summarizes these five indices.

Actual condition     Predicted positive        Predicted negative
Positive             a (True Positive)         b (False Negative)        Sensitivity/TPR: a/(a+b);  Recall: a/(a+b)
Negative             c (False Positive)        d (True Negative)         Specificity/TNR: d/(c+d)
                     Precision: a/(a+c)                                  Accuracy: (a+d)/(a+b+c+d)

(1) Precision, recall and accuracy

Precision, recall, and accuracy are used more often in machine learning; a short computational sketch follows the list below.

  • Precision = a/(a+c), which means: out of all the examples that are predicted as positive, how many are really positive?

  • Recall = a/(a+b), which means: out of all the positive examples, how many are predicted as positive?

  • Accuracy = (a+d)/(a+b+c+d), which means: out of all examples, how many are predicted correctly?
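
As a minimal sketch of these three formulas (again with made-up toy labels; the scikit-learn calls are shown only as a cross-check):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # made-up labels
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # made-up predictions

a = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
b = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
c = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
d = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives

print(a / (a + c), precision_score(y_true, y_pred))               # precision
print(a / (a + b), recall_score(y_true, y_pred))                  # recall
print((a + d) / (a + b + c + d), accuracy_score(y_true, y_pred))  # accuracy
```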

(2) Sensitivity and specificity

Sensitivity and specificity are used more in medical statistics than in machine learning; a small sketch follows the list below.

[Figure 2]

  • Sensitivity = a/(a+b), which means: out of all the people that have the disease, how many got positive test results? If we define a positive example as a “person that has a disease”, we can see that recall and sensitivity are the same, but precision and specificity are different. Precision is also called PPV (Positive Predictive Value).

  • Specificity = d/(c+d), which means: out of all the people that do not have the disease, how many got negative test results?
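
Scikit-learn has no one-line helper for specificity, but both indices fall out of the confusion matrix directly; a small sketch with the same kind of made-up data:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 1 = has the disease (made-up data)
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # 1 = positive test result

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)   # a/(a+b), identical to recall
specificity = tn / (tn + fp)   # d/(c+d)
print(sensitivity, specificity)
```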

3. ROC and AUC

(1) ROC curve

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:

  • True Positive Rate
  • False Positive Rate

True Positive Rate (TPR) is a synonym for sensitivity and is defined as TPR = a/(a+b).

False Positive Rate (FPR) is the same numerical value as 1 − specificity and is defined as FPR = c/(c+d).

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.

[Figure 3: a typical ROC curve]

To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there's an efficient, sorting-based algorithm that can provide this information for us, called AUC.
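
In practice, scikit-learn's roc_curve does this sorting for us and returns one (FPR, TPR) pair per candidate threshold; a hedged sketch with made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)                 # made-up binary labels
scores = y_true * 0.6 + rng.normal(0, 0.4, size=100)  # made-up model scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in list(zip(fpr, tpr, thresholds))[:5]:
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```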

(2) AUC: Area Under the ROC Curve

AUC stands for "Area Under the ROC Curve." That is, AUC measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0,0) to (1,1).

[Figure 4: area under the ROC curve]

AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. For example, imagine a set of examples arranged from left to right in ascending order of the model's predicted scores:

AUC represents the probability that a randomly chosen positive example is positioned to the right of (i.e., scored higher than) a randomly chosen negative example.

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
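
A small numerical check of this ranking interpretation (with made-up scores): the AUC reported by scikit-learn matches the fraction of positive-negative pairs in which the positive example receives the higher score (ties counted as one half).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)                  # made-up binary labels
scores = y * 0.5 + rng.normal(0, 0.5, size=300)   # made-up model scores

pos, neg = scores[y == 1], scores[y == 0]
diffs = pos[:, None] - neg[None, :]               # every positive-negative pair
rank_prob = (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

print(roc_auc_score(y, scores))  # area under the ROC curve
print(rank_prob)                 # probability a positive outranks a negative
```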

AUC is desirable for the following two reasons:

  • AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
  • AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.

However, both these reasons come with caveats, which may limit the usefulness of AUC in certain use cases:

  • Scale invariance is not always desirable. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.
  • Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization.

4. PRC, AP and AUC-PR

(1) PRC: Precision-Recall Curve

The precision-recall curve shows the tradeoff between precision and recall for different thresholds. It is often used in situations where the classes are heavily imbalanced. For example, if an observation is predicted to belong to the positive class at probability > 0.5, it is labeled as positive; however, we could choose any probability threshold between 0 and 1. A precision-recall curve helps to visualize how the threshold affects classifier performance.

[Figure 6: precision-recall curve]
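
As with the ROC curve, scikit-learn's precision_recall_curve returns one (precision, recall) pair per candidate threshold; a sketch with made-up, imbalanced data:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(2)
y_true = (rng.random(500) < 0.1).astype(int)   # made-up labels, ~10% positives
scores = y_true * 0.4 + rng.random(500) * 0.6  # made-up model scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, th in list(zip(precision, recall, thresholds))[:5]:
    print(f"threshold={th:.2f}  precision={p:.2f}  recall={r:.2f}")
```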

(2) AP: Average Precision

Average Precision summarizes the PR curve into a single metric as the weighted mean of the precision achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

AP = Σ_n (R_n − R_{n−1}) × P_n

where P_n and R_n are the precision and recall at the n-th threshold.

(3) AUC-PR: Area Under the Precision-Recall Curve

AUC-PR stands for the Area Under the Precision-Recall Curve, and it is the trapezoidal area under the plot. AP and AUC-PR are similar ways to summarize the PR curve into a single metric.

A high AP or AUC-PR indicates both high precision and high recall across different thresholds. The value of AP/AUC-PR ranges from 0 (worst model) to 1 (ideal model).
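
Both summaries are available in scikit-learn. As a hedged sketch (made-up data again): average_precision_score implements the weighted-mean formula above, while auc(recall, precision) applies the trapezoidal rule to the PR curve, so the two values are usually close but not identical.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

rng = np.random.default_rng(3)
y_true = (rng.random(500) < 0.1).astype(int)   # made-up, imbalanced labels
scores = y_true * 0.4 + rng.random(500) * 0.6  # made-up model scores

precision, recall, _ = precision_recall_curve(y_true, scores)

ap = average_precision_score(y_true, scores)   # AP: sum over n of (R_n - R_{n-1}) * P_n
auc_pr = auc(recall, precision)                # AUC-PR: trapezoidal area under the PR curve
print(ap, auc_pr)
```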

References:

Confusion Matrix - Wikipedia

Classification: ROC Curve and AUC

Precision-Recall Curve