Understanding the Results

The ECG Quiz is designed to assess your performance in a meaningful way. However, providing a single score is challenging due to the real-life approach used for capturing your interpretation and the complexities of the scoring algorithm. To address this, the ECG Quiz provides three performance metrics: Precision, Recall, and F1. These metrics offer a quantitative evaluation of your ECG interpretation skills. Recall reflects how many of the correct diagnoses you identified, while Precision reflects how many of the diagnoses you selected were correct. The F1 score combines the two to provide a balanced view of performance. For more details, see the Wikipedia article on precision and recall.

Definitions

Fig. 1: Graphical representation of the definitions. *Adapted from Wikipedia.*

When grading your interpretation, each diagnosis you provide is compared to the correct diagnoses for the ECG. This comparison will classify both the diagnoses you entered and the correct diagnoses into one of four categories:

True Positive (TP): A diagnosis you entered that is in the list of correct diagnoses

False Positive (FP): A diagnosis you entered that is not in the list of correct diagnoses

False Negative (FN): A diagnosis you did not enter that is in the list of correct diagnoses

True Negative (TN): A diagnosis you did not enter that is not in the list of correct diagnoses

Notice
In the ECG Quiz, only omitted Must diagnoses are classified as False Negatives. Omitted Should diagnoses are not counted. See the Scoring Algorithm for more information.
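The classification above can be sketched with plain set operations. This is only an illustration, not the ECG Quiz's actual scoring code: the function name `classify` and the diagnosis labels are hypothetical, and it assumes that an entered Should diagnosis counts as a True Positive.

```python
def classify(entered, must, should=frozenset()):
    """Split diagnoses into True Positive, False Positive and False Negative sets.

    Assumptions (not confirmed by the Scoring Algorithm page):
    - an entered Must or Should diagnosis is a True Positive;
    - only omitted Must diagnoses are False Negatives.
    """
    entered, must, should = set(entered), set(must), set(should)
    correct = must | should
    tp = entered & correct   # entered and correct
    fp = entered - correct   # entered but not correct
    fn = must - entered      # Must diagnoses that were omitted
    return tp, fp, fn


# Hypothetical ECG with two Must diagnoses and one Should diagnosis.
tp, fp, fn = classify(
    entered={"Sinus rhythm", "Left bundle branch block"},
    must={"Sinus rhythm", "First-degree AV block"},
    should={"Left axis deviation"},
)
print(sorted(tp), sorted(fp), sorted(fn))
```

True Negatives are not computed here because none of the three metrics uses them; the omitted Should diagnosis ("Left axis deviation") is simply not counted.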

Precision

Precision provides a numeric measure of how many of the diagnoses you selected are correct. Precision is calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

As the formula shows, Precision measures how often a diagnosis you select is right. Another term for Precision is Positive Predictive Value.

Recall

Recall provides a numeric measure of how many of the correct diagnoses you selected. Recall is calculated as follows:

$$\text{Recall} = \frac{TP}{TP + FN}$$

As the formula shows, Recall measures how well you find the correct diagnoses. Another term for Recall is Sensitivity.

F1

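F1 is the harmonic mean of Precision and Recall, so it is high only when both are high. F1 is calculated as follows:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$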
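As a minimal sketch, the three metrics can be computed from the TP, FP and FN counts using the standard definitions. The function name `precision_recall_f1` is hypothetical, and returning 0.0 when a denominator is zero is an assumption rather than documented ECG Quiz behavior.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts.

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F1        = harmonic mean of Precision and Recall
    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=2)
print(f"Precision={p:.2f} Recall={r:.2f} F1={f1:.2f}")
# Precision=0.75 Recall=0.60 F1=0.67
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values than a simple average would.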

Illustrative Examples

We use these statistics to answer two key questions: (1) Did you select the correct diagnoses? and (2) Are the diagnoses you selected correct? Let’s look at some examples to illustrate what these statistics represent.

Consider a simplified scenario where an ECG has 4 correct diagnoses, and the selection list contains 10 possible diagnoses. For context, the ECG Quiz’s diagnosis selection list includes over 200 possible diagnoses.

Example #1

In this example, the user selects all 10 possible diagnoses, as shown in Figure 3. The statistics are calculated as follows:

True Positives (TP): The user correctly identifies 4 correct diagnoses (4 TP).
False Positives (FP): The user selects 6 incorrect diagnoses (6 FP).
False Negatives (FN): The user does not miss any correct diagnoses (0 FN).

This example illustrates a strategy focused on maximizing sensitivity, albeit at the expense of precision. By selecting all possible diagnoses, the user achieves perfect recall (1.00) because all 4 True Positive diagnoses are included, ensuring no False Negatives. However, this approach results in lower precision (0.40), as the user also selects 6 False Positives.
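Worked out with the standard definitions and the counts above (TP = 4, FP = 6, FN = 0):

$$\text{Precision} = \frac{4}{4 + 6} = 0.40, \qquad \text{Recall} = \frac{4}{4 + 0} = 1.00, \qquad F_1 = \frac{2 \times 0.40 \times 1.00}{0.40 + 1.00} \approx 0.57$$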

Example #2

In this example, the user aims to select only correct diagnoses and confidently chooses one accurate diagnosis, as shown in Figure 4. The statistics are calculated as follows:

True Positives (TP): The user correctly identifies 1 diagnosis as accurate (1 TP).
False Positives (FP): No incorrect diagnoses are selected (0 FP).
False Negatives (FN): The user misses 3 other correct diagnoses (3 FN).

This example illustrates a cautious approach that prioritizes precision over sensitivity. The user achieves perfect precision (1.00) because, by selecting only the one diagnosis they were certain about, they avoid any False Positives. However, recall is low (0.25) because the user misses 3 of the 4 correct diagnoses.
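Worked out with the standard definitions and the counts above (TP = 1, FP = 0, FN = 3):

$$\text{Precision} = \frac{1}{1 + 0} = 1.00, \qquad \text{Recall} = \frac{1}{1 + 3} = 0.25, \qquad F_1 = \frac{2 \times 1.00 \times 0.25}{1.00 + 0.25} = 0.40$$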

Example #3

In this example, the user makes their best effort and selects 2 correct and 2 incorrect diagnoses, as shown in Figure 5. The resulting statistics are calculated as follows:

True Positives (TP): The user correctly identifies 2 correct diagnoses (2 TP).
False Positives (FP): The user selects 2 incorrect diagnoses (2 FP).
False Negatives (FN): The user misses 2 correct diagnoses (2 FN).

This example illustrates a balanced but imperfect attempt: the user identifies half of the correct diagnoses, and half of their selections are incorrect, resulting in moderate sensitivity and precision (0.50 each).
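Worked out with the standard definitions and the counts above (TP = 2, FP = 2, FN = 2):

$$\text{Precision} = \frac{2}{2 + 2} = 0.50, \qquad \text{Recall} = \frac{2}{2 + 2} = 0.50, \qquad F_1 = \frac{2 \times 0.50 \times 0.50}{0.50 + 0.50} = 0.50$$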

Why is This Important?

Imagine working as an ECG Technician at a remote cardiac monitoring facility that receives tens of thousands of ECG event transmissions daily. Your task is to review these transmissions and identify cases of atrial fibrillation. If you mark every incoming transmission as containing atrial fibrillation, you will avoid False Negatives, ensuring no cases are missed. This approach would result in a perfect Recall score of 1.00 but significantly lower Precision and F1 scores due to the large number of False Positives.

On the other hand, if you choose to be highly selective and only flag ECGs showing 100% atrial fibrillation, excluding cases with intermittent occurrences (e.g., following normal sinus rhythm), every transmission you flag would indeed contain atrial fibrillation, resulting in a perfect Precision score of 1.00. However, this selectivity would lead to many False Negatives, as cases with partial atrial fibrillation would be missed, lowering Recall and F1 scores.
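To put rough numbers on the two strategies, the sketch below uses an invented workload. The figures (20,000 transmissions, 1,000 containing atrial fibrillation, 400 flagged by the selective reviewer) are purely hypothetical and chosen only to make the arithmetic easy to follow. The helper name `precision_recall_f1` is also hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); a zero denominator yields 0.0 (assumption)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical day: 20,000 transmissions, 1,000 of which contain atrial fibrillation.
total, with_af = 20_000, 1_000

# Strategy A: flag every transmission as atrial fibrillation.
flag_all = precision_recall_f1(tp=with_af, fp=total - with_af, fn=0)

# Strategy B: flag only 400 clear-cut cases, all of which truly contain AF.
selective = precision_recall_f1(tp=400, fp=0, fn=with_af - 400)

for name, (p, r, f1) in [("flag all", flag_all), ("selective", selective)]:
    print(f"{name:>9}: Precision={p:.2f} Recall={r:.2f} F1={f1:.2f}")
#  flag all: Precision=0.05 Recall=1.00 F1=0.10
# selective: Precision=1.00 Recall=0.40 F1=0.57
```

In both cases the F1 score is pulled toward the weaker of the two metrics, which is why neither extreme strategy scores well overall.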
