The ECG Quiz User Manual


Understanding the Results

The ECG Quiz is designed to assess your performance in a meaningful, clinically realistic way. Because your interpretation may contain multiple diagnoses and the scoring algorithm is hierarchical, a single “percentage correct” score is not sufficient to describe how well you did.

Instead, The ECG Quiz summarizes your performance using three standard metrics:

| Metric | Meaning |
| --- | --- |
| Precision | How many of the diagnoses you selected were correct |
| Recall | How many of the correct diagnoses you successfully identified |
| F1 | A combined measure that balances Precision and Recall |

These metrics are widely used in diagnostic testing and machine learning. For additional background, see the article on Precision and recall (Wikipedia).


Metric Overview

| Metric | What It Measures | Core Formula |
| --- | --- | --- |
| Precision | Of all the diagnoses you selected, what fraction were correct? | TP / (TP + FP) |
| Recall | Of all the correct diagnoses, what fraction did you select? | TP / (TP + FN) |
| F1 Score | A single score that balances Precision and Recall (harmonic mean). | 2 * [(Precision * Recall) / (Precision + Recall)] or (2 * TP) / (2 * TP + FP + FN) |

TP, FP, and FN are defined in the Definitions section below. True Negatives (TN), also defined there, do not enter any of the three formulas.


Definitions

Fig. 1: Graphical representation of Precision and Recall.
Adapted from Wikipedia.

When grading your interpretation, each diagnosis you provide is compared to the correct diagnoses for the ECG. Both your diagnoses and the correct diagnoses are classified into the following categories:

| Term | Definition |
| --- | --- |
| True Positive (TP) | A diagnosis you entered that *is* in the list of correct diagnoses. |
| False Positive (FP) | A diagnosis you entered that is not in the list of correct diagnoses. |
| False Negative (FN) | A diagnosis you did not enter that *is* in the list of correct diagnoses. |
| True Negative (TN) | A diagnosis you did not enter that is also not in the list of correct diagnoses. |

Notice
In The ECG Quiz, only omitted Must diagnoses are classified as False Negatives. Omitted Should diagnoses are not counted. See the Scoring Algorithm for more information on Must/Should handling and hierarchical scoring.
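The classification above, including the Must/Should rule, can be sketched with plain set operations. This is a minimal illustration, not The ECG Quiz's actual grading code: the function name `classify`, the diagnosis labels, and the choice to count a selected Should diagnosis as a True Positive are all assumptions, and hierarchical scoring is not modeled.

```python
def classify(entered, must, should):
    """Split diagnoses into TP, FP, and FN sets (hierarchy ignored)."""
    entered, must, should = set(entered), set(must), set(should)
    correct = must | should      # assumed: both lists count as "correct"
    tp = entered & correct       # entered and in the correct list
    fp = entered - correct       # entered but not in the correct list
    fn = must - entered          # only omitted Must diagnoses count
    return tp, fp, fn


# Hypothetical ECG; the labels are illustrative only.
must = {"atrial fibrillation", "left bundle branch block"}
should = {"left axis deviation"}
entered = {"atrial fibrillation", "sinus tachycardia"}

tp, fp, fn = classify(entered, must, should)
print("TP:", sorted(tp))
print("FP:", sorted(fp))
print("FN:", sorted(fn))
```

In this sketch the omitted Should diagnosis ("left axis deviation") appears in none of the three sets, so it does not lower Recall.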

Precision

Precision provides a numeric measure of how many of the diagnoses you chose were correct. It answers the question:

“Of everything I selected, how often was I right?”

Precision is calculated as:

Precision = TP / (TP + FP)

Another name for Precision in diagnostic testing is Positive Predictive Value.


Recall

Recall provides a numeric measure of how many of the correct diagnoses you successfully selected. It answers the question:

“Of everything that was correct, how much did I find?”

Recall is calculated as:

Recall = TP / (TP + FN)

Another name for Recall is Sensitivity.


F1

The F1 score is a single number that combines both Precision and Recall. It is the harmonic mean of Precision and Recall, so it is high only when both are reasonably high.

F1 = 2 * [(Precision * Recall) / (Precision + Recall)]

Using TP, FP, and FN directly:

F1 = (2 * TP) / (2 * TP + FP + FN)
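The two forms of F1 are equivalent whenever TP > 0. A short derivation, written here in LaTeX with P and R as shorthand for Precision and Recall, substitutes the definitions of Precision and Recall into the harmonic-mean form:

```latex
\begin{align*}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\[4pt]
P \cdot R &= \frac{TP^2}{(TP + FP)(TP + FN)} \\[4pt]
P + R &= \frac{TP\,(TP + FN) + TP\,(TP + FP)}{(TP + FP)(TP + FN)}
       = \frac{TP\,(2\,TP + FP + FN)}{(TP + FP)(TP + FN)} \\[4pt]
F_1 &= 2 \cdot \frac{P \cdot R}{P + R}
     = 2 \cdot \frac{TP^2}{TP\,(2\,TP + FP + FN)}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```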


Illustrative Examples

Fig. 2: Graphical representation of the “truth” set of diagnoses.

We use these statistics to answer two related questions:

1. **Did you select the correct diagnoses?** (Recall)  
2. **Are the diagnoses you selected correct?** (Precision)

To make this concrete, consider a simplified scenario:

  • The ECG has 4 correct diagnoses.
  • The selection list contains 10 possible diagnoses.
  • (In reality, The ECG Quiz list includes over 200 diagnoses.)

The figures and examples below demonstrate how Precision, Recall, and F1 change with different strategies.


Example #1 — “Select Everything” Strategy

Fig. 3: Results for Example #1.

In this example, the user selects all 10 diagnoses.

  • True Positives (TP): 4 (all 4 correct diagnoses included)
  • False Positives (FP): 6 (6 incorrect diagnoses selected)
  • False Negatives (FN): 0 (no correct diagnoses missed)

Precision = 4 / (4 + 6) = 0.40

Recall = 4 / (4 + 0) = 1.00

F1 = 2 * [(1.00 * 0.40) / (1.00 + 0.40)] ≈ 0.57

Interpretation: This strategy maximizes Recall (1.00) by ensuring no correct diagnosis is missed, but it does so at the cost of many False Positives, giving a relatively low Precision (0.40). The F1 score (0.57) reflects this imbalance.


Example #2 — “Play It Safe” Strategy

Fig. 4: Results for Example #2.

Here, the user is very conservative and selects only one diagnosis that they are confident is correct.

  • True Positives (TP): 1
  • False Positives (FP): 0
  • False Negatives (FN): 3 (the other 3 correct diagnoses are missed)

Precision = 1 / (1 + 0) = 1.00

Recall = 1 / (1 + 3) = 0.25

F1 = 2 * [(0.25 * 1.00) / (0.25 + 1.00)] = 0.40

Interpretation: This approach maximizes Precision (1.00) at the cost of missing many true diagnoses (Recall = 0.25). The F1 score (0.40) reveals that overall performance is still limited, despite perfect Precision.


Example #3 — “Balanced Effort” Strategy

Fig. 5: Results for Example #3.

In this scenario, the user selects four diagnoses, of which two are correct and two are incorrect.

  • True Positives (TP): 2
  • False Positives (FP): 2
  • False Negatives (FN): 2 (two correct diagnoses missed)

Precision = 2 / (2 + 2) = 0.50

Recall = 2 / (2 + 2) = 0.50

F1 = 2 * [(0.50 * 0.50) / (0.50 + 0.50)] = 0.50

Interpretation: This example shows a moderately balanced attempt: some correct diagnoses are captured, but some are missed and some incorrect ones are chosen. Both Precision and Recall are 0.50, and the F1 score reflects this middle-ground performance.
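The three examples can be reproduced with a short script. This is a sketch of the simplified scenario only: the labels `D1` to `D10` are placeholders, every correct diagnosis is treated as a Must diagnosis, and returning 0.0 when a denominator is zero is an assumption rather than documented behavior of The ECG Quiz.

```python
def scores(selected, truth):
    """Return (precision, recall, f1) for one interpretation."""
    tp = len(selected & truth)   # correct diagnoses that were selected
    fp = len(selected - truth)   # selected diagnoses that are not correct
    fn = len(truth - selected)   # correct diagnoses that were missed
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


options = {f"D{i}" for i in range(1, 11)}   # 10 possible diagnoses
truth = {"D1", "D2", "D3", "D4"}            # 4 correct diagnoses

strategies = {
    "Select Everything": options,
    "Play It Safe": {"D1"},
    "Balanced Effort": {"D1", "D2", "D5", "D6"},
}

results = {name: scores(sel, truth) for name, sel in strategies.items()}
for name, (p, r, f1) in results.items():
    print(f"{name:18s} P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

The script uses the count-based F1 formula, so agreement with the values above also confirms that both F1 forms give the same result for these cases.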


Why Is This Important?

Imagine working as an ECG technician at a remote cardiac monitoring facility that receives tens of thousands of ECG transmissions daily. Your job is to identify cases of atrial fibrillation:

  • If you mark every ECG as atrial fibrillation, you will never miss a case (Recall = 1.00). However, because most transmissions do not show atrial fibrillation, nearly all of your positive calls will be wrong (very low Precision and F1).
  • If you only flag ECGs that are “perfect” atrial fibrillation (e.g., continuous AF throughout), your Precision might be 1.00, but you will miss many true AF cases that are intermittent or mixed with normal sinus rhythm (low Recall and F1).
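To put rough numbers on these two extremes, the sketch below applies the same formulas to an invented workload. Every count in it is hypothetical (20,000 ECGs in a day, 400 with atrial fibrillation, 40 of those with continuous AF) and was chosen only to make the arithmetic easy to follow.

```python
def summarize(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


# Hypothetical figures, not data from The ECG Quiz or any real facility.
TOTAL_ECGS = 20_000     # transmissions received in one day
TRUE_AF = 400           # ECGs that actually show atrial fibrillation
CONTINUOUS_AF = 40      # AF cases with AF throughout the tracing

# Strategy 1: call every ECG atrial fibrillation.
flag_everything = summarize(tp=TRUE_AF, fp=TOTAL_ECGS - TRUE_AF, fn=0)

# Strategy 2: flag only continuous AF, assuming none of those calls is wrong.
flag_textbook_only = summarize(
    tp=CONTINUOUS_AF, fp=0, fn=TRUE_AF - CONTINUOUS_AF
)

for name, (p, r, f1) in [
    ("Flag everything", flag_everything),
    ("Flag textbook AF only", flag_textbook_only),
]:
    print(f"{name:22s} P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
```

Under these assumed counts, the first strategy reaches perfect Recall with a Precision of 0.02, and the second reaches perfect Precision with a Recall of 0.10. In both cases F1 stays low.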

The goal in clinical practice — and in The ECG Quiz — is to balance these two extremes:

  • Avoid missing important diagnoses (high Recall)
  • Avoid over-calling diagnoses that are not present (high Precision)

The F1 score gives you a single number that summarizes this balance. By reviewing your Precision, Recall, and F1 for each ECG, you can see:

  • Whether you tend to over-call findings (low Precision)
  • Whether you tend to under-call findings (low Recall)
  • How your performance changes as your skill level improves over time.
resultspage.txt · Last modified: 2025/11/29 22:42 by dtong
