Biostatistics Formulas
Part 1: The formulas
Part 2: Explanation of the formulas
Formulas & Tables
Test result | Disease + | Disease – | Total | Predictive value |
Test + | True positive (TP) a | False positive (FP) b | a+b | PPV = a/(a+b) |
Test – | False negative (FN) c | True negative (TN) d | c+d | NPV = d/(c+d) |
Total | a+c | b+d | total | |
SN / SP | SN = a/(a+c) | SP = d/(b+d) | | |
For Diagnostic tests | Formula |
Sensitivity, SN (true-positive rate) | = TP / (TP + FN), which is a/(a+c) |
Specificity, SP (true-negative rate) | = TN / (TN + FP), which is d/(b+d) |
Positive predictive value (PPV) | = TP / (TP + FP), which is a/(a+b) |
Negative predictive value (NPV) | = TN / (TN + FN), which is d/(c+d) |
Incidence rate | = # of new cases / population at risk, over a specific time period |
Prevalence | = # of existing cases / total population, at a given point in time |
Accuracy | = (TP+TN)/total. The probability of correctly identifying someone as TN or TP. |
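The formulas above can be checked with a few lines of Python. This is a minimal sketch, taking a = TP, b = FP, c = FN and d = TN as implied by the formulas (SN = TP/(TP+FN) = a/(a+c), PPV = TP/(TP+FP) = a/(a+b)). The function name and the counts are hypothetical, chosen only for illustration.

```python
def diagnostic_metrics(a, b, c, d):
    """Diagnostic test measures from 2x2 counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    total = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / total,
        "prevalence": (a + c) / total,
    }


# Hypothetical counts: 1000 people tested, 100 of whom have the disease.
m = diagnostic_metrics(a=90, b=50, c=10, d=850)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the test is 90% sensitive and about 94% specific, yet PPV is only about 64% because the disease is present in just 10% of those tested.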
2×2 Contingency table
Group | Event + | Event – | Total |
Exposed or treatment group | a | b | a+b |
Unexposed or control group | c | d | c+d |
Total | a+c | b+d | total |
For Quantifying Risk & Therapeutic Efficacy | Formula |
Odds ratio (OR) | ad / bc, i.e. (a/b) / (c/d) |
Relative risk (RR) | [a/(a+b)] / [c/(c+d)], i.e. treatment or exposed event rate / control or unexposed event rate |
Relative risk reduction (RRR) | 1 – RR. Also = ARR / control rate |
Attributable risk (AR) | [a/(a+b)] – [c/(c+d)], i.e. exposed event rate – unexposed event rate |
Absolute risk reduction (ARR) | control rate – treatment rate |
Absolute risk increase (ARI) | treatment rate – control rate |
Attributable risk percent (ARP) | [(RR – 1)/RR] × 100. Also = (ARI / treatment rate) × 100 |
Number needed to treat (NNT) | 1/ ARR |
Number needed to harm (NNH) | 1/ ARI |
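As a rough worked example of the risk formulas above, the sketch below computes them from the four cells of the contingency table. The function name and the trial numbers are hypothetical. Reporting NNT only when risk falls and NNH only when risk rises is a convention assumed here, not something the table specifies.

```python
def risk_measures(a, b, c, d):
    """Risk measures from a 2x2 table.

    a, b = treatment/exposed group with and without the event
    c, d = control/unexposed group with and without the event
    """
    treatment_rate = a / (a + b)
    control_rate = c / (c + d)
    rr = treatment_rate / control_rate
    arr = control_rate - treatment_rate  # positive when treatment lowers risk
    ari = treatment_rate - control_rate  # positive when exposure raises risk
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": rr,
        "rrr": 1 - rr,
        "arr": arr,
        "ari": ari,
        # Assumed convention: NNT only when risk falls, NNH only when it rises.
        "nnt": 1 / arr if arr > 0 else None,
        "nnh": 1 / ari if ari > 0 else None,
    }


# Hypothetical trial: 10/100 events on treatment, 20/100 events on control.
r = risk_measures(a=10, b=90, c=20, d=80)
print(r)
```

Here RR = 0.5, RRR = 0.5, ARR = 0.1 and NNT = 10: about ten people would need to be treated to prevent one event in this made-up trial.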
Explanation of Formulas
Sensitivity
The quality of a diagnostic test can be determined by its sensitivity and specificity.
Sensitivity tells us how well a test identifies people with the disease. A highly sensitive test makes a great screening tool because it will detect most of the affected individuals in a population. The sensitivity of a test can also be referred to as the true positive rate.
Specificity
Specificity tells us how well a test identifies people without the disease. A highly specific test correctly classifies most people who don’t have a particular disease as negative, so a positive result on such a test helps rule the disease in. The specificity of a test can also be referred to as the true negative rate.
Prevalence
Prevalence is the number of people in the population who are disease positive divided by the total population.
Prevalence (P) & Predictive value
PPV = positive predictive value, NPV = negative predictive value
↑P = ↑PPV and ↓NPV
↓P = ↓PPV and ↑NPV
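The effect of prevalence on predictive values follows from the definitions of PPV and NPV once the expected fractions of TP, FP, TN and FN are written in terms of SN, SP and prevalence. A minimal sketch, with a hypothetical function name and an assumed test that is 90% sensitive and 90% specific:

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) for a test with given SN and SP at a given prevalence."""
    tp = sn * prevalence              # diseased and test positive
    fn = (1 - sn) * prevalence        # diseased and test negative
    tn = sp * (1 - prevalence)        # not diseased and test negative
    fp = (1 - sp) * (1 - prevalence)  # not diseased and test positive
    return tp / (tp + fp), tn / (tn + fn)


# Assumed test characteristics: SN = SP = 0.90, at three prevalences.
values = {}
for p in (0.01, 0.10, 0.50):
    values[p] = predictive_values(sn=0.90, sp=0.90, prevalence=p)
    ppv, npv = values[p]
    print(f"prevalence={p:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test, PPV climbs from roughly 8% at 1% prevalence to 90% at 50% prevalence, while NPV drifts downward.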
The quality of the relationship between a particular exposure and an outcome can be determined by looking at the relative risk and the odds ratio. These two measures are used in evaluating case-control studies and cohort studies.
Odds ratio
The odds ratio compares the odds of an outcome (e.g. disease) occurring in the exposed group with the odds of it occurring in the non-exposed group. The odds ratio is typically used to evaluate case-control studies. Note: odds of disease in the exposed group = a/b, meaning exposed with disease (a) divided by exposed without disease (b). Odds of disease in the non-exposed group = c/d. We do this because odds compare the chance of an outcome occurring with the chance of it not occurring.
Relative Risk
The relative risk compares the risk of an outcome (e.g. disease) in the exposed group with the risk in the non-exposed group. The relative risk is typically used to evaluate cohort studies. Note: risk of disease in the exposed group = a/(a+b), meaning those exposed with the disease divided by all those who were exposed. Risk in the non-exposed group is c/(c+d). We do this because risk is the chance that the outcome of interest will occur out of all possible outcomes.
The odds ratio may approximate the relative risk
The relative risk can approach the odds ratio, i.e. RR ≈ OR.
RR = [a/(a+b)] / [c/(c+d)] ≈ (a/b) / (c/d) = ad/bc = OR, if a << b and c << d (so that a+b ≈ b and c+d ≈ d).
This occurs when a disease is rare i.e. when there is low prevalence of disease.
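The rare-disease approximation can be checked numerically. The counts below are hypothetical. In the first table the event is rare in both groups, in the second it is common, and the true relative risk is 2 in both.

```python
def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Rare outcome: 10/1000 exposed vs 5/1000 unexposed have the event.
rare = (10, 990, 5, 995)
# Common outcome: 400/1000 exposed vs 200/1000 unexposed have the event.
common = (400, 600, 200, 800)

for label, cells in (("rare", rare), ("common", common)):
    rr = relative_risk(*cells)
    or_ = odds_ratio(*cells)
    print(f"{label}: RR={rr:.3f}  OR={or_:.3f}")
```

For the rare outcome the OR (about 2.01) is almost identical to the RR (2.0). For the common outcome the OR (about 2.67) overstates the RR.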
What does the relative risk mean?
RR = 1 means that there is no relationship between the exposure and the outcome (e.g. disease).
RR > 1 means that there is a positive relationship between the exposure and the outcome (e.g. disease). This means that the exposure is associated with an increased risk of disease.
RR < 1 means that there is a negative relationship between the exposure and the outcome. In this case, the exposure is associated with a decreased risk of disease.
Test Cut-off values
Changing the cut-off value needed for a test to be positive (e.g. HbA1c > 6.5% is positive) will change many measures (TN, TP, SN, SP, PPV, NPV). Many board questions ask in which direction these measures will change when the cut-off is lowered or raised. The directions below assume that higher test values indicate disease.
A higher cut-off leads to increased SP and TN, because more people without disease test negative (fewer false positives). With fewer false positives, PPV also tends to rise. The cost is more false negatives, so SN, TP, and NPV fall. A lower cut-off, in contrast, leads to increased SN and TP, because more people with disease test positive (fewer false negatives). With fewer false negatives, NPV also tends to rise. The cost is more false positives, so SP, TN, and PPV fall.
For example, if the blood pressure cut-off to diagnose hypertension is reduced to 120/80, then we will catch nearly everyone with hypertension, i.e. increase TP and SN, and a negative result becomes more trustworthy (higher NPV). However, we will have many more false positives, so TN, SP, and PPV will be lower.
Test result | Disease + | Disease – | Total | Predictive value |
Test + | True positive (TP) a | False positive (FP) b | a+b | PPV = a/(a+b) |
Test – | False negative (FN) c | True negative (TN) d | c+d | NPV = d/(c+d) |
Total | a+c | b+d | total | |
SN / SP | SN = a/(a+c) | SP = d/(b+d) | | |
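Using the table, if the cut-off for a positive test is lowered, there are more positive tests: people with disease move from c to a, and people without disease move from d to b. So ↑TP (=a) and ↑SN (=a/(a+c)), and because fewer people with disease test negative (c shrinks), NPV (=d/(c+d)) tends to rise. The extra positive tests among people without disease (b) lower SP (=d/(b+d)) and tend to lower PPV (=a/(a+b)).
Similarly, if the cut-off value needed for a positive test is increased, then there is less a and more d, but also less b and more c. So ↑TN (=d) and ↑SP (=d/(b+d)), and because b shrinks, PPV (=a/(a+b)) tends to rise, while SN (=a/(a+c)) and NPV (=d/(c+d)) tend to fall.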
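To see how the four cells and the derived measures move with the cut-off, the sketch below applies three cut-offs to a small made-up data set. The values, the cut-offs and the function name are hypothetical, and the sketch assumes that a result at or above the cut-off counts as positive.

```python
# Hypothetical test values (e.g. HbA1c in %) for people with and without disease.
diseased = [6.2, 6.6, 6.9, 7.4, 8.1]
healthy = [5.0, 5.3, 5.6, 5.9, 6.1, 6.4, 6.7]


def table_at_cutoff(diseased, healthy, cutoff):
    """Return (a, b, c, d) = (TP, FP, FN, TN); value >= cutoff counts as positive."""
    a = sum(v >= cutoff for v in diseased)  # true positives
    c = len(diseased) - a                   # false negatives
    b = sum(v >= cutoff for v in healthy)   # false positives
    d = len(healthy) - b                    # true negatives
    return a, b, c, d


results = {}
for cutoff in (6.0, 6.5, 7.0):
    a, b, c, d = table_at_cutoff(diseased, healthy, cutoff)
    # The data are chosen so that no denominator is zero at these cut-offs.
    results[cutoff] = {
        "SN": a / (a + c),
        "SP": d / (b + d),
        "PPV": a / (a + b),
        "NPV": d / (c + d),
    }
    row = "  ".join(f"{k}={v:.2f}" for k, v in results[cutoff].items())
    print(f"cut-off {cutoff}: a={a} b={b} c={c} d={d}  {row}")
```

In this toy data set, raising the cut-off from 6.0 to 7.0 drops SN from 1.00 to 0.40 and NPV from 1.00 to 0.70, while SP rises from 0.57 to 1.00 and PPV from 0.63 to 1.00.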
Receiver operating characteristic (ROC) curve
This curve shows how test SN and SP change with changing cut-off values. Here, X is a low cut-off approaching 0%, and A is the higher cut-off which approaches 100%. Specificity (SP) is highest at A, and Sensitivity (SN) is highest at X.
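The trade-off traced by the curve can be tabulated by sweeping the cut-off across every observed value and recording SN and SP at each step. This is only a sketch on made-up data. The values and the function name are hypothetical, and a result at or above the cut-off is assumed to count as positive.

```python
# Hypothetical test values for people with and without disease.
diseased = [6.2, 6.6, 6.9, 7.4, 8.1]
healthy = [5.0, 5.3, 5.6, 5.9, 6.1, 6.4, 6.7]


def sn_sp_by_cutoff(diseased, healthy):
    """Return a list of (cutoff, SN, SP), from the lowest to the highest cut-off."""
    points = []
    for cutoff in sorted(set(diseased) | set(healthy)):
        sn = sum(v >= cutoff for v in diseased) / len(diseased)
        sp = sum(v < cutoff for v in healthy) / len(healthy)
        points.append((cutoff, sn, sp))
    return points


points = sn_sp_by_cutoff(diseased, healthy)
for cutoff, sn, sp in points:
    # An ROC plot conventionally puts 1 - SP on the x-axis and SN on the y-axis.
    print(f"cut-off {cutoff:.1f}: SN={sn:.2f}  SP={sp:.2f}  1-SP={1 - sp:.2f}")
```

At the lowest cut-off every result is positive (SN = 1, SP = 0). As the cut-off rises, SN can only fall and SP can only rise.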