
An Elaboration on Sample Size Planning for Performing a One-Sample Sensitivity and Specificity Analysis by Basing on Calculations on a Specified 95% Confidence Interval Width

by
Mohamad Adam Bujang
Clinical Research Centre, Sarawak General Hospital, Ministry of Health Malaysia, Kuching 93586, Malaysia
Diagnostics 2023, 13(8), 1390; https://doi.org/10.3390/diagnostics13081390
Submission received: 11 January 2023 / Revised: 6 February 2023 / Accepted: 16 February 2023 / Published: 11 April 2023
(This article belongs to the Section Clinical Laboratory Medicine)

Abstract

Sample size calculation based on a specified width of the 95% confidence interval lets researchers choose the level of precision they aim to achieve for the statistics of a particular study. This paper describes the general conceptual context for performing sensitivity and specificity analysis. It then provides sample size tables for sensitivity and specificity analysis based on a specified 95% confidence interval width. These sample size recommendations cover two scenarios: one with a diagnostic purpose and one with a screening purpose. The paper also discusses the other considerations relevant to determining a minimum sample size requirement, and it explains how to draft the sample size statement for a sensitivity and specificity analysis.

1. Introduction

Diagnostic research is one of the most popular types of research in the medical field. It aims to quantify the added contribution of a test, beyond the results already available to the physician or researcher, to determining the presence or absence of a particular disease or to predicting two distinct categories of patients, such as those in poor health and those in good health [1,2,3,4,5,6,7]. This type of study is important for efficiently identifying the right patients and offering them appropriate medical management [8,9,10].
Research design is one of the important factors that determine the success of diagnostic research, and proper sample size planning is a necessary part of any research design. Before calculating the required sample size, a researcher must first understand the overall concept, the underlying assumptions, and all the measurable parameters of a diagnostic test. Figure 1 illustrates a common scenario for diagnostic research. In this example, the researcher aims to determine the accuracy of a screening test that measures the serum level of a particular biochemical marker for detecting colorectal cancer. The outcome of a diagnostic test must be evaluated objectively against a definitive measurement from a gold standard test, which in this case is a biopsy.
True Positive (TP) cases are those that the test classifies as positive and that actually have the disease. True Negative (TN) cases are those that the test classifies as negative and that actually do not have the disease. The sensitivity of a diagnostic or screening test therefore assesses how well it detects the cases that truly have the disease (e.g., patients with colorectal cancer), as confirmed by the gold standard technique (i.e., a biopsy of the organ itself). The specificity assesses how well it identifies the cases that truly do not have the disease (e.g., patients without colorectal cancer), as confirmed by the same gold standard. In other words, a diagnostic or screening test can achieve a perfect score, with both sensitivity and specificity of 100%, only when the numbers of False Positives (FP) and False Negatives (FN) are both zero.
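The same two measures can be written compactly. These are the standard textbook definitions, restated here in terms of the 2 × 2 cells named above.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```

Sensitivity equals 1 exactly when FN = 0, and specificity equals 1 exactly when FP = 0. This is why a perfect score on both requires FP = FN = 0.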
Applying these definitions to the example in Figure 1, the sensitivity and specificity of the test are 87.8% and 83.3%, respectively. The Positive Predictive Value (PPV) is the proportion of people with a positive test result who actually have the disease. The Negative Predictive Value (NPV) is the proportion of those with a negative result who do not have the disease. In this example, the PPV and NPV are 90.9% and 78.1%, respectively. Overall, the test has good sensitivity and specificity. Ideally, researchers aim for perfect accuracy, that is, performance as good as the gold standard. However, this is rarely achieved, because a newly developed screening test is usually valued for being far cheaper, faster, and more convenient and user-friendly than the gold standard, not for matching its accuracy. Most researchers therefore allow some margin in accuracy, including differences attributable to chance or random error [8,9,10].
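All four measures can be computed directly from the cells of a 2 × 2 table. The sketch below is a minimal illustration. The class name `TwoByTwo` and the counts in the usage line are hypothetical; they are not the data behind Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index-test results cross-classified against the gold standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # TP / all diseased

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # TN / all non-diseased

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # TP / all test positives

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)  # TN / all test negatives


example = TwoByTwo(tp=90, fp=20, fn=10, tn=80)  # hypothetical counts
print(f"{example.sensitivity:.3f} {example.specificity:.3f} "
      f"{example.ppv:.3f} {example.npv:.3f}")  # 0.900 0.800 0.818 0.889
```

Sensitivity and specificity are computed within the diseased and non-diseased groups separately. PPV and NPV, in contrast, also depend on the prevalence of the disease in the sample.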
Normally, diagnostic research leads to one of three conclusions. First, the test is both sensitive and specific and is therefore suitable for use as a diagnostic test or marker [1,2,3,4,5]. Second, the test is high in either sensitivity or specificity (but not both) and low in the other, so it is suitable only as a screening tool [11,12,13,14,15,16,17]. Lastly, the test is neither sensitive nor specific. This is perhaps the worst-case scenario in diagnostic research, because it makes the test unsuitable for either diagnosis or screening [18,19,20]. The ideal result is an excellent value for both sensitivity and specificity. Failing that, an excellent value for at least one of the two still leaves the test acceptable, at a bare minimum, as a screening tool.
This paper takes this position further by proposing that the actual purpose of diagnostic research (diagnosis or screening of a disease) must be evaluated carefully. The two purposes are not the same, and each requires a different approach to sample size planning. Many previous studies have provided detailed estimations or calculations of the sample size required for diagnostic tests, as presented in Table 1. Although numerous published papers already address sample size planning for sensitivity and specificity tests, further step-by-step guidance on applying this knowledge is still needed. Without it, researchers may inadvertently omit other pertinent considerations when planning the sample size of a diagnostic study. Furthermore, the sample size determination must also be guided by the specific objective of the study, that is, the intended purpose of the diagnostic test and its expected level of accuracy.
This study therefore extends the work on determining the sample size requirement in these situations. It describes detailed, step-by-step procedures for sample size planning in diagnostic research, based on a specified width of the 95% confidence intervals for both sensitivity and specificity. Using the width as a proxy for the effect size lets the researcher impose, in advance, a limit on the 95% confidence interval that the sensitivity and specificity estimates are meant to achieve. On this basis, a set of sample size tables is compiled so that researchers can set the sample size requirement quickly, without needing to understand the complexity of the computations involved.

2. Methods

The sample size calculations were based on two-sided confidence intervals for a one-sample sensitivity and specificity analysis [35]. The binomial confidence intervals were computed with an 'exact' method, the Clopper–Pearson interval, which is calculated directly from the cumulative probabilities of the binomial distribution [38]. For all calculations, alpha is set at 0.05, the confidence interval width is set at 0.1 or 0.2, and the prevalence is set at 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9.
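The Clopper–Pearson interval, and the search for the smallest group that reaches a target width, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the PASS implementation used in this paper. The function names are hypothetical, the interval limits are found by bisection on the binomial tail probabilities, and the expected number of positives is assumed to be `round(n * p)`. PASS may treat the expected count and the discreteness of the binomial differently, so results from this sketch can differ from the tabulated values by a few subjects.

```python
import math


def _log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def _cdf(x: int, n: int, p: float) -> float:
    """P(X <= x)."""
    return sum(math.exp(_log_pmf(k, n, p)) for k in range(x + 1))


def _sf(x: int, n: int, p: float) -> float:
    """P(X >= x)."""
    return sum(math.exp(_log_pmf(k, n, p)) for k in range(x, n + 1))


def _root_increasing(g, iters: int = 50) -> float:
    """Bisection on (0, 1) for the root of a function g that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) Clopper-Pearson interval for x positives out of n."""
    if n < 1 or not 0 <= x <= n:
        raise ValueError("require n >= 1 and 0 <= x <= n")
    half = alpha / 2
    # Lower limit solves P(X >= x | p) = alpha/2; this tail probability rises with p.
    lower = 0.0 if x == 0 else _root_increasing(lambda p: _sf(x, n, p) - half)
    # Upper limit solves P(X <= x | p) = alpha/2; this tail probability falls with p.
    upper = 1.0 if x == n else _root_increasing(lambda p: half - _cdf(x, n, p))
    return lower, upper


def group_size_for_width(p: float, width: float, alpha: float = 0.05,
                         n_max: int = 20_000) -> int:
    """First n (scanning upward) whose interval at x = round(n * p) is at most `width` wide."""
    for n in range(1, n_max + 1):
        lower, upper = clopper_pearson(round(n * p), n, alpha)
        if upper - lower <= width:
            return n
    raise ValueError("no n up to n_max achieves the requested width")
```

`group_size_for_width(0.95, 0.2)` gives the number of diseased subjects needed to estimate a sensitivity of 0.95 with an interval width of 0.2. Dividing that number by the prevalence and rounding up gives the total number of subjects to recruit. The linear scan is easy to check but slow when the required group has several hundred subjects or more.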
For a study design with a diagnostic purpose, sensitivity and specificity are set to the same value: 0.7, 0.8, 0.9, or 0.95. A diagnostic test or marker should ideally have excellent levels of both sensitivity and specificity, and in this paper the minimum value of both is set at 0.70. For ease of interpretation, a sensitivity and specificity of 0.95 can be regarded as excellent accuracy, 0.90 as nearly excellent accuracy, 0.80 as good accuracy, and 0.70 as fairly good accuracy.
For a study design with a screening purpose that places particular emphasis on sensitivity, the pre-specified sensitivity values are 0.95, 0.90, 0.80, and 0.70, and the specificity is set at 0.5. Conversely, for screening with a particular emphasis on specificity, the pre-specified specificity values are 0.95, 0.90, 0.80, and 0.70, and the sensitivity is set at 0.5. To develop a screening strategy, researchers may have to sacrifice either sensitivity or specificity. For example, when a researcher has planned for a study to have a high level of sensitivity, the minimum setting for sensitivity is 0.7 and the minimum setting for specificity is 0.5. All calculations were performed using Power Analysis and Sample Size Software (PASS 2020; NCSS, LLC, Kaysville, UT, USA; ncss.com/software/pass).
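Writing the design grid out explicitly makes it easy to confirm that every combination has been covered. The sketch below only enumerates the scenarios described above, using the values stated in the Methods. The constant and function names are hypothetical. Each scenario would still have to be passed to a sample size routine (PASS in this paper).

```python
from itertools import product

# Settings stated in the Methods section.
PREVALENCES = (0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
WIDTHS = (0.1, 0.2)                  # full width of the 95% confidence interval
TARGETS = (0.95, 0.90, 0.80, 0.70)   # excellent, nearly excellent, good, fairly good
SCREENING_FIXED = 0.5                # value of the de-emphasized measure in screening


def design_grid():
    """Yield (purpose, sensitivity, specificity, width, prevalence) for every scenario."""
    for target, width, prevalence in product(TARGETS, WIDTHS, PREVALENCES):
        yield ("diagnostic", target, target, width, prevalence)
        yield ("screening, sensitivity emphasis", target, SCREENING_FIXED, width, prevalence)
        yield ("screening, specificity emphasis", SCREENING_FIXED, target, width, prevalence)


scenarios = list(design_grid())
print(len(scenarios))  # 4 targets x 2 widths x 10 prevalences x 3 purposes = 240
```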

3. Results

Three main factors can increase the sample size requirement. Firstly, smaller target values of sensitivity and specificity usually require a larger sample size. For the values above 0.5 considered here, this is because the binomial variance p(1 − p) grows as p moves toward 0.5. Secondly, the prevalence of the disease or outcome of interest drives the requirement. Only diseased subjects contribute to the estimate of sensitivity, and only non-diseased subjects contribute to the estimate of specificity. A lower prevalence therefore requires a larger sample to determine sensitivity, whereas a higher prevalence requires a larger sample to determine specificity. Thirdly, a narrower desired width of the confidence interval (equivalently, a smaller margin of error, which is half the width) also requires a larger sample size. Hence, no single 'ideal' sample size can be applied universally; the appropriate sample size ultimately depends on the conditions and prerequisites set for the target effect size (Table 2, Table 3 and Table 4).
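The direction of all three effects can be seen from the familiar normal-approximation (Wald) formula, as used in prevalence-adjusted calculations such as Buderer's [23]. In that formula, the number of diseased subjects needed for sensitivity (or of non-diseased subjects for specificity) is z²p(1 − p)/d², where d is half the desired width. The total sample size is then obtained by dividing by the prevalence (or by one minus the prevalence). Note that this approximation is swapped in for illustration only. It is not the exact Clopper–Pearson method behind the tables, and it gives smaller numbers. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_group_size(p: float, width: float, conf: float = 0.95) -> float:
    """Normal-approximation size of ONE group (diseased or non-diseased), before rounding."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = width / 2  # margin of error = half the confidence interval width
    return z ** 2 * p * (1 - p) / margin ** 2


def total_for_sensitivity(sens: float, width: float, prevalence: float) -> int:
    # Only diseased subjects inform sensitivity, so scale up by the prevalence.
    return math.ceil(wald_group_size(sens, width) / prevalence)


def total_for_specificity(spec: float, width: float, prevalence: float) -> int:
    # Only non-diseased subjects inform specificity, so scale up by (1 - prevalence).
    return math.ceil(wald_group_size(spec, width) / (1 - prevalence))


# The three factors described above; each comparison prints True.
print(total_for_sensitivity(0.70, 0.1, 0.2) > total_for_sensitivity(0.95, 0.1, 0.2))
print(total_for_sensitivity(0.90, 0.1, 0.05) > total_for_sensitivity(0.90, 0.1, 0.5))
print(total_for_specificity(0.90, 0.1, 0.9) > total_for_specificity(0.90, 0.1, 0.1))
print(total_for_sensitivity(0.90, 0.1, 0.4) > total_for_sensitivity(0.90, 0.2, 0.4))
```

Consider a sensitivity of 0.95, a width of 0.1, and a prevalence of 10.0%, which is the setting of the diagnostic example in Section 4.3. For this setting, `total_for_sensitivity(0.95, 0.1, 0.10)` returns 730, whereas the exact requirement quoted in that example is 940 patients. The Wald approximation is known to understate the required size when the proportion is close to 0 or 1.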
For a diagnostic purpose, the researcher will usually aim for an excellent level of both sensitivity and specificity. These sample size calculations are therefore presented in Table 2, in which sensitivity and specificity take the same value. Depending on the target sensitivity and specificity, the minimum sample size requirement ranges from 58 to 26,580 subjects. The ideal goal is excellent accuracy (i.e., both sensitivity and specificity of at least 0.95), which usually requires only a small sample. However, there is a real risk that an excellent level of both will not be reached. A researcher is therefore encouraged to recruit more subjects, so that they can confidently conclude that the accuracy of the diagnostic test is at least satisfactory (i.e., sensitivity and specificity of at least 0.80).
For a screening purpose, the researcher will usually aim for an excellent level of either sensitivity or specificity, but not both. The sample size calculations for the conditions and prerequisites of a screening study are tabulated in Table 3 and Table 4. Most studies with a screening purpose aim for higher sensitivity and may therefore need to sacrifice specificity. Table 3 lists a range of pre-specified sensitivity values, with a minimum of 0.70 (70.0%), together with a fixed pre-specified specificity of 50.0%. The ideal goal for a researcher is an excellent sensitivity, such as 0.95. According to Table 3, if a researcher sets the desired width of the 95% confidence interval at 0.20, the minimum sample size requirement ranges between 174 and 1041, depending on the prevalence of the disease or outcome of interest.

4. Discussion

Scholars have developed numerous techniques to estimate or calculate the minimum sample size requirement for diagnostic research. No single technique is superior to the others, because the right choice depends on the study's purpose and the researchers' expectations. Sample size issues in diagnostic research were first discussed by Linnet in 1987, and dozens of articles have since been published that discuss the matter further [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. These findings are summarized in Table 1.
Previous studies have identified some spurious findings in research on sample size planning for diagnostic studies, such as a requirement for only a very small minimum sample size [39,40]. To avoid misconstruing the statistical rigor inherent in this type of analysis, this paper takes a different approach. It offers sample size tables for sensitivity and specificity analysis that use the desired width of the 95%CI as the measure of clinical or scientific significance. This emphasizes the desired interval width as the measure of precision in all types of diagnostic research [35].
Imposing a tighter limit on the desired width of the 95%CI (i.e., 0.1 or 0.2) gives the researcher more confidence that the accuracy reported by the study is realistic and scientifically justifiable. It is well known that a statistically significant result (i.e., p < 0.05) can be produced by an extremely large sample size even when the effect is too small to matter [41]. Although some scholars dispute the utility of the p-value, it remains widely applied and accepted [42,43,44,45]. An additional condition, such as a fixed limit on the desired width of the 95%CI, therefore makes the researcher less likely to be misled and better able to ensure that an acceptable level of accuracy can realistically be achieved.

4.1. How Should the Sample Size Tables Be Used?

The sample size tables in this paper are intended to help researchers plan the sample size of any study involving sensitivity and specificity analysis. Firstly, the researcher needs to determine the prevalence of the disease or outcome of interest (such as a 'poor outcome' or 'good outcome'). The prevalence of a disease can vary widely with the study population. The researcher must therefore decide the specific population for which the diagnostic or screening test is intended. For example, the prevalence of colon cancer in a 'high-risk' population is much higher than in a healthy population. If the researcher intends to use the test or marker in a 'high-risk' population for colon cancer, the prevalence estimate should come from that 'high-risk' population (e.g., patients in a hospital setting such as a surgical specialist clinic).
Secondly, the researcher needs to decide whether the test or marker is intended as an alternative diagnostic tool or marker or solely for screening. As subject-matter experts in their specialized field, researchers should be able to decide the intended aim of a test or marker, because they should know its true capabilities and the expectations placed on it. Table 2 should be consulted if a researcher intends to develop a diagnostic test or marker, whereas Table 3 and Table 4 should be consulted for a screening test or marker.
Thirdly, the researcher needs to decide beforehand the target values of both sensitivity and specificity. For a diagnostic test or marker, there are four possible sets of sensitivity and specificity values. For simplicity, this paper recommends regarding sensitivity and specificity values of 0.95, 0.90, 0.80, and 0.70 as indicating an excellent, nearly excellent, good, and fairly good diagnostic test or marker, respectively. Finally, the researcher must decide beforehand the desired width of the 95%CI (i.e., either 0.1 or 0.2). This choice is likely to be driven by the intended purpose of the study, the available resources, and the capability and experience of the researcher under the given experimental conditions.
For example, suppose the prevalence of a disease is 40.0%, the desired width of the 95%CI is 0.1, and the target sensitivity and specificity are both 0.90. Under these conditions, the minimum required sample size is 395 for determining sensitivity and 264 for determining specificity. The sample size of 395 should be chosen, because it is the larger of the two and therefore satisfies both requirements. In another scenario, suppose the prevalence is 70.0%, the desired width of the 95%CI is 0.1, and the target sensitivity is 0.95 with specificity fixed at 0.50 (the screening scenario of Table 3). Under these conditions, the minimum required sample size is 135 for assessing sensitivity and 1304 for assessing specificity. Again, the sample size of 1304 should be chosen, for the same reason.
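The rule of taking the larger of the two requirements can be written as a small helper. In this sketch the function name is hypothetical. The group sizes of 158 diseased and 158 non-diseased subjects in the usage line are back-calculated from the two totals quoted above (395 × 0.40 = 158 and 264 × 0.60 ≈ 158). They are illustrative inputs, not values reported in the tables.

```python
import math


def _round_up(x: float) -> int:
    """Round up to a whole subject after removing floating-point noise."""
    return math.ceil(round(x, 9))


def plan_total(cases_needed: int, noncases_needed: int, prevalence: float) -> dict:
    """Total subjects so that both the diseased and non-diseased groups are large enough."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    n_sens = _round_up(cases_needed / prevalence)           # total driven by sensitivity
    n_spec = _round_up(noncases_needed / (1 - prevalence))  # total driven by specificity
    return {"sensitivity": n_sens, "specificity": n_spec, "required": max(n_sens, n_spec)}


print(plan_total(158, 158, 0.40))  # {'sensitivity': 395, 'specificity': 264, 'required': 395}
```

Choosing the larger total works because recruiting 395 subjects at a prevalence of 40% is expected to yield about 158 diseased and 237 non-diseased subjects. Both groups therefore reach the size they need.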

4.2. Issues That Can Arise from Very Large Sample Sizes Involving Very Low Level of Prevalence

Some of the calculations in the tables yield extremely large sample size requirements. For example, Table 2 shows that a minimum of 26,580 subjects is needed to claim a sensitivity and specificity of 70.0% each, with a desired 95%CI width of 0.05, in a study population with a disease prevalence of 5.0%. Two pertinent issues deserve consideration here. Firstly, it is necessary to consider carefully whether the purported sensitivity and specificity of 70.0% will satisfy both the researchers and the stakeholders (the end-users of the test or marker). Secondly, it is necessary to determine whether the researchers can realistically cope with the work involved in recruiting such a large number of subjects.
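The arithmetic of this example shows where the subjects go. At a prevalence of 5.0%, only about one recruit in twenty has the disease, so the 26,580 subjects are expected to split as follows:

```latex
\underbrace{26{,}580 \times 0.05}_{\text{diseased}} = 1{,}329,
\qquad
\underbrace{26{,}580 \times 0.95}_{\text{non-diseased}} = 25{,}251
```

The sensitivity estimate therefore rests on about 1,329 diseased subjects. A specificity of 70.0% at the same width needs the same group size, so the 25,251 non-diseased subjects far exceed what the specificity estimate requires. This is what makes low-prevalence designs so costly.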
In other words, recruiting a large number of subjects is worthwhile only if the test can realistically be shown to be highly sensitive and specific, for example with a sensitivity and specificity of 95.0% each. Recruiting an unusually large number of subjects is therefore recommended only when there are sufficient grounds to believe that the diagnostic test or marker has a very high level of accuracy. Such grounds can often be found in the literature or in cumulative scientific evidence on the accuracy of the test or marker.
The most important consideration, then, is that diagnostic research should focus on developing a sensitive and specific marker by garnering sufficient cumulative evidence of its sensitivity and/or specificity. It should not merely study a particular diagnostic test or marker for its sensitivity or specificity without first having accrued sufficient evidence of them.
In short, conducting a study with a very large number of subjects merely to show that a diagnostic test or marker has a sensitivity and specificity of 70.0% is not recommended. These calculations are presented only to show when recruiting so many subjects becomes justifiable: if, and only if, accruing evidence has already given sufficient grounds that a particular diagnostic or screening test or marker has a high level of sensitivity and/or specificity, which provides a valid rationale for the study [46,47,48,49,50]. Otherwise, it is not recommended.

4.3. Determination of Sample Size Requirements for Diagnostic Purposes

One previous study recommended that a sample size statement should ideally be built from five elements. Step 1 is to understand the objective of the study. Step 2 is to select the appropriate statistical analysis. Step 3 is to calculate or estimate the sample size. Step 4 is to add an allowance during subject recruitment for a certain proportion of non-response. Step 5 is to write a standard sample size statement [51]. To illustrate how to write a standard sample size statement, consider a common scenario: the researchers aim to show that a new marker measured in a patient's blood is suitable for use as a diagnostic marker for colon cancer.
The sample size statement is then written as follows: “This study aims to determine whether marker X is highly accurate in detecting patients with colon cancer. The sample size calculation is based on both sensitivity and specificity analyses. In a population at risk of colon cancer (i.e., patients who have presented with the usual symptoms of colon cancer), the prevalence of colon cancer is 10.0%. For a reliable diagnostic marker, the researchers aim for the new marker to have a sensitivity and specificity of at least 95.0% each. The sample size calculation is based on a desired width of 0.1 for the 95% confidence intervals of both sensitivity and specificity. Under these conditions, the minimum sample size required to determine sensitivity is 940 patients and that required to determine specificity is 105 patients. A minimum sample size of 940 patients is therefore necessary, since it is the larger of the two. To allow for a possible non-response rate of 20.0%, the minimum required sample size is inflated to 1175 patients.”
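Step 4 of the statement, inflating the sample size for non-response, divides the required number by the expected response proportion and rounds up. A minimal sketch follows; the function name is hypothetical.

```python
import math


def inflate_for_nonresponse(n_required: int, nonresponse_rate: float) -> int:
    """Number to recruit so that about n_required remain after the given non-response."""
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("nonresponse_rate must be in [0, 1)")
    # round() removes floating-point noise before rounding up to a whole subject.
    return math.ceil(round(n_required / (1 - nonresponse_rate), 9))


print(inflate_for_nonresponse(940, 0.20))  # 1175
print(inflate_for_nonresponse(290, 0.20))  # 363
```

The second call reproduces the 363 patients of the screening example in Section 4.4. Dividing by (1 − rate), rather than multiplying by (1 + rate), ensures that about 940 patients remain after 20% are lost (1175 × 0.8 = 940). Multiplying instead would give only 1128, which leaves about 902 patients.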

4.4. Determination of Sample Size Requirements for Screening Purposes

A similar scenario applies when the researchers aim to show that a new marker measured in a patient's blood is suitable for use as a screening marker for colon cancer (i.e., with a sensitivity of at least 70.0%), with specificity fixed at 50.0%, or vice versa. The sample size statement is then written as follows: “This study aims to determine whether marker Y is highly sensitive in screening patients for colon cancer. The sample size calculation is based on both sensitivity and specificity analyses. In a population at risk of colon cancer (i.e., patients who have presented with the usual symptoms of colon cancer), the prevalence of colon cancer is 10.0%. To obtain a reliable screening marker, the researchers aim for the new marker to have a sensitivity of at least 95.0% and a specificity of at least 50.0%. The sample size calculation is based on a desired width of 0.2 for the 95% confidence intervals of both sensitivity and specificity. Under these conditions, the minimum sample size required to determine sensitivity is 290 patients and that required to determine specificity is 116 patients. A minimum sample size of 290 patients is therefore necessary, since it is the larger of the two. To allow for a possible non-response rate of 20.0%, the minimum required sample size is inflated to 363 patients.”

4.5. Conclusions

Researchers often need a quick and simple 'rule of thumb' or method for estimating the minimum sample size requirement. This paper provides background information on diagnostic studies and sample size tables giving the minimum sample sizes required for sensitivity and specificity analysis. It also gives a clear and concise guideline for using these tables under a wide variety of conditions. Lastly, it offers illustrative examples of how a standard sample size statement should be written.
In particular, this paper recommends that researchers set a tight desired width for the 95% confidence interval (i.e., 0.1 or 0.2) for better sample size planning. Overall, this paper should help researchers conduct proper sample size planning for diagnostic research. It lets them reach simple, quick decisions without resorting to highly complicated statistical computations and without needing formal, in-depth knowledge of the technicalities of the subject.

Funding

This research received no external funding and the APC was funded by Institute for Clinical Research (ICR), Ministry of Health Malaysia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the Director General of the Ministry of Health for his permission to publish this paper. We also would like to thank Yoon Khee Hon for proofreading this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ng, P.; Li, G.; Chui, K.; Chu, W.C.W.; Li, K.; Wong, R.P.O.; Chik, K.W.; Wong, E.; Fok, T.F. Neutrophil CD64 Is a Sensitive Diagnostic Marker for Early-Onset Neonatal Infection. Pediatr. Res. 2004, 56, 796–803.
  2. Yussof, S.J.; Zakaria, M.I.; Mohamed, F.L.; Bujang, M.A.; Lakshmanan, S.; Asaari, A.H. Value of Shock Index in prognosticating the short-term outcome of death for patients presenting with severe sepsis and septic shock in the emergency department. Med. J. Malays. 2012, 67, 406–411.
  3. Kobayashi, M.; Nagashio, R.; Jiang, S.X.; Saito, K.; Tsuchiya, B.; Ryuge, S.; Katono, K.; Nakashima, H.; Fukuda, E.; Goshima, N.; et al. Calnexin is a novel sero-diagnostic marker for lung cancer. Lung Cancer 2015, 90, 342–345.
  4. Toh, T.H.; Lim, B.C.; Bujang, M.A.; Haniff, J.; Wong, S.; Abdullah, M.R. Mandarin parents’ evaluation of developmental status in the detection of delays. Pediatr. Int. 2017, 59, 861–868.
  5. Affizal, S.; Yap, Y.L.; Loh, Y.D.; Chit, H.H.; Mohamad Adam, B. Palliative Prognostic Index as a predictor of mortality among geriatric patients with advanced chronic medical conditions. Med. J. Malays. 2022, 77, 468–473.
  6. Bujang, M.A.; Kuan, P.X.; Tiong, X.T.; Saperi, F.E.; Ismail, M.; Mustafa, F.I.; Abd Hamid, A.H. The all-cause mortality and a screening tool to determine high-risk patients among prevalent type 2 diabetes mellitus patients. J. Diabetes Res. 2018, 2018, 4638327.
  7. Bujang, M.A.; Kuan, P.X.; Sapri, F.E.; Liu, W.J.; Musa, R. Risk Factors for 3-Year-Mortality and a Tool to Screen Patient in Dialysis Population. Indian J. Nephrol. 2019, 29, 235–241.
  8. Yunus, A.; Seet, W.; Mohamad Adam, B.; Haniff, J. Validation of the Malay version of Berlin questionaire to identify Malaysian patients for obstructive sleep apnea. Malays. Fam. Physician 2013, 8, 3–9.
  9. Premsenthil, M.; Salowi, M.A.; Bujang, M.A.; Kueh, A.; Siew, C.M.; Sumugam, K.; Chan, L.G.; Tan, A.K. Risk factors and prediction models for retinopathy of prematurity. Malays. J. Med. Sci. 2015, 22, 57–63.
  10. Md Sani, S.S.; Han, W.H.; Bujang, M.A.; Ding, H.J.; Ng, K.L.; Amir Shariffuddin, M.A. Evaluation of creatine kinase and liver enzymes in identification of severe dengue. BMC Infect. Dis. 2017, 17, 505.
  11. Johnson, K.L.; Speirs, L.; Mitchell, A.; Przybyl, H.; Anderson, D.; Manos, B.; Schaenzer, A.T.; Winchester, K. Validation of a postextubation dysphagia screening tool for patients after prolonged endotracheal intubation. Am. J. Crit. Care 2018, 27, 89–96.
  12. Chadha, V.K.; Anjinappa, S.M.; Rade, K.; Baskaran, D.; Narang, P.; Kolappan, C.; Ahmed, J.; Praseeja, P. Sensitivity and specificity of screening tools and smear microscopy in active tuberculosis case finding. Indian J. Tuberc. 2019, 66, 99–104.
  13. Tan, S.Y.; Hassan, F.; Bujang, M.A.; Ghazali, N.A. Pre-Operative Predictive Factors Influencing Mortality after Tracheostomy (TRACHMORT). Int. Med. J. 2019, 26, 34–38.
  14. Acquah, F.K.; Donu, D.; Obboh, E.K.; Bredu, D.; Mawuli, B.; Amponsah, J.A.; Quartey, J.; Amoah, L.E. Diagnostic performance of an ultrasensitive HRP2-based malaria rapid diagnostic test kit used in surveys of afebrile people living in Southern Ghana. Malar. J. 2021, 20, 125.
  15. Gong, Y.; Zhou, H.; Zhang, Y.; Zhu, X.; Wang, X.; Shen, B.; Xian, J.; Ding, Y. Validation of the 7-item Generalized Anxiety Disorder scale (GAD-7) as a screening tool for anxiety among pregnant Chinese women. J. Affect. Disord. 2021, 282, 98–103.
  16. Fernandez-Montero, A.; Argemi, J.; Rodríguez, J.A.; Ariño, A.H.; Moreno-Galarraga, L. Validation of a rapid antigen test as a screening tool for SARS-CoV-2 infection in asymptomatic populations. Sensitivity, specificity and predictive values. eClinicalMedicine 2021, 37, 100954.
  17. Sia, T.L.L.; Devaraja, V.; Bujang, M.A.; Chang, A.K.W.; Lim, H.H.; Ooi, M.H.; Chua, H.H. A pre-admission triaging tool to predict severe COVID-19 cases: ABCD score. Med. J. Malays. 2022, 77, 237–240.
  18. Tan, S.M.; Loh, S.F.; Bujang, M.A.; Haniff, J.; Abd Rahman, F.N.; Ismail, F.; Omar, K.; Mohd Daud, T.I. Validation of the Malay version of Children’s Depression Inventory. Int. Med. J. 2013, 20, 188–191.
  19. Hong, J.S.; Tian, J. Sensitivity and specificity of the Distress Thermometer in screening for distress in long-term nasopharyngeal cancer survivors. Curr. Oncol. 2013, 20, e570–e576.
  20. Chiu, H.Y.; Chen, P.Y.; Chuang, L.P.; Chen, N.H.; Tu, Y.K.; Hsieh, Y.J.; Wang, Y.C.; Guilleminault, C. Diagnostic accuracy of the Berlin questionnaire, STOPBANG, STOP, and Epworth Sleepiness Scale in detecting obstructive sleep apnea: A bivariate meta-analysis. Sleep Med. Rev. 2017, 36, 57–70.
  21. Linnet, K. Comparison of quantitative diagnostic tests: Type I error, power, and sample size. Stat. Med. 1987, 6, 147–158.
  22. Simel, D.L.; Samsa, G.P.; Matchar, D.B. Likelihood ratios with confidence: Sample size estimation for diagnostic test studies. J. Clin. Epidemiol. 1991, 44, 763–770.
  23. Buderer, N.M. Statistical Methodology: I. Incorporating the prevalence of disease into the sample size calculation for sensitivity and specificity. Acad. Emerg. Med. 1996, 3, 895–900.
  24. Carpenter, T.E.; Gardner, I.A. Simulation modeling to determine herd-level predictive values and sensitivity based on individual-animal test sensitivity and specificity and sample size. Prev. Vet. Med. 1996, 27, 57.
  25. Obuchowski, N.A.; McClish, D.K. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat. Med. 1997, 16, 1529–1542.
  26. Lui, K.J.; Cumberland, W.G. Sample size determination for equivalence test using rate ratio of sensitivity and specificity in paired sample data. Control. Clin. Trials 2001, 22, 373–389.
  27. Dendukuri, N.; Rahme, E.; Bélisle, P.; Joseph, L. Bayesian sample size determination for prevalence and diagnostic test studies in the absence of a gold standard test. Biometrics 2004, 60, 388–397.
  28. Li, J.; Fine, J. On sample size for sensitivity and specificity in prospective diagnostic accuracy studies. Stat. Med. 2004, 23, 2537–2550.
  29. Flahault, A.; Cadilhac, M.; Thomas, G. Sample size calculation should be performed for design accuracy in diagnostic test studies. J. Clin. Epidemiol. 2005, 58, 859–862.
  30. Carley, S.; Dosman, S.; Jones, S.R.; Harrison, M. Simple nomograms to calculate sample size in diagnostic studies. Emerg. Med. J. 2005, 22, 180–181.
  31. Moskowitz, C.S.; Pepe, M.S. Comparing the predictive values of diagnostic tests: Sample size and analysis for paired study designs. Clin. Trials 2006, 3, 272–279.
  32. Steinberg, D.M.; Fine, J.; Chappell, R. Sample size for positive and negative predictive value in diagnostic research using case-control designs. Biostatistics 2009, 10, 94–105.
  33. Fosgate, G.T. Practical sample size calculations for surveillance and diagnostic investigations. J. Vet. Diagn. Invest. 2009, 21, 3–14.
  34. Malhotra, R.K.; Indrayan, A. A simple nomogram for sample size for estimating the sensitivity and specificity of medical tests. Indian J. Ophthalmol. 2010, 58, 519–522.
  35. Hajian-Tilaki, K. Sample size estimation in diagnostic test studies of biomedical informatics. J. Biomed. Inform. 2014, 48, 193–204.
  36. Bujang, M.A.; Adnan, T.H. Requirements for minimum sample size for sensitivity and specificity analysis. J. Clin. Diagn. Res. 2016, 10, YE01–YE06.
  37. Negida, A.; Fahim, N.K.; Negida, Y. Sample size calculation guide—Part 4: How to calculate the sample size for a diagnostic test accuracy study based on sensitivity, specificity, and the area under the ROC curve. Adv. J. Emerg. Med. 2019, 3, e33.
  38. Clopper, C.J.; Pearson, E.S. The use of confidence or fiducial limits illustrated in the case of the Binomial. Biometrika 1934, 26, 404–413.
  39. Deeks, J.J.; Macaskill, P.; Irwig, L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J. Clin. Epidemiol. 2005, 58, 882–893.
  40. Bachmann, L.M.; Puhan, M.A.; Ter Riet, G.; Bossuyt, P.M. Sample sizes of studies on diagnostic accuracy: Literature survey. Brit. Med. J. 2006, 332, 1127–1129.
  41. Bujang, M.A.; Sa’at, N.; Joys, A.R.; Ali, M.M. An audit of the statistics and the comparison with the parameter in the population. AIP Conf. Proc. 2015, 1682, 050019.
  42. Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159.
  43. Cohen, J. The earth is round (p < 0.05). Am. Psychol. 1994, 49, 997–1003.
  44. Chia, K.S. “Significant-itis”—An obsession with the P-value. Scand. J. Work. Environ. Health 1997, 23, 152–154.
  45. Gelman, A. P-values and statistical practice. Epidemiology 2013, 24, 69–72.
  46. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410.
  47. Beetz, C.; Skrahina, V.; Förster, T.M.; Gaber, H.; Paul, J.J.; Curado, F.; Rolfs, A.; Bauer, P.; Schäfer, S.; Weckesser, V.; et al. Rapid Large-Scale COVID-19 Testing during Shortages. Diagnostics 2020, 10, 464.
  48. Jehi, L.; Ji, X.; Milinovich, A.; Erzurum, S.; Rubin, B.P.; Gordon, S.; Young, J.B.; Kattan, M.W. Individualizing risk prediction for positive COVID-19 testing: Results from 11,672 patients. Chest 2020, 158, 1364–1375.
  49. Hledík, M.; Polechová, J.; Beiglböck, M.; Herdina, A.N.; Strassl, R.; Posch, M. Analysis of the specificity of a COVID-19 antigen test in the Slovak mass testing program. PLoS ONE 2021, 16, e0255267.
  50. Wehbe, R.M.; Sheng, J.; Dutta, S.; Chai, S.; Dravid, A.; Barutcu, S.; Wu, Y.; Cantrell, D.R.; Xiao, N.; Allen, B.D.; et al. DeepCOVID-XR: An Artificial Intelligence Algorithm to Detect COVID-19 on Chest Radiographs Trained and Tested on a Large U.S. Clinical Data Set. Radiology 2021, 299, E167–E176.
  51. Bujang, M.A. A step-by-step process on sample size determination for medical research. Malays. J. Med. Sci. 2021, 28, 15–27.
Figure 1. Four possible scenarios in diagnostic research.
Table 1. Summary of published literature on sample size planning for sensitivity and specificity analysis.
No. | Authors | Year | Content
1 | Linnet [21] | 1987 | Comparison of quantitative diagnostic tests
2 | Simel et al. [22] | 1991 | Emphasizes sample size based on likelihood ratios
3 | Buderer [23] | 1996 | Incorporates the prevalence of disease into the sample size calculation
4 | Carpenter and Gardner [24] | 1996 | Determines herd-level predictive values and sensitivity based on test sensitivity, specificity, and sample size
5 | Obuchowski and McClish [25] | 1997 | Sample size calculation involving binormal ROC curve indices
6 | Lui and Cumberland [26] | 2001 | Provides sample size determination for an equivalence test using the rate ratio of sensitivity and specificity in paired-sample data
7 | Dendukuri et al. [27] | 2004 | Sample size calculation using a Bayesian technique
8 | Li and Fine [28] | 2004 | Develops sample size and power calculations based on the unconditional power properties of the test statistics
9 | Flahault et al. [29] | 2005 | Calculation emphasizes the study design, such as requirements for the selection of cases and controls
10 | Carley et al. [30] | 2005 | Provides nomograms for sample size planning
11 | Moskowitz and Pepe [31] | 2006 | Comparative inference about predictive values of diagnostic tests for paired study designs
12 | Steinberg et al. [32] | 2009 | Sample size calculation emphasizes the positive predictive value and negative predictive value
13 | Fosgate [33] | 2009 | Sample size calculation emphasizes surveillance and diagnostic investigations
14 | Malhotra and Indrayan [34] | 2010 | Provides nomograms for sample size planning
15 | Hajian-Tilaki [35] | 2014 | Sample size calculation based on prevalence and margin of error
16 | Bujang and Adnan [36] | 2016 | Sample size calculation based on differences in hypothesis testing
17 | Negida et al. [37] | 2019 | Sample size calculation based on sensitivity, specificity, and the area under the ROC curve
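Table 1 credits Buderer [23] with bringing disease prevalence into the calculation. As a point of comparison with the tables that follow, here is a minimal Python sketch of the commonly used normal-approximation (Wald) form of that idea. The number of subjects with the disease is n = z²·Se(1 − Se)/d², where d is the margin of error (half the total interval width), and the total sample is n divided by the prevalence. For specificity, Sp replaces Se and 1 − prevalence replaces the prevalence. The function name wald_sample_size and its parameters are hypothetical, and the sketch is not presented as the method used to build this article's tables.

```python
import math
from statistics import NormalDist


def wald_sample_size(expected: float, width: float, group_share: float,
                     conf: float = 0.95) -> int:
    """Hypothetical helper: total sample size so that a Wald interval
    for a proportion has the requested total width.

    expected:    anticipated sensitivity (or specificity), e.g. 0.95
    width:       total width of the confidence interval; margin d = width / 2
    group_share: prevalence when sizing for sensitivity,
                 1 - prevalence when sizing for specificity
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% interval
    d = width / 2
    # Subjects needed in the relevant group (diseased or non-diseased).
    n_group = z ** 2 * expected * (1 - expected) / d ** 2
    # Scale up to the total number recruited, rounding up.
    return math.ceil(n_group / group_share)


print(wald_sample_size(0.95, 0.10, 1.0))  # 73 subjects with the disease
print(wald_sample_size(0.95, 0.10, 0.5))  # 146 in total at a prevalence of 0.5
```

For an expected sensitivity of 0.95 and a total width of 0.10, this approximation asks for 73 subjects with the disease, whereas the prevalence 0.5 row of Table 2 implies 94 (188 × 0.5). The tabulated requirements are therefore more conservative than the Wald formula.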
Table 2. Recommended sample size requirements for diagnostic research, for various specifications of sensitivity, specificity, prevalence, and desired width of the 95% confidence interval.
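Sensitivity | Specificity | Prevalence | Width | na | nb | Prevalence | Width | na | nb
0.95 | 0.95 | 0.05 | 0.05 | 6680 | 352 | | | |
0.90 | 0.90 | 0.05 | 0.05 | 11,860 | 625 | | | |
0.80 | 0.80 | 0.05 | 0.05 | 20,440 | 1076 | | | |
0.70 | 0.70 | 0.05 | 0.05 | 26,580 | 1399 | | | |
0.95 | 0.95 | 0.05 | 0.10 | 1880 | 99 | 0.5 | 0.10 | 188 | 188
0.90 | 0.90 | 0.05 | 0.10 | 3160 | 167 | 0.5 | 0.10 | 316 | 316
0.80 | 0.80 | 0.05 | 0.10 | 5280 | 278 | 0.5 | 0.10 | 528 | 528
0.70 | 0.70 | 0.05 | 0.10 | 6820 | 359 | 0.5 | 0.10 | 682 | 682
0.90 | 0.90 | 0.05 | 0.20 | 880 | 47 | 0.5 | 0.20 | 88 | 88
0.80 | 0.80 | 0.05 | 0.20 | 1400 | 74 | 0.5 | 0.20 | 140 | 140
0.70 | 0.70 | 0.05 | 0.20 | 1780 | 94 | 0.5 | 0.20 | 178 | 178
0.95 | 0.95 | 0.1 | 0.10 | 940 | 105 | 0.6 | 0.10 | 157 | 235
0.90 | 0.90 | 0.1 | 0.10 | 1580 | 176 | 0.6 | 0.10 | 264 | 395
0.80 | 0.80 | 0.1 | 0.10 | 2640 | 294 | 0.6 | 0.10 | 440 | 660
0.70 | 0.70 | 0.1 | 0.10 | 3410 | 379 | 0.6 | 0.10 | 569 | 853
0.90 | 0.90 | 0.1 | 0.20 | 440 | 49 | 0.6 | 0.20 | 74 | 110
0.80 | 0.80 | 0.1 | 0.20 | 700 | 78 | 0.6 | 0.20 | 117 | 175
0.70 | 0.70 | 0.1 | 0.20 | 890 | 99 | 0.6 | 0.20 | 149 | 223
0.95 | 0.95 | 0.2 | 0.10 | 470 | 118 | 0.7 | 0.10 | 135 | 314
0.90 | 0.90 | 0.2 | 0.10 | 790 | 198 | 0.7 | 0.10 | 226 | 527
0.80 | 0.80 | 0.2 | 0.10 | 1320 | 330 | 0.7 | 0.10 | 378 | 880
0.70 | 0.70 | 0.2 | 0.10 | 1705 | 427 | 0.7 | 0.10 | 488 | 1137
0.90 | 0.90 | 0.2 | 0.20 | 220 | 55 | 0.7 | 0.20 | 63 | 147
0.80 | 0.80 | 0.2 | 0.20 | 350 | 88 | 0.7 | 0.20 | 100 | 234
0.70 | 0.70 | 0.2 | 0.20 | 445 | 112 | 0.7 | 0.20 | 128 | 297
0.95 | 0.95 | 0.3 | 0.10 | 314 | 135 | 0.8 | 0.10 | 118 | 471
0.90 | 0.90 | 0.3 | 0.10 | 527 | 226 | 0.8 | 0.10 | 198 | 791
0.80 | 0.80 | 0.3 | 0.10 | 880 | 378 | 0.8 | 0.10 | 330 | 1321
0.70 | 0.70 | 0.3 | 0.10 | 1137 | 488 | 0.8 | 0.10 | 427 | 1706
0.90 | 0.90 | 0.3 | 0.20 | 147 | 63 | 0.8 | 0.20 | 55 | 221
0.80 | 0.80 | 0.3 | 0.20 | 234 | 100 | 0.8 | 0.20 | 88 | 351
0.70 | 0.70 | 0.3 | 0.20 | 297 | 128 | 0.8 | 0.20 | 112 | 446
0.95 | 0.95 | 0.4 | 0.10 | 235 | 157 | 0.9 | 0.10 | 105 | 941
0.90 | 0.90 | 0.4 | 0.10 | 395 | 264 | 0.9 | 0.10 | 176 | 1581
0.80 | 0.80 | 0.4 | 0.10 | 660 | 440 | 0.9 | 0.10 | 294 | 2641
0.70 | 0.70 | 0.4 | 0.10 | 853 | 569 | 0.9 | 0.10 | 379 | 3411
0.90 | 0.90 | 0.4 | 0.20 | 110 | 74 | 0.9 | 0.20 | 49 | 441
0.80 | 0.80 | 0.4 | 0.20 | 175 | 117 | 0.9 | 0.20 | 78 | 701
0.70 | 0.70 | 0.4 | 0.20 | 223 | 149 | 0.9 | 0.20 | 99 | 891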
Note: na refers to sample size for sensitivity; nb refers to sample size for specificity.
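The na and nb columns appear to follow a simple pattern. na equals the number of subjects with the disease needed for the sensitivity estimate divided by the prevalence, and nb equals the number without the disease needed for the specificity estimate divided by (1 − prevalence), both rounded up. At a prevalence of 0.5, for example, na = 188 implies 94 subjects with the disease. The Python sketch below reproduces that scaling under this reading of the tables. It takes the required subgroup counts as inputs rather than deriving them, and the function name total_sample_sizes is hypothetical.

```python
from fractions import Fraction
import math


def total_sample_sizes(n_diseased: int, n_non_diseased: int,
                       prevalence: str) -> tuple[int, int]:
    """Hypothetical helper: scale required subgroup counts up to totals (na, nb).

    n_diseased:     subjects with the disease needed to estimate sensitivity
    n_non_diseased: subjects without the disease needed to estimate specificity
    prevalence:     expected prevalence as a decimal string such as "0.2", so
                    the division is done in exact rational arithmetic
    """
    p = Fraction(prevalence)
    if not 0 < p < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    na = math.ceil(n_diseased / p)            # diseased are a share p of the sample
    nb = math.ceil(n_non_diseased / (1 - p))  # non-diseased are a share 1 - p
    return na, nb


# Counts implied by the prevalence 0.5 rows (half of na or nb there):
# 94 for an expected value of 0.95 at width 0.10 (Table 2),
# 402 for an expected specificity of 0.50 at width 0.10 (Table 3).
print(total_sample_sizes(94, 94, "0.05"))   # (1880, 99), as in Table 2
print(total_sample_sizes(94, 402, "0.05"))  # (1880, 424), as in Table 3
```

With exact arithmetic, the sketch matches the tabulated values except in two places. At prevalences of 0.8 and 0.9, the listed nb values are one higher than the exact ceiling (for example, 471 where 94/0.2 = 470), which errs toward recruiting slightly more. The value 1304 shown in Table 3 at a prevalence of 0.7 differs from ceil(402/0.3) = 1340, the value Table 4 lists for the same count at a prevalence of 0.3.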
Table 3. Recommended sample size requirements for screening research that emphasizes sensitivity, based on the width of its 95% confidence interval (specificity set at 0.50).
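Sensitivity | Specificity | Prevalence | Width | na | nb | Prevalence | Width | na | nb
0.95 | 0.50 | 0.05 | 0.05 | 6680 | 1657 | | | |
0.90 | 0.50 | 0.05 | 0.05 | 11,860 | 1657 | | | |
0.80 | 0.50 | 0.05 | 0.05 | 20,440 | 1657 | | | |
0.70 | 0.50 | 0.05 | 0.05 | 26,580 | 1657 | | | |
0.95 | 0.50 | 0.05 | 0.10 | 1880 | 424 | 0.5 | 0.10 | 188 | 804
0.90 | 0.50 | 0.05 | 0.10 | 3160 | 424 | 0.5 | 0.10 | 316 | 804
0.80 | 0.50 | 0.05 | 0.10 | 5280 | 424 | 0.5 | 0.10 | 528 | 804
0.70 | 0.50 | 0.05 | 0.10 | 6820 | 424 | 0.5 | 0.10 | 682 | 804
0.90 | 0.50 | 0.05 | 0.20 | 880 | 110 | 0.5 | 0.20 | 88 | 208
0.80 | 0.50 | 0.05 | 0.20 | 1400 | 110 | 0.5 | 0.20 | 140 | 208
0.70 | 0.50 | 0.05 | 0.20 | 1780 | 110 | 0.5 | 0.20 | 178 | 208
0.95 | 0.50 | 0.1 | 0.10 | 940 | 447 | 0.6 | 0.10 | 157 | 1005
0.90 | 0.50 | 0.1 | 0.10 | 1580 | 447 | 0.6 | 0.10 | 264 | 1005
0.80 | 0.50 | 0.1 | 0.10 | 2640 | 447 | 0.6 | 0.10 | 440 | 1005
0.70 | 0.50 | 0.1 | 0.10 | 3410 | 447 | 0.6 | 0.10 | 569 | 1005
0.90 | 0.50 | 0.1 | 0.20 | 440 | 116 | 0.6 | 0.20 | 74 | 260
0.80 | 0.50 | 0.1 | 0.20 | 700 | 116 | 0.6 | 0.20 | 117 | 260
0.70 | 0.50 | 0.1 | 0.20 | 890 | 116 | 0.6 | 0.20 | 149 | 260
0.95 | 0.50 | 0.2 | 0.10 | 470 | 503 | 0.7 | 0.10 | 135 | 1304
0.90 | 0.50 | 0.2 | 0.10 | 790 | 503 | 0.7 | 0.10 | 226 | 1304
0.80 | 0.50 | 0.2 | 0.10 | 1320 | 503 | 0.7 | 0.10 | 378 | 1304
0.70 | 0.50 | 0.2 | 0.10 | 1705 | 503 | 0.7 | 0.10 | 488 | 1304
0.90 | 0.50 | 0.2 | 0.20 | 220 | 130 | 0.7 | 0.20 | 63 | 347
0.80 | 0.50 | 0.2 | 0.20 | 350 | 130 | 0.7 | 0.20 | 100 | 347
0.70 | 0.50 | 0.2 | 0.20 | 445 | 130 | 0.7 | 0.20 | 128 | 347
0.95 | 0.50 | 0.3 | 0.10 | 314 | 575 | 0.8 | 0.10 | 118 | 2011
0.90 | 0.50 | 0.3 | 0.10 | 527 | 575 | 0.8 | 0.10 | 198 | 2011
0.80 | 0.50 | 0.3 | 0.10 | 880 | 575 | 0.8 | 0.10 | 330 | 2011
0.70 | 0.50 | 0.3 | 0.10 | 1137 | 575 | 0.8 | 0.10 | 427 | 2011
0.90 | 0.50 | 0.3 | 0.20 | 147 | 149 | 0.8 | 0.20 | 55 | 521
0.80 | 0.50 | 0.3 | 0.20 | 234 | 149 | 0.8 | 0.20 | 88 | 521
0.70 | 0.50 | 0.3 | 0.20 | 297 | 149 | 0.8 | 0.20 | 112 | 521
0.95 | 0.50 | 0.4 | 0.10 | 235 | 670 | 0.9 | 0.10 | 105 | 4021
0.90 | 0.50 | 0.4 | 0.10 | 395 | 670 | 0.9 | 0.10 | 176 | 4021
0.80 | 0.50 | 0.4 | 0.10 | 660 | 670 | 0.9 | 0.10 | 294 | 4021
0.70 | 0.50 | 0.4 | 0.10 | 853 | 670 | 0.9 | 0.10 | 379 | 4021
0.90 | 0.50 | 0.4 | 0.20 | 110 | 174 | 0.9 | 0.20 | 49 | 1041
0.80 | 0.50 | 0.4 | 0.20 | 175 | 174 | 0.9 | 0.20 | 78 | 1041
0.70 | 0.50 | 0.4 | 0.20 | 223 | 174 | 0.9 | 0.20 | 99 | 1041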
Note: na refers to sample size for sensitivity; nb refers to sample size for specificity.
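Table 3 fixes specificity at 0.50. Its prevalence 0.5 entries (nb = 804 at a width of 0.10) imply 402 subjects without the disease. To check the interval width that such a count would deliver, the Python sketch below computes exact binomial confidence limits in the sense of Clopper and Pearson [38]. Treating these as the intervals behind the tables is an assumption, since other interval methods give somewhat different widths. The helper names binom_cdf, bisect_root, and clopper_pearson are hypothetical. The sketch uses only the standard library: the binomial CDF is summed in log space to avoid overflow, and each limit is found by bisection.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def bisect_root(f, target: float, increasing: bool, iters: int = 60) -> float:
    """Solve f(p) = target for p in (0, 1), where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence limits for x successes out of n."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    tail = (1 - conf) / 2
    # Lower limit: P(X >= x | p) = tail; this probability increases with p.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: 1 - binom_cdf(x - 1, n, p), tail, increasing=True)
    # Upper limit: P(X <= x | p) = tail; this probability decreases with p.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p), tail, increasing=False)
    return lower, upper


# Usage: 402 subjects without the disease and an expected specificity of 0.50
lower, upper = clopper_pearson(201, 402)
print(f"95% CI {lower:.3f} to {upper:.3f}, width {upper - lower:.3f}")
```

The usage line assumes the observed count equals the expected proportion times n (201 of 402). When that product is not a whole number, some rounding rule has to be chosen, and the resulting width can shift slightly.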
Table 4. Recommended sample size requirements for screening research that emphasizes specificity, based on the width of its 95% confidence interval (sensitivity set at 0.50).
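Sensitivity | Specificity | Prevalence | Width | na | nb | Prevalence | Width | na | nb
0.50 | 0.95 | 0.05 | 0.05 | 31,480 | 352 | | | |
0.50 | 0.90 | 0.05 | 0.05 | 31,480 | 625 | | | |
0.50 | 0.80 | 0.05 | 0.05 | 31,480 | 1076 | | | |
0.50 | 0.70 | 0.05 | 0.05 | 31,480 | 1399 | | | |
0.50 | 0.95 | 0.05 | 0.10 | 8040 | 99 | 0.5 | 0.10 | 804 | 188
0.50 | 0.90 | 0.05 | 0.10 | 8040 | 167 | 0.5 | 0.10 | 804 | 316
0.50 | 0.80 | 0.05 | 0.10 | 8040 | 278 | 0.5 | 0.10 | 804 | 528
0.50 | 0.70 | 0.05 | 0.10 | 8040 | 359 | 0.5 | 0.10 | 804 | 682
0.50 | 0.90 | 0.05 | 0.20 | 2080 | 47 | 0.5 | 0.20 | 208 | 88
0.50 | 0.80 | 0.05 | 0.20 | 2080 | 74 | 0.5 | 0.20 | 208 | 140
0.50 | 0.70 | 0.05 | 0.20 | 2080 | 94 | 0.5 | 0.20 | 208 | 178
0.50 | 0.95 | 0.1 | 0.10 | 4020 | 105 | 0.6 | 0.10 | 670 | 235
0.50 | 0.90 | 0.1 | 0.10 | 4020 | 176 | 0.6 | 0.10 | 670 | 395
0.50 | 0.80 | 0.1 | 0.10 | 4020 | 294 | 0.6 | 0.10 | 670 | 660
0.50 | 0.70 | 0.1 | 0.10 | 4020 | 379 | 0.6 | 0.10 | 670 | 853
0.50 | 0.90 | 0.1 | 0.20 | 1040 | 49 | 0.6 | 0.20 | 174 | 110
0.50 | 0.80 | 0.1 | 0.20 | 1040 | 78 | 0.6 | 0.20 | 174 | 175
0.50 | 0.70 | 0.1 | 0.20 | 1040 | 99 | 0.6 | 0.20 | 174 | 223
0.50 | 0.95 | 0.2 | 0.10 | 2010 | 118 | 0.7 | 0.10 | 575 | 314
0.50 | 0.90 | 0.2 | 0.10 | 2010 | 198 | 0.7 | 0.10 | 575 | 527
0.50 | 0.80 | 0.2 | 0.10 | 2010 | 330 | 0.7 | 0.10 | 575 | 880
0.50 | 0.70 | 0.2 | 0.10 | 2010 | 427 | 0.7 | 0.10 | 575 | 1137
0.50 | 0.90 | 0.2 | 0.20 | 520 | 55 | 0.7 | 0.20 | 149 | 147
0.50 | 0.80 | 0.2 | 0.20 | 520 | 88 | 0.7 | 0.20 | 149 | 234
0.50 | 0.70 | 0.2 | 0.20 | 520 | 112 | 0.7 | 0.20 | 149 | 297
0.50 | 0.95 | 0.3 | 0.10 | 1340 | 135 | 0.8 | 0.10 | 503 | 471
0.50 | 0.90 | 0.3 | 0.10 | 1340 | 226 | 0.8 | 0.10 | 503 | 791
0.50 | 0.80 | 0.3 | 0.10 | 1340 | 378 | 0.8 | 0.10 | 503 | 1321
0.50 | 0.70 | 0.3 | 0.10 | 1340 | 488 | 0.8 | 0.10 | 503 | 1706
0.50 | 0.90 | 0.3 | 0.20 | 347 | 63 | 0.8 | 0.20 | 130 | 221
0.50 | 0.80 | 0.3 | 0.20 | 347 | 100 | 0.8 | 0.20 | 130 | 351
0.50 | 0.70 | 0.3 | 0.20 | 347 | 128 | 0.8 | 0.20 | 130 | 446
0.50 | 0.95 | 0.4 | 0.10 | 1005 | 157 | 0.9 | 0.10 | 447 | 941
0.50 | 0.90 | 0.4 | 0.10 | 1005 | 264 | 0.9 | 0.10 | 447 | 1581
0.50 | 0.80 | 0.4 | 0.10 | 1005 | 440 | 0.9 | 0.10 | 447 | 2641
0.50 | 0.70 | 0.4 | 0.10 | 1005 | 569 | 0.9 | 0.10 | 447 | 3411
0.50 | 0.90 | 0.4 | 0.20 | 260 | 74 | 0.9 | 0.20 | 116 | 441
0.50 | 0.80 | 0.4 | 0.20 | 260 | 117 | 0.9 | 0.20 | 116 | 701
0.50 | 0.70 | 0.4 | 0.20 | 260 | 149 | 0.9 | 0.20 | 116 | 891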
Note: na refers to sample size for sensitivity; nb refers to sample size for specificity.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Bujang, M.A. An Elaboration on Sample Size Planning for Performing a One-Sample Sensitivity and Specificity Analysis by Basing on Calculations on a Specified 95% Confidence Interval Width. Diagnostics 2023, 13, 1390. https://doi.org/10.3390/diagnostics13081390