Planning and Evaluating Agreement and Reliability Studies

A special issue of Diagnostics (ISSN 2075-4418).

Deadline for manuscript submissions: closed (7 November 2020) | Viewed by 21819

Special Issue Editor


Prof. Dr. Oke Gerke
Guest Editor
1. Department of Nuclear Medicine, Odense University Hospital, Odense, Denmark
2. Department of Clinical Research, University of Southern Denmark, Odense, Denmark
Interests: design, analysis, and reporting of diagnostic and prognostic trials in molecular imaging; agreement studies; sequential and adaptive trial designs in diagnostic research; response evaluation with PET

Special Issue Information

Dear Colleagues,

For every new diagnostic test, biomarker, or measurement based on an assay, its accuracy and precision need to be proven before it is put into clinical use. In the case of diagnostic imaging, this encompasses assessments of intra- and interrater repeatability and reproducibility. The Guidelines for Reporting Reliability and Agreement Studies by Kottner and colleagues (https://doi.org/10.1016/j.jclinepi.2010.03.002) provide a broad framework and reporting guidance, but they also noted, for instance, that investigations into sample size determination for reliability studies were few in number and scarcely in agreement. Apart from studies whose primary focus is agreement and/or reliability estimation itself, such assessments may form part of larger diagnostic, clinical, or epidemiological trials, serving as quality control either before the main study or using data from the main study. In either case, only thorough study planning will secure an appropriate design for, and a sufficient assessment of, agreement and/or reliability.

The purpose of this Special Issue is to provide a collection of articles that reflect the challenge of appropriately planning and reporting agreement and reliability studies, thereby informing and inspiring fellow colleagues pursuing similar endeavors. Papers providing theoretical considerations, practical applications, and pedagogical advice are equally welcome.

Prof. Dr. Oke Gerke
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Diagnostics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • agreement;
  • method comparison;
  • reliability;
  • repeatability;
  • reproducibility;
  • validity.

Published Papers (5 papers)


Research


12 pages, 344 KiB  
Article
An Extension of the Bland–Altman Plot for Analyzing the Agreement of More than Two Raters
by Sören Möller, Birgit Debrabant, Ulrich Halekoh, Andreas Kristian Petersen and Oke Gerke
Diagnostics 2021, 11(1), 54; https://doi.org/10.3390/diagnostics11010054 - 01 Jan 2021
Cited by 6 | Viewed by 3422
Abstract
The Bland–Altman plot is the most common method for analyzing and visualizing agreement between raters or methods of quantitative outcomes in health research. While very useful for studies with two raters, the classical Bland–Altman plot is limited in that it applies specifically to studies with exactly two raters. We propose an extension of the Bland–Altman plot suitable for more than two raters and derive approximate limits of agreement with 95% confidence intervals. We validated the suggested limits of agreement in a simulation study. Moreover, we offer suggestions on how to present bias and heterogeneity among raters, as well as the uncertainty of the limits of agreement. The resulting plot can be used to investigate and present agreement in studies with more than two raters.
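
As a rough illustration of the kind of display the article describes (not the authors' exact estimator, whose derivation and confidence intervals are given in the paper), the following Python sketch plots each reading's deviation from its subject's mean across raters and adds naive empirical 95% limits of agreement; all data, names, and the error model are simulated assumptions for the example.

```python
# Illustrative multi-rater Bland-Altman-style plot: deviations of each
# reading from its subject mean, with naive empirical limits of agreement.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical data: n subjects, each measured once by each of m raters.
n, m = 50, 4
true = rng.normal(10.0, 2.0, size=(n, 1))             # latent subject values
bias = np.array([[0.0, 0.2, -0.1, 0.3]])              # rater-specific biases
x = true + bias + rng.normal(0.0, 0.5, size=(n, m))   # observed n x m matrix

subj_mean = x.mean(axis=1, keepdims=True)
dev = x - subj_mean              # deviation of each reading from subject mean

# Naive empirical 95% limits for the deviations; the paper derives
# approximate limits of agreement with confidence intervals instead.
sd = dev.std(ddof=1)
lo, hi = -1.96 * sd, 1.96 * sd

for j in range(m):
    plt.scatter(subj_mean.ravel(), dev[:, j], s=12, label=f"rater {j + 1}")
for y in (0.0, lo, hi):
    plt.axhline(y, linestyle="--", color="gray")
plt.xlabel("subject mean over raters")
plt.ylabel("deviation from subject mean")
plt.legend()
plt.show()
```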

8 pages, 571 KiB  
Article
Higher Interrater Agreement of FDG-PET/CT than Bone Scintigraphy in Diagnosing Bone Recurrent Breast Cancer
by Jorun Holm, Ziba Ahangarani Farahani, Oke Gerke, Christina Baun, Kirsten Falch and Malene Grubbe Hildebrandt
Diagnostics 2020, 10(12), 1021; https://doi.org/10.3390/diagnostics10121021 - 28 Nov 2020
Cited by 1 | Viewed by 1987
Abstract
The purpose was to investigate the interrater agreement of FDG-PET/CT and bone scintigraphy for diagnosing bone recurrence in breast cancer patients. A total of 100 women with suspected recurrence of breast cancer underwent planar whole-body bone scintigraphy with [99mTc]DPD and FDG-PET/CT. Scans were evaluated independently by experienced nuclear medicine physicians, and the results for one modality were blinded to the other. Images were visually interpreted using a 4-point assessment scale (0 = no metastases, 1 = probably no metastases, 2 = probably metastases, 3 = definite metastases). Out of 100 women, 22 (22%) were verified with distant recurrence; 18 of these had bone involvement. The proportions of agreement between readers were 93% (86.3–96.6) for bone recurrence with FDG-PET/CT and 47% (37.5–56.7) for bone recurrence with planar bone scintigraphy. The strength of agreement between readers for diagnosing bone recurrence was 'almost perfect' with FDG-PET/CT and 'fair' with planar bone scintigraphy, according to Cohen's kappa values of 0.82 (0.70–0.95) and 0.28 (0.18–0.39), respectively. Interrater reproducibility was thus better with FDG-PET/CT than with bone scintigraphy when diagnosing recurrence with bone metastasis in this patient cohort.
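
For readers unfamiliar with the two agreement measures quoted above, the following sketch computes the overall proportion of agreement and Cohen's kappa for two readers on hypothetical binary (recurrence vs. no recurrence) ratings; the ratings are invented for illustration and are not the study's data.

```python
# Proportion of agreement and Cohen's kappa for two readers,
# computed on hypothetical binary ratings (1 = recurrence, 0 = none).
import numpy as np

reader1 = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
reader2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1])

p_obs = np.mean(reader1 == reader2)        # observed proportion of agreement

# Chance-expected agreement from the readers' marginal rating rates.
p1, p2 = reader1.mean(), reader2.mean()
p_exp = p1 * p2 + (1 - p1) * (1 - p2)

kappa = (p_obs - p_exp) / (1 - p_exp)      # Cohen's kappa
print(f"proportion of agreement = {p_obs:.2f}, kappa = {kappa:.2f}")
```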

13 pages, 8600 KiB  
Article
Interrater Agreement and Reliability of PERCIST and Visual Assessment When Using 18F-FDG-PET/CT for Response Monitoring of Metastatic Breast Cancer
by Jonas S. Sørensen, Mie H. Vilstrup, Jorun Holm, Marianne Vogsen, Jakob L. Bülow, Lasse Ljungstrøm, Poul-Erik Braad, Oke Gerke and Malene G. Hildebrandt
Diagnostics 2020, 10(12), 1001; https://doi.org/10.3390/diagnostics10121001 - 24 Nov 2020
Cited by 10 | Viewed by 1860
Abstract
Response evaluation at regular intervals is indicated during treatment of metastatic breast cancer (MBC). FDG-PET/CT has the potential to monitor treatment response accurately. Our purpose was to (a) compare the interrater agreement and reliability of the semi-quantitative PERCIST criteria with qualitative visual assessment in response evaluation of MBC and (b) investigate the intrarater agreement when comparing the visual assessment of each rater to their respective PERCIST assessment. We performed a retrospective study on FDG-PET/CT in women who received treatment for MBC. Three specialists in nuclear medicine categorized response by qualitative assessment and by standardized one-lesion PERCIST assessment. The scans were categorized into complete metabolic response, partial metabolic response, stable metabolic disease, and progressive metabolic disease. Thirty-seven patients with 179 scans were included. Visual assessment categorization yielded moderate agreement, with an overall proportion of agreement (PoA) between raters of 0.52 (95% CI 0.44–0.66) and a Fleiss kappa estimate of 0.54 (95% CI 0.46–0.62). PERCIST response categorization yielded substantial agreement, with an overall PoA of 0.65 (95% CI 0.57–0.73) and a Fleiss kappa estimate of 0.68 (95% CI 0.60–0.75). The difference in PoA between the overall estimates for PERCIST and visual assessment was 0.13 (95% CI 0.06–0.21; p = 0.001), and that in kappa was 0.14 (95% CI 0.06–0.21; p < 0.001). The overall intrarater PoA was 0.80 (95% CI 0.75–0.84), with substantial agreement indicated by a Fleiss kappa of 0.74 (95% CI 0.69–0.79). Semi-quantitative PERCIST assessment achieved a significantly higher level of overall agreement and reliability than qualitative assessment among the three raters. The high levels of intrarater agreement indicated no obvious conflicting elements between the two methods. PERCIST assessment may, therefore, give more consistent interpretations between raters when using FDG-PET/CT for response evaluation in MBC.
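
The reliability measure reported here is Fleiss' kappa for multiple raters. The sketch below shows the standard Fleiss computation on hypothetical data with three raters and four response categories; the simulated ratings are assumptions for the example and are unrelated to the study's data.

```python
# Fleiss' kappa for n scans, each categorized by k raters into one of
# c categories (e.g., CMR, PMR, SMD, PMD). Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n, k, c = 30, 3, 4
ratings = rng.integers(0, c, size=(n, k))   # category chosen by each rater

# counts[i, j] = number of raters assigning scan i to category j
counts = np.zeros((n, c))
for j in range(c):
    counts[:, j] = (ratings == j).sum(axis=1)

p_i = ((counts ** 2).sum(axis=1) - k) / (k * (k - 1))  # per-scan agreement
p_bar = p_i.mean()                                     # mean observed agreement
p_j = counts.sum(axis=0) / (n * k)                     # category proportions
p_e = (p_j ** 2).sum()                                 # chance agreement

kappa = (p_bar - p_e) / (1 - p_e)
print(f"Fleiss' kappa = {kappa:.2f}")   # near 0 here: ratings are random
```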

Review


19 pages, 5835 KiB  
Review
How Replicates Can Inform Potential Users of a Measurement Procedure about Measurement Error: Basic Concepts and Methods
by Werner Vach and Oke Gerke
Diagnostics 2021, 11(2), 162; https://doi.org/10.3390/diagnostics11020162 - 22 Jan 2021
Viewed by 1640
Abstract
Measurement procedures are not error-free. Potential users of a measurement procedure need to know the expected magnitude of the measurement error in order to justify its use, in particular in health care settings. Gold standard procedures providing exact measurements for comparison are often lacking. Consequently, scientific investigations of measurement error are often based on replicates. However, a standardized terminology (and partly also a standardized methodology) for such investigations is lacking. In this paper, we explain the basic conceptual approach of such investigations with minimal reference to existing terminology and describe the link to existing general statistical methodology. In this way, some of the key measures used in such investigations can be explained in a simple manner, and some light can be shed on existing terminology. We encourage a clear conceptual distinction between investigations of the measurement error of a single measurement procedure and comparisons between different measurement procedures or observers. We also identify an untapped potential for more advanced statistical analyses in scientific investigations of measurement error.
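
Two of the basic replicate-based quantities such investigations rest on are the within-subject standard deviation and the repeatability coefficient. The following sketch estimates both from simulated replicate data; the sample sizes and error model are assumptions for the example, not taken from the review.

```python
# Within-subject SD and repeatability coefficient estimated from
# r replicate measurements per subject (simulated data).
import numpy as np

rng = np.random.default_rng(2)
n, r = 40, 3
true = rng.normal(100.0, 15.0, size=(n, 1))     # subjects' underlying values
x = true + rng.normal(0.0, 4.0, size=(n, r))    # replicates with error SD 4

# Pooled within-subject variance: each subject contributes r - 1 degrees
# of freedom, so the divisor is n * (r - 1).
dev = x - x.mean(axis=1, keepdims=True)
s_w = np.sqrt((dev ** 2).sum() / (n * (r - 1)))  # within-subject SD

# Repeatability coefficient: two replicates on the same subject are
# expected to differ by less than this in about 95% of cases.
rc = 1.96 * np.sqrt(2) * s_w
print(f"within-subject SD = {s_w:.2f}, repeatability coefficient = {rc:.2f}")
```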

17 pages, 1340 KiB  
Review
Reporting Standards for a Bland–Altman Agreement Analysis: A Review of Methodological Reviews
by Oke Gerke
Diagnostics 2020, 10(5), 334; https://doi.org/10.3390/diagnostics10050334 - 22 May 2020
Cited by 94 | Viewed by 12113
Abstract
The Bland–Altman Limits of Agreement method is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis may be reported as a stand-alone research article, but it is more often conducted as a minor quality assurance project in a subgroup of patients, as part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to a brief description in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. The present study aimed to identify the most comprehensive and appropriate list of items for such an analysis. Seven proposals were identified from a MEDLINE/PubMed search, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of the repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumptions, and plotting and numerically reporting both the bias and the Bland–Altman Limits of Agreement, including their respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). An exemplification with interrater data from a local study illustrated the straightforwardness of transparently reporting a Bland–Altman analysis. The 13 key items should be applied by researchers, journal editors, and reviewers in the future to increase the quality of reporting Bland–Altman agreement analyses.
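
As a companion to the consensus items listed above, the following sketch computes the bias, the 95% Bland–Altman Limits of Agreement, and approximate 95% confidence intervals for each, using Bland and Altman's large-sample formulas on simulated two-method data; all numbers are invented for the example.

```python
# Classical Bland-Altman analysis: bias, 95% limits of agreement (LoA),
# and approximate 95% CIs for the bias and each limit (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
a = rng.normal(50.0, 10.0, size=n)           # method/rater A
b = a + 1.0 + rng.normal(0.0, 2.0, size=n)   # method/rater B with bias 1.0

d = b - a
bias, sd = d.mean(), d.std(ddof=1)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)

t = stats.t.ppf(0.975, df=n - 1)
ci_bias = (bias - t * sd / np.sqrt(n), bias + t * sd / np.sqrt(n))

# Approximate standard error of each limit of agreement
# (Bland & Altman 1999 large-sample formula).
se_loa = sd * np.sqrt(1.0 / n + 1.96 ** 2 / (2 * (n - 1)))
ci_lower = (loa[0] - t * se_loa, loa[0] + t * se_loa)
ci_upper = (loa[1] - t * se_loa, loa[1] + t * se_loa)

print(f"bias = {bias:.2f}, 95% CI ({ci_bias[0]:.2f}, {ci_bias[1]:.2f})")
print(f"LoA = ({loa[0]:.2f}, {loa[1]:.2f})")
print(f"CI lower LoA ({ci_lower[0]:.2f}, {ci_lower[1]:.2f}); "
      f"CI upper LoA ({ci_upper[0]:.2f}, {ci_upper[1]:.2f})")
```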
