Letter to the Editor: “Multiparametric MRI of the bladder: inter-observer agreement and accuracy with the Vesical Imaging-Reporting and Data System (VI-RADS) at a single reference center”
by Abolghasem Shokri, Siamak Sabour (email@example.com)
We read with interest the article by Barchetti et al., recently published in the March 2019 issue of Eur Radiol. The authors aimed to evaluate the accuracy and inter-observer variability of the Vesical Imaging-Reporting and Data System (VI-RADS) for discriminating between non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC). Receiver operating characteristic curves were used to evaluate the performance of multiparametric MRI (mpMRI), and the κ statistic was used to estimate inter-reader agreement.
Although this article provides valuable information, several substantial points would improve the clarity of the methods and support an accurate interpretation of the results. First, weighted kappa should be used with caution because kappa has its own limitations: its value depends on the prevalence in each category. Second, the κ value also depends on the number of categories [2-7]. When a variable has more than two categories or is measured on an ordinal scale (three or more ordered categories), weighted kappa is a good choice. However, the prevalence in this study is not known. Third, the bootstrap re-sampling procedure was used to calculate the standard error of the κ estimates. Should the bootstrap standard error not be the standard deviation of the individual bootstrap estimates? Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. Finally, the physician’s opinion and the patient’s clinical status should not be neglected.
However, the authors did not apply any of the statistical tests commonly used to estimate inter-reader agreement in this paper [9-11].
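To make the points about weighted kappa and the bootstrap standard error concrete, the following is a minimal sketch in Python, not the procedure used in the original article. It assumes two readers assigning ordinal scores from 1 to 5, linear disagreement weights, and a paired resampling of cases; the reader scores and the function names `weighted_kappa` and `bootstrap_se` are hypothetical and serve illustration only. The bootstrap standard error is taken as the standard deviation of the individual bootstrap estimates.

```python
import random
import statistics


def weighted_kappa(r1, r2, categories, power=1):
    """Cohen's weighted kappa for two readers.

    power=1 gives linear weights, power=2 quadratic weights.
    Returns NaN when chance disagreement is zero (kappa undefined).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint proportions of (reader 1 score, reader 2 score).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    if expected == 0.0:
        return float("nan")
    return 1.0 - observed / expected


def bootstrap_se(r1, r2, categories, n_boot=1000, seed=0):
    """Bootstrap standard error: SD of the individual bootstrap estimates."""
    rng = random.Random(seed)
    n = len(r1)
    estimates = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping the reader pairs together.
        sample = [rng.randrange(n) for _ in range(n)]
        est = weighted_kappa([r1[i] for i in sample],
                             [r2[i] for i in sample], categories)
        if est == est:  # skip undefined (NaN) replicates
            estimates.append(est)
    return statistics.stdev(estimates)


# Hypothetical ordinal scores (1 to 5) from two readers; not data from the study.
CATEGORIES = [1, 2, 3, 4, 5]
reader_a = [1, 2, 2, 3, 4, 5, 5, 3, 2, 1, 4, 4, 3, 2, 5, 1, 2, 3, 4, 5]
reader_b = [1, 2, 3, 3, 4, 5, 4, 3, 2, 2, 4, 5, 3, 2, 5, 1, 1, 3, 4, 4]

kappa = weighted_kappa(reader_a, reader_b, CATEGORIES)
se = bootstrap_se(reader_a, reader_b, CATEGORIES)
print(f"weighted kappa = {kappa:.3f}, bootstrap SE = {se:.3f}")
```

Rerunning the sketch with a different distribution of scores across the five categories, or after collapsing the scale to fewer categories, shows how the estimate shifts with prevalence and with the number of categories even when the readers behave similarly.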