PURPOSE: To evaluate the interreader agreement of a three-tier craniocaudal grading system for brown fat activation and to investigate how accurately the three grades can be distinguished.
MATERIALS AND METHODS: After IRB approval, 340 cases were retrospectively selected from patients who underwent 18F-FDG PET/CT at our institution between 2007 and 2015, with 85 cases in each of the three grades and 85 controls without active brown fat. Three readers evaluated all cases independently. In addition, standardized uptake value (SUV) measurements were performed by two readers in a subset of 53 cases. Agreement between readers was assessed with Cohen's kappa (k), the concordance correlation coefficient (CCC) and the intraclass correlation coefficient (ICC). Accuracy was assessed with Bland-Altman and receiver operating characteristic (ROC) analysis. A Bonferroni-corrected two-tailed p<0.016 was considered statistically significant.
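As an illustration of the agreement statistics named above, the following is a minimal Python sketch, not the authors' analysis code. It computes unweighted Cohen's kappa, Lin's concordance correlation coefficient, and the Bland-Altman bias with 95% limits of agreement for two readers. The grade lists `reader_a` and `reader_b` are hypothetical, all function names are illustrative, and the Bonferroni helper assumes three comparisons at a family-wise alpha of 0.05, which the abstract does not state explicitly. The ICC is omitted because the abstract does not specify which ICC form was used.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the raters' marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = mean(x), mean(y)
    var_x = mean([(xi - mx) ** 2 for xi in x])
    var_y = mean([(yi - my) ** 2 for yi in y])
    cov = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical grades (0 = no active brown fat, 1-3 = grade) from two readers.
reader_a = [0, 0, 1, 1, 2, 2, 3, 3, 1, 2]
reader_b = [0, 0, 1, 2, 2, 2, 3, 3, 1, 3]

if __name__ == "__main__":
    print("kappa:", round(cohen_kappa(reader_a, reader_b), 3))
    print("CCC:", round(lin_ccc(reader_a, reader_b), 3))
    print("Bland-Altman:", bland_altman(reader_a, reader_b))
    # Assumption: three comparisons at a family-wise alpha of 0.05.
    print("Bonferroni threshold:", bonferroni_alpha(0.05, 3))
```

A negative Bland-Altman bias in this sketch means the first reader assigned lower values on average than the second, which is the kind of systematic offset the analysis is meant to expose.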
RESULTS: Agreement for brown adipose tissue (BAT) grade was excellent by all three metrics, with k=0.83-0.89, CCC=0.83-0.89 and ICC=0.91-0.94. Bland-Altman analysis revealed only slight average over- or underestimation (-0.01 to 0.14), with the majority of disagreements within one grade. ROC analysis showed slightly less accurate discrimination among higher grades than among lower grades (area under the ROC curve 0.78-0.84 vs. 0.88-0.92), with no significant differences between readers. Agreement was also excellent for the maximum SUV and the total brown fat volume (k=0.90 and 0.94, CCC=0.93 and 0.99, ICC=0.96 and 0.99, respectively), but Bland-Altman plots revealed a tendency of one reader to underestimate activity.
CONCLUSION: Grading the activation of brown fat by assessment of the most caudally activated depots results in excellent interreader agreement, comparable to SUV measurements.