We now demonstrate the use of these existing summary measures to assess the strength of agreement and association among many raters in three different medical studies. We used the psych package in R to calculate Cohen's kappa, and the irr package in R to calculate Fleiss' kappa and the ICC [30]. For Nelson's approaches, we used the clmm function in the ordinal package in R to obtain the parameter estimates from the GLMM, and then used our own R script to calculate the measures of agreement and association. An R script that fits a sample dataset from one of the examples (Example 3, Holmquist) is provided online as supplementary material.

Cohen's weighted kappa is $\kappa_w = (p_{0w} - p_{cw}) / (1 - p_{cw})$, where $p_{0w}$ is now the weighted proportion of observed agreement and $p_{cw}$ the weighted proportion of chance agreement. Typically, quadratic weights (also known as “squared error weights”) or linear weights (also known as “absolute error weights”) are used, with less “credit” assigned to pairs of ratings that are further apart [26]. Quadratic and linear weights take the form $w_{rs} = 1 - (r-s)^2/(C-1)^2$ and $w_{rs} = 1 - |r-s|/(C-1)$ respectively, where $C$ is the total number of categories and $r$ and $s$ are the category levels ($r, s = 1, \ldots, C$). When more than two raters participate in a study, many authors report the average of the pairwise weighted kappas. Cohen's weighted kappa can also be calculated with the psych package in R and the FREQ procedure in SAS.

Figure 2b shows the mean association measures over the 1,000 simulated data sets for Cohen's approach, the ICC and Nelson's model as disease prevalence shifts from low to high. Nelson's measure of association was also unaffected by changes in disease prevalence, whereas Cohen's average weighted kappa and the ICC were influenced by the underlying disease prevalence and, compared with Nelson's approach, generally overestimated the association regardless of the prevalence level.
[Figure 2b] However, the three measures of association were most similar when prevalence was extreme (highest and lowest prevalence).
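
As an illustration of the weighted kappa and the pairwise averaging described above, the following is a minimal Python sketch rather than the R code used in this study. The function names (`agreement_weights`, `weighted_kappa`, `mean_pairwise_weighted_kappa`) and the input layout (one list of integer category levels 1, ..., C per rater) are assumptions made for the example.

```python
from itertools import combinations


def agreement_weights(C, kind="quadratic"):
    """Return the C x C weight matrix w[r][s] for category levels 1..C."""
    if kind not in ("quadratic", "linear"):
        raise ValueError("kind must be 'quadratic' or 'linear'")
    w = [[0.0] * C for _ in range(C)]
    for r in range(C):
        for s in range(C):
            d = abs(r - s)
            if kind == "quadratic":
                w[r][s] = 1.0 - d ** 2 / (C - 1) ** 2
            else:
                w[r][s] = 1.0 - d / (C - 1)
    return w


def weighted_kappa(a, b, C, kind="quadratic"):
    """Cohen's weighted kappa for two raters' ordinal ratings a and b."""
    n = len(a)
    w = agreement_weights(C, kind)
    # Joint proportions of (rater 1 level, rater 2 level).
    p = [[0.0] * C for _ in range(C)]
    for x, y in zip(a, b):
        p[x - 1][y - 1] += 1.0 / n
    row = [sum(p[r]) for r in range(C)]
    col = [sum(p[r][s] for r in range(C)) for s in range(C)]
    # Weighted observed agreement and weighted chance agreement.
    p_ow = sum(w[r][s] * p[r][s] for r in range(C) for s in range(C))
    p_cw = sum(w[r][s] * row[r] * col[s] for r in range(C) for s in range(C))
    # Undefined (division by zero) if both raters use a single category.
    return (p_ow - p_cw) / (1.0 - p_cw)


def mean_pairwise_weighted_kappa(ratings, C, kind="quadratic"):
    """Average the weighted kappa over all pairs of raters."""
    pairs = list(combinations(range(len(ratings)), 2))
    kappas = [weighted_kappa(ratings[i], ratings[j], C, kind) for i, j in pairs]
    return sum(kappas) / len(kappas)
```

With C = 3, the quadratic weight for ratings one level apart is 0.75 and the linear weight is 0.5, so the quadratic scheme penalizes small disagreements less than the linear one.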

We also calculated coverage probabilities for the association measures based on the models of Nelson et al., using the simulation study. The results are presented in Annex III. When disease prevalence was extreme (low or high), the coverage probabilities of the method based on Nelson's model were below the nominal 95%. When the disease was equally prevalent across the categories, the coverage probabilities were close to 95%. Coverage probabilities for Cohen's kappa and Fleiss' kappa could not be calculated, as these are generally regarded as statistics used strictly to assess agreement and association [35]. To give an overview of the performance of Cohen's kappa and Fleiss' kappa, we present the distributions of the 1,000 simulated agreement and association values online in the supplementary material. A formula for calculating the corresponding variance of Fleiss' kappa is also available [19].
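
For readers unfamiliar with coverage probabilities, the following minimal Python sketch shows how one might be estimated from simulation output. It assumes Wald-type intervals (estimate plus or minus a normal quantile times the standard error), which is an assumption of this example and not necessarily how the intervals in this study were constructed. The function name `coverage_probability` and its arguments are hypothetical.

```python
from statistics import NormalDist


def coverage_probability(estimates, std_errors, true_value, level=0.95):
    """Share of simulated intervals that contain the true value.

    estimates and std_errors hold one entry per simulated data set.
    Assumes Wald-type intervals: estimate +/- z * standard error.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            hits += 1
    return hits / len(estimates)
```

A well-calibrated 95% interval procedure should give a value near 0.95 when applied across many simulated data sets; values clearly below that indicate intervals that are too narrow or estimates that are biased.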
