High Agreement But Low Kappa: I. The Problems of Two Paradoxes

The observed agreement is the proportion of matches found in the data, while the expected agreement is the proportion of matches that would occur under random assignment: the values of the Cohen's Kappa statistic would suggest that the levels of agreement for the Unit, Design and Primary Endpoint evaluation criteria are quite unsatisfactory. However, a simple look at the corresponding values of observed agreement is enough to reveal the presence of the paradoxes. The most likely explanation for the onset of the paradox lies in the high proportions, shown in Table 2, of the «Individual», «Parallel» and «Continuous» levels of the Unit, Design and Primary Endpoint variables. These high proportions produce a high probability of chance agreement and, therefore, paradoxically low values of the Kappa statistic (a numerical sketch of this effect is given below). The AC1 statistic, on the other hand, yields plausible values, consistent with the observed agreement.

Scenarios 1-3 address Paradox 1, that of high agreement but low Kappa. Scenario 1 presents symmetrically balanced marginal totals, while Scenarios 2 and 3 present symmetrically unbalanced marginal totals.
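As an illustration of Paradox 1, the following minimal Python sketch (not the study's own code; the table of counts is hypothetical) computes the observed agreement, Cohen's Kappa and Gwet's AC1 for a strongly unbalanced 2x2 table: the two raters agree on 90% of the units, yet Kappa turns out slightly negative, while AC1 remains close to the observed agreement.

def kappa_and_ac1(table):
    """table[i][j] = number of units rater A put in category i and rater B in category j."""
    n = sum(sum(row) for row in table)
    q = len(table)
    row_m = [sum(table[i][j] for j in range(q)) for i in range(q)]   # rater A marginal totals
    col_m = [sum(table[i][j] for i in range(q)) for j in range(q)]   # rater B marginal totals

    p_a = sum(table[k][k] for k in range(q)) / n                     # observed agreement
    p_e_kappa = sum((row_m[k] / n) * (col_m[k] / n) for k in range(q))

    # Gwet's AC1 chance-agreement term uses the average marginal proportions
    pi = [(row_m[k] + col_m[k]) / (2 * n) for k in range(q)]
    p_e_ac1 = sum(p * (1 - p) for p in pi) / (q - 1)

    kappa = (p_a - p_e_kappa) / (1 - p_e_kappa)
    ac1 = (p_a - p_e_ac1) / (1 - p_e_ac1)
    return p_a, kappa, ac1

# Hypothetical counts (not Table 1 of the study): 90% observed agreement,
# but almost all units fall in the first category for both raters.
table = [[90, 5],
         [5, 0]]
p_a, kappa, ac1 = kappa_and_ac1(table)
print(f"observed agreement = {p_a:.2f}, kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
# -> observed agreement = 0.90, kappa = -0.05, AC1 = 0.89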

Figure 2 shows the corresponding agreement charts for the three scenarios and provides a visual picture of the lack of marginal balance. The agreement chart for Scenario 1 has dark squares of roughly the same size as those of Scenarios 2-3, enclosed in rectangles that also lie close to the squares. The dark shading indicates a high degree of agreement in all three scenarios. In the following sections, the Cohen's Kappa statistic is introduced in its general formulation, with more than two categories and more than two raters, and the conditions that lead to the paradoxes are briefly described. The AC1 statistic is then introduced.
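For reference, the standard two-rater, q-category forms of the two coefficients (written here in conventional notation, which may differ from the notation adopted in the sections that follow) are:

% p_{kk}: proportion of units both raters place in category k;
% p_{k+}, p_{+k}: marginal proportions for the first and second rater.
\[
\kappa = \frac{p_a - p_e}{1 - p_e}, \qquad
p_a = \sum_{k=1}^{q} p_{kk}, \qquad
p_e = \sum_{k=1}^{q} p_{k+}\,p_{+k},
\]
\[
\mathrm{AC1} = \frac{p_a - p_e^{(\gamma)}}{1 - p_e^{(\gamma)}}, \qquad
p_e^{(\gamma)} = \frac{1}{q-1}\sum_{k=1}^{q} \pi_k\,(1-\pi_k), \qquad
\pi_k = \frac{p_{k+} + p_{+k}}{2}.
\]

When one category dominates both raters' marginals, p_e approaches 1 and Kappa is driven down even if p_a is high, whereas p_e^{(\gamma)} approaches 0 and AC1 stays close to the observed agreement.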

Finally, a worked example, based on a reproducibility study carried out with the quality raters of a clinical trial, is used to show the behavior of the two statistics, both in the presence and in the absence of the paradox.

Agreement chart for the hypothetical data in Table 1, assessing the agreement between two raters who classify N units into the same two categories.
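A minimal Python/matplotlib sketch of such an agreement chart is given below. It is an assumed implementation, not the authors' code, and for simplicity it centres each dark n_kk square inside its marginal rectangle (the original construction places it according to the off-diagonal counts).

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

def agreement_chart(table):
    # table[i][j]: units placed in category i by rater A and category j by rater B
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row_m = table.sum(axis=1)                      # rater A marginal totals
    col_m = table.sum(axis=0)                      # rater B marginal totals
    x0 = np.concatenate(([0.0], np.cumsum(col_m)[:-1]))
    y0 = np.concatenate(([0.0], np.cumsum(row_m)[:-1]))

    _, ax = plt.subplots(figsize=(5, 5))
    ax.plot([0, n], [0, n], linestyle=":", color="grey")  # line of perfect agreement
    for k in range(table.shape[0]):
        # Outer rectangle: width = rater B marginal, height = rater A marginal
        ax.add_patch(patches.Rectangle((x0[k], y0[k]), col_m[k], row_m[k],
                                       fill=False, edgecolor="black"))
        # Dark inner square of side n_kk (exact agreement), centred for simplicity
        d = table[k, k]
        ax.add_patch(patches.Rectangle((x0[k] + (col_m[k] - d) / 2,
                                        y0[k] + (row_m[k] - d) / 2),
                                       d, d, facecolor="black"))
    ax.set_xlim(0, n)
    ax.set_ylim(0, n)
    ax.set_xlabel("Rater B (cumulative marginal totals)")
    ax.set_ylabel("Rater A (cumulative marginal totals)")
    ax.set_aspect("equal")
    return ax

# Hypothetical 2x2 counts (not the study's data)
agreement_chart([[90, 5], [5, 0]])
plt.show()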