Statistical Methods for Handling Observation Clustering in Sports Injury Surveillance
Advances in sports injury-surveillance methods have made it possible to accommodate non–time-loss (NTL) injury reporting; however, the analysis of surveillance data now requires careful consideration of the nuances of NTL injury records. Injury-surveillance mechanisms that record NTL injuries are more likely to contain multiple injury records per athlete. These must be handled appropriately in statistical analyses to make methodologically sound inferences. We simulated datasets of NTL injuries using varying degrees of observation clustering and compared the inferences made using traditional techniques with those made after accounting for clustering in computations of injury proportion ratios. Inappropriate handling of even moderate clustering resulted in flawed inferences in 10% to 12% of our simulations. We observed greater bias in our estimates as the degree of clustering increased. We urge investigators to carefully consider observation clustering and adapt analytical methods to accommodate the evolving sophistication of surveillance.
Sports injury surveillance has been critical in identifying patterns of injury incidence and outcomes among athletes in different sports.1–5 As its use has grown, surveillance methods have been adapted and mechanisms have become more complex. One notable advance in recent years has been the accommodation of non–time-loss (NTL) injuries, which result in participation-restriction time <24 hours. This contrasts with older surveillance data, which typically included only injuries resulting in ≥24 hours of participation-restriction time (termed time-loss [TL] injuries). The National Collegiate Athletic Association Injury Surveillance Program (NCAA-ISP)6 has collected NTL injuries since 2009–2010; the National Athletic Treatment, Injury and Outcomes Network (NATION)7 has also collected NTL injuries since its inception in 2011–2012. These methods have yielded findings suggesting that a substantial proportion of sports-related injuries in collegiate populations are NTL.4
The analysis of NTL injury data, however, poses a unique statistical challenge. Although injury-surveillance mechanisms that exclusively record TL injuries may contain multiple injury records per athlete, mechanisms that also record NTL injuries are more likely to demonstrate this phenomenon. This is commonly referred to as observation clustering (resulting in violation of the independence assumption) in the statistical literature and requires careful analytical consideration when measuring the burden of injury using effect estimates such as injury proportion ratios (IPRs). This problem, as it relates to IPRs estimated using sports injury data, was introduced by Knowles et al.8,9 However, there has not yet been detailed discussion of it or of strategies to handle it in this context. We present, first, a short review of the statistical challenge posed by observation clustering in regard to NTL injuries and, second, a strategy for handling the challenge and making methodologically sound inferences.
Measuring the Burden of Injury Using Injury Proportion Ratios
In epidemiology, risk ratios, incidence rate ratios, or both are commonly used to quantify and compare differential morbidity and mortality between samples.10 Analogously, injury rate ratios (IRRs) and IPRs are commonly used effect estimates in sports injury surveillance. To illustrate the analytical challenge posed by NTL injury clustering, we will focus on IPRs in this manuscript:
\begin{equation}
\mathrm{IPR} = \frac{\left( \dfrac{\#\ \text{of Specific Injuries in Group X}}{\Sigma\ \text{All Injuries in Group X}} \right)}{\left( \dfrac{\#\ \text{of Specific Injuries in Group Y}}{\Sigma\ \text{All Injuries in Group Y}} \right)}
\end{equation}
The IPR is constructed as an estimate of differential risk based solely on a sample of injured individuals. The motivation for using the IPR in this context and the method for estimating it have been discussed previously.8 Briefly, as seen in the equation, the ratio uses frequencies of injury observations in the effect estimation. A ratio >1 would imply that a higher proportion of a given type of injury was observed in group X compared with group Y. Similarly, a ratio <1 would imply that a lower proportion of a given type of injury was observed in group X compared with group Y. A simple application of this measure was demonstrated by Deits et al,11 who used the IPR to illustrate sex differences in proportions of observed head injuries, facial injuries, concussions, etc, among ice hockey players. The proportion of injuries to the head was higher in female players than in male players (IPR = 2.22; 95% confidence interval [CI] = 1.78, 2.77).11
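As a minimal sketch of the ratio defined above, the following Python function computes an IPR from four counts, together with a traditional Wald 95% CI on the log scale. The counts are hypothetical and the function name is ours. The variance expression is the usual one for the log of a ratio of two independent proportions, which assumes that every injury record is independent.

```python
import math


def ipr_with_ci(specific_x, all_x, specific_y, all_y, z=1.96):
    """Injury proportion ratio (group X vs. group Y) with a Wald CI.

    Assumes independent injury records; the SE is on the log scale.
    """
    p_x = specific_x / all_x  # proportion of specific injuries in group X
    p_y = specific_y / all_y  # proportion of specific injuries in group Y
    ipr = p_x / p_y
    # Traditional (independence-based) variance of log(IPR).
    se_log = math.sqrt(1 / specific_x - 1 / all_x + 1 / specific_y - 1 / all_y)
    lower = math.exp(math.log(ipr) - z * se_log)
    upper = math.exp(math.log(ipr) + z * se_log)
    return ipr, se_log, lower, upper


# Hypothetical counts: 60 of 300 injuries in group X and 70 of 700 in group Y.
ipr, se_log, lower, upper = ipr_with_ci(60, 300, 70, 700)
print(f"IPR = {ipr:.2f}; 95% CI = {lower:.2f}, {upper:.2f}")
```

Here the point estimate depends only on the counts, whereas the interval depends on the SE, which is the quantity affected when records are clustered within athletes.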
We note that it is not the effect estimate that is influenced by clustering, as the formula is based on a count of injuries. Instead, as we will discuss, it is the estimation of the standard error (SE) of the IPR and any inference made using it that is directly biased when clustering is present.
Clustering
When a set of injury observations is analyzed to estimate IPRs, it is inherently assumed that the observations are independent. In other words, standard analytical procedures are accompanied by an unstated understanding that they are applied to a set of distinct cases. This understanding is associated with a more critical assumption that the errors attached to any effect estimate are unrelated,10 which directly relates to the process of computing the SEs associated with effect estimates. When multiple injury observations are linked to the same athlete, this assumption is clearly violated.
It has been suggested that traditional estimates of the SE are still appropriate in this context when clustering is not pervasive within a given set of records.9 However, the inclusion of NTL injury reporting in surveillance increases the likelihood of observation clustering. It may be more common to observe >1 injury per athlete when records include both TL and NTL injuries rather than TL injuries only. In circumstances of observation clustering, the fundamental condition associated with standard analytical procedures (that the errors are unrelated) no longer holds true. In these cases, the SE accompanying any effect estimate such as an IPR is biased.10 Because SEs attached to an estimate are routinely used to construct 95% CIs around the estimate, and these CIs often are the basis for inferences regarding statistical significance (whether or not the null value of 1.00 is contained in the interval), failing to acknowledge the violation of the independence assumption can lead to flawed inferences. However, statistical techniques are available that can be used to address this problem.
Sandwich Estimator
A robust method for handling clustering is the sandwich covariance estimator.10,12,13 This technique is typically used to handle misspecification of model covariance, which is a consequence of the independence assumption being violated.10,12,13 Compared with standard methods for computing model covariance, the sandwich estimator uses an adapted method that incorporates empirical data10,13 and consequently adjusts the SE estimates. In practice, the technique requires a unique identifier for each participant, repeated on every observation (an injury record in this context) in the dataset that corresponds to that participant. Ultimately, the data-collection and -management protocols would only need to ensure that the included athletes could be distinguished. Such identification could be done without compromising the deidentified nature of the data (eg, randomly generated alphanumeric expressions). However, for a number of surveillance systems that focus on data analysis at the aggregate level and not the individual level, such identification may not be feasible. At a minimum, such studies may benefit from disclosure of this limitation, noting the data were analyzed under the assumption of independence.
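To make the role of the athlete identifier concrete, the sketch below computes both a traditional SE and a cluster-robust (sandwich-type) SE for the log IPR in plain Python. It is an illustration under stated assumptions and not the code used for the analyses reported here: it covers only a single two-group comparison, assumes each athlete belongs to exactly 1 group, applies no small-sample correction, and uses hypothetical data and function names. Residuals are summed within each athlete before being squared, so correlated records from the same athlete are not treated as separate pieces of evidence.

```python
import math
from collections import defaultdict


def ipr_standard_errors(records):
    """Return (IPR, naive SE, sandwich SE); both SEs are on the log scale.

    records: (athlete_id, group, is_specific) tuples, group in {"X", "Y"}.
    """
    records = list(records)
    n, events = defaultdict(int), defaultdict(int)
    for _, group, y in records:
        n[group] += 1
        events[group] += y
    p = {g: events[g] / n[g] for g in n}

    # Sum residuals within each athlete (cluster) before squaring.
    cluster_resid = defaultdict(float)
    for athlete, group, y in records:
        cluster_resid[(group, athlete)] += y - p[group]

    naive_var = sum(1 / events[g] - 1 / n[g] for g in ("X", "Y"))
    sandwich_var = sum(r * r / (n[g] * p[g]) ** 2
                       for (g, _), r in cluster_resid.items())
    return p["X"] / p["Y"], math.sqrt(naive_var), math.sqrt(sandwich_var)


def make_group(group, flags_by_athlete, per_athlete=5):
    """Hypothetical helper: each athlete contributes identical records."""
    return [(f"{group}{i}", group, flag)
            for i, flag in enumerate(flags_by_athlete)
            for _ in range(per_athlete)]


# Extreme, made-up example: 4 athletes per group, 5 records each.
clustered = make_group("X", [1, 1, 0, 0]) + make_group("Y", [1, 0, 0, 0])
ipr, naive_se, sandwich_se = ipr_standard_errors(clustered)
print(f"IPR={ipr:.2f} naive SE={naive_se:.3f} sandwich SE={sandwich_se:.3f}")
# IPR=2.00 naive SE=0.447 sandwich SE=1.000

# Same records, but each treated as coming from a different athlete.
unclustered = [(i, g, y) for i, (_, g, y) in enumerate(clustered)]
_, naive_se_u, sandwich_se_u = ipr_standard_errors(unclustered)
```

When every record has its own identifier, the two SEs coincide, so the adjustment changes nothing in the absence of clustering.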
Simulations
To illustrate the effect of the sandwich estimator, we simulated separate datasets of NTL injuries (each containing 1000 NTL injury records) with various degrees of observation clustering. That is, we created datasets with the following amounts of clustering: no clustering (0%), low clustering (∼25%), moderate clustering (∼50%), considerable clustering (∼75%), and high clustering (>95%). Thus, the proportion of observation clustering was used as a direct indication of the number of unique athletes who contributed injury records to each dataset. In each scenario, we then computed IPRs (using log-binomial regressions) to estimate sex differences in injury proportions across levels of event type (games and practices) and injury mechanisms (player-to-player contact, player contact with surface or equipment, and noncontact or overuse). In these IPRs, female participants served as the referent group. This approach is used extensively in descriptive epidemiology of sport-related injuries.4,8,14–17
For the estimated IPRs, we computed SEs and CIs using 2 methods. We first computed the SEs and CIs assuming that all records were independent and then repeated the computation while accounting for observation clustering using the sandwich estimator to produce the sandwich SEs (SSEs) and corresponding CIs. This simulation process was repeated 1000 times for each clustering scenario. Details regarding our simulation procedures and the code used for analysis may be found in the Appendices.
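The general shape of such a simulation can be sketched as follows. This is a simplified stand-in, not the procedure documented in the Appendices. It assumes that the degree of clustering sets the number of unique athletes to roughly 1000 × (1 − clustering), that each athlete has a hypothetical propensity (0.2 or 0.8) for an injury to be game related, and that this propensity is unrelated to sex, so the null hypothesis is true by construction. It uses 200 replicates rather than 1000 to keep the run time short, and all function names are ours.

```python
import math
import random
from collections import defaultdict


def simulate_records(n_records, clustering, rng):
    """Simulate (athlete_id, sex, is_game) records with no true sex effect.

    ASSUMPTION: unique athletes ~= n_records * (1 - clustering).
    """
    n_athletes = max(4, round(n_records * (1 - clustering)))
    # Athlete-level propensity induces correlation among an athlete's records.
    athletes = [(i, "M" if i % 2 == 0 else "F", rng.choice((0.2, 0.8)))
                for i in range(n_athletes)]
    # Each athlete gets 1 record; remaining records are assigned at random.
    owners = athletes + [rng.choice(athletes)
                         for _ in range(n_records - n_athletes)]
    return [(aid, sex, 1 if rng.random() < prop else 0)
            for aid, sex, prop in owners]


def log_ipr_and_ses(records):
    """Return (log IPR for M vs. F, naive SE, sandwich SE) on the log scale."""
    n, events = defaultdict(int), defaultdict(int)
    for _, sex, y in records:
        n[sex] += 1
        events[sex] += y
    p = {s: events[s] / n[s] for s in n}
    cluster_resid = defaultdict(float)
    for aid, sex, y in records:
        cluster_resid[(sex, aid)] += y - p[sex]
    naive_var = sum(1 / events[s] - 1 / n[s] for s in ("M", "F"))
    sandwich_var = sum(r * r / (n[s] * p[s]) ** 2
                       for (s, _), r in cluster_resid.items())
    return (math.log(p["M"] / p["F"]),
            math.sqrt(naive_var), math.sqrt(sandwich_var))


def run_scenario(clustering, n_sims=200, n_records=1000, seed=2019):
    """Average SE, average SSE, and proportion of agreeing decisions."""
    rng = random.Random(seed)
    sum_se = sum_sse = 0.0
    agree = 0
    for _ in range(n_sims):
        records = simulate_records(n_records, clustering, rng)
        log_ipr, se, sse = log_ipr_and_ses(records)
        sum_se += se
        sum_sse += sse
        # Reject the null when the 95% CI for the IPR excludes 1.00.
        agree += (abs(log_ipr) > 1.96 * se) == (abs(log_ipr) > 1.96 * sse)
    return sum_se / n_sims, sum_sse / n_sims, agree / n_sims


for level in (0.0, 0.5, 0.95):
    mean_se, mean_sse, agreement = run_scenario(level)
    print(f"clustering={level:.2f} SE={mean_se:.3f} "
          f"SSE={mean_sse:.3f} agreement={agreement:.2f}")
```

Because the within-athlete correlation assumed in this sketch is strong, its output should be read only for the qualitative pattern and not compared numerically with the values in the Table.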
Results
A summary of results from our simulations is presented in the Table. The averages presented in this table were computed from 1000 simulated datasets under each of the clustering conditions described earlier. As previously mentioned, the estimates of the IPRs themselves should not change drastically, as evidenced in the Table. However, we draw attention to the average estimated SE and SSE. Minimal changes in the SE occurred with different levels of clustering. Yet as the levels of clustering increased, so did the SSE. The differences between the SSE and SE increased as well. For example, while examining sex differences in the proportions of game-related injuries, we saw that the average SE (not appropriately accounting for clustering) ranged between 0.063 and 0.064, whereas the average SSE (appropriately accounting for clustering) ranged between 0.063 and 0.102. Moreover, we observed greater differences, on average, between the 2 SE estimations as the degree of clustering increased (Figures 1 and 2).




Citation: Journal of Athletic Training 54, 11; 10.4085/1062-6050-438-18



To highlight the effect of the differences between SEs on inferences related to statistical significance, we present an agreement proportion in the Table. This is a measure of the number of times (out of 1000) that an agreement was noted between the SE-based and SSE-based decisions with respect to the null hypothesis that the proportions for males and females would not be different. For example, while comparing sex differences in game-related injuries from 1 simulated dataset with high clustering (>95%), the SE-based 95% CI (0.75, 0.97) was constructed around an IPR of 0.85, while the SSE-based 95% CI around the same IPR was constructed as (0.69, 1.05). In an applied context, using the sandwich estimator results in failure to reject the null hypothesis, whereas the standard approach results in sufficient evidence to reject the null. Consequently, inferential disagreement was observed in this simulated dataset. As the degree of clustering increased, so did the proportion of disagreement in our simulations (Table).
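The inferential disagreement in this example can be checked with a few lines of Python. The sketch assumes that both intervals are Wald intervals built on the log scale with a critical value of 1.96, which is our assumption and is not stated in the text; under it, the SE implied by each interval can be recovered from the interval width. The function names are ours.

```python
import math

Z_95 = 1.96  # assumed normal critical value for a 95% CI


def log_se_from_ci(lower, upper, z=Z_95):
    """SE on the log scale implied by a Wald CI for a ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def rejects_null(lower, upper, null_value=1.0):
    """True when the interval excludes the null value of the ratio."""
    return not (lower <= null_value <= upper)


naive_ci = (0.75, 0.97)     # SE-based interval from the example
sandwich_ci = (0.69, 1.05)  # SSE-based interval from the example

naive_se = log_se_from_ci(*naive_ci)
sandwich_se = log_se_from_ci(*sandwich_ci)
decisions_agree = rejects_null(*naive_ci) == rejects_null(*sandwich_ci)

print(f"implied SE={naive_se:.3f}, implied SSE={sandwich_se:.3f}")
print(f"decisions agree: {decisions_agree}")
```

Under these assumptions the implied SEs are about 0.066 and 0.107, and the 2 decisions differ, which is what the agreement proportion counts across the 1000 simulated datasets.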
Summary
Injury proportion ratios are a powerful method for comparing differential injury prevalence between groups.8 However, the methods relied upon to draw inferences related to an estimated IPR are predicated on the aforementioned assumption of independence. As presented here, the failure to handle violations of this assumption can lead to biased estimates of the SE and consequently flawed inferences. Although observation clustering and resultant violations of the independence assumption are difficult to avoid in injury surveillance, we present here a technique for handling such violations. The sandwich covariance estimator has been shown to be robust to similar misspecifications of model covariance.10,12,13 We discussed the specific application of this technique to injury surveillance and illustrated its value in this context. Although our discussions have been primarily limited to NTL injuries, we acknowledge the possibility that any combination of NTL and TL injuries may be clustered within a given set of surveillance data. As such, we note that investigators may apply the sandwich estimator as demonstrated here, even while conducting comparative analyses on consolidated sets of clustered TL and NTL injury observations.
Knowles et al stated9 that the concern about correcting for clustering may be minimal if the average number of injuries per injured athlete is low. However, we urge investigators to use the sandwich estimator even when no clustering is suspected. Our results indicate that the SSE estimations are identical to the standard or traditional SE estimations in cases of no clustering. We note that this is consistent with the statistical theory surrounding the sandwich estimator, and this property has been described in the context of presenting the mathematical derivation of the estimator.10 Moreover, although the differences in SE estimations may seem negligible out of context, it is important to consider the inferential disagreement observed in our simulations. Inappropriate handling of even moderate clustering resulted in flawed inferences in 10% to 12% of our simulations. The degree of bias in the estimated SEs and the resultant flawed inferences depend on the extent of clustering as well as the nature of clustering. That is, not only the proportion of clustered observations (as a fraction of all observations) within a dataset but also the number of observations contributed by participants with multiple observations directly affect the observed results. Thus, we encourage investigators to carefully consider observation clustering in order to protect against inflated likelihoods of type I error. This will better inform clinicians using the literature as a foundation for evidence-based practice. Ultimately, the application of this technique may depend on the availability of requisite data (ie, unique identifiers for participants). However, as false rejections of the null hypothesis are generally considered to be egregious errors in observational and experimental science, it is advisable to use methods to minimize such errors whenever possible.

Figure 1. Differences between traditionally estimated standard errors and sandwich standard errors over varying degrees of observational clustering for event type.

Figure 2. Differences between traditionally estimated standard errors and sandwich standard errors over varying degrees of observational clustering for injury mechanism.
Contributor Notes
Avinash Chandran, PhD, MS, and Derek Brown, PhD, MS, contributed equally to manuscript preparation.