The nonparametric Behrens–Fisher problem in partially complete clustered data


In randomized trials or observational studies involving clustered units, the assumption of independence within clusters is not practical. Existing parametric or semiparametric methods assume specific dependence structures within a cluster. Furthermore, parametric model assumptions may not even be realistic when data are measured on a nonmetric scale, as commonly happens, for example, with quality-of-life outcomes. In this paper, we introduce nonparametric effect-size measures for clustered data that allow meaningful and interpretable probabilistic comparisons of treatments or intervention programs. The dependence among observations within a cluster can be arbitrary. Point estimators are discussed along with their asymptotic properties for computing confidence intervals and performing hypothesis tests. Small-sample approximations that retain some of the optimal asymptotic behavior are presented. In our setup, some clusters may involve observations coming from both intervention groups (referred to as complete clusters), while others may contain observations from one group only (referred to as incomplete clusters). In deriving the asymptotic theory, we do not impose any relation between the rates of divergence of the numbers of complete and incomplete clusters. Simulations show favorable performance of the methods for arbitrary combinations of complete and incomplete clusters. The developed nonparametric methods are illustrated using data from a randomized trial of indoor wood smoke reduction to improve asthma symptoms and a cluster-randomized trial for smoking cessation.
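To make the idea of a probabilistic comparison concrete, the following is a minimal sketch of the standard nonparametric relative effect, p = P(X < Y) + 0.5 P(X = Y), estimated by pooling all observations from the two groups. It is an illustration only and not the estimator developed in the paper: it ignores the cluster structure (any cluster weighting and the variance estimation are omitted), and the function name `relative_effect` is hypothetical.

```python
import numpy as np


def relative_effect(x, y):
    """Pooled estimate of p = P(X < Y) + 0.5 * P(X = Y).

    Assumption: observations are simply pooled across clusters; the
    cluster structure is ignored in this point estimate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise differences y_j - x_i for every (i, j) pair.
    diff = y[None, :] - x[:, None]
    # Count 1 when x_i < y_j and 0.5 for ties, then average over all pairs.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Hypothetical ordinal scores (for example, quality-of-life categories).
control = [1, 2, 3]
treated = [2, 3, 4]
p_hat = relative_effect(control, treated)
```

A value of `p_hat` above 0.5 indicates that observations in the second group tend to be larger than those in the first; a value of 0.5 indicates no tendency in either direction. Because only the ordering of observations is used, the measure remains meaningful for ordinal outcomes.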



Document Type





Keywords

clustered data, empirical distribution, nonparametric effects, rank-based method, two-sample problem

Publication Date


Journal Title

Biometrical Journal