Black arrows represent outcomes coded as 1 in the binomial trial; gray arrows represent outcomes coded as 0. The tables show the model estimates, including node-specific individual random effects and their correlations, as well as treatment (eyespot) effects for all 4 nodes. Note that, for clarity, we only include Z tests and P values for the effects of interest, and not for the intercepts; for expanded results, see Supplementary Appendix S1.

Traditionally, the repeatability of pairs of ordinal measurements is analyzed by calculating the Spearman rank correlation between the first and second measures of individuals on a given day (Martin and Bateson). The limitation of this approach is that, because it uses ranks, it lacks quantitative interpretation unless one assumes the different levels of the ordered category to be equidistant e.

Moreover, it is not comparable with standard ANOVA-based repeatability measures e. On the other hand, analogous measures can be calculated for non-Gaussian distributions if we can use a GLMM specification (Nakagawa and Schielzeth). Because it is not known whether the defined behaviors represent a continuum on a linear scale, nodes can be assigned different baseline probabilities by including node identity as a fixed factor in the model. Such an approach allows for different intercepts for each node i. To test and account for the possibility that individuals desensitize with repeated stimulation, we include day of trial as a fixed covariate.
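The node structure described above can be made concrete by expanding each ordinal score into one binary row per node reached. The following is a minimal sketch, assuming a purely sequential (linear) tree in which node j asks whether the animal escalated beyond level j; the function name `expand_ordinal` and the `n_levels` parameter are hypothetical and not taken from the original analysis.

```python
def expand_ordinal(score, n_levels):
    """Expand an ordinal score (1..n_levels) into sequential binary nodes.

    Assumed coding (illustrative): at node j, the outcome is 1 if the animal
    escalated beyond level j and 0 if it stopped at level j. Nodes that were
    never reached produce no row (they are structurally missing).
    """
    if not 1 <= score <= n_levels:
        raise ValueError("score must lie between 1 and n_levels")
    rows = []
    for node in range(1, n_levels):
        if score < node:
            break  # this node and all later ones were never reached
        rows.append((node, 1 if score > node else 0))
    return rows


# Example: an animal that stopped at level 3 of a 4-level scale
print(expand_ordinal(3, 4))
```

Each resulting row can then be treated as one Bernoulli observation, with node identity available as a fixed factor so that every node gets its own intercept.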

Incorporating a node-by-day interaction could show whether the day effect differs as individuals go up the display intensity scale. However, for simplicity of illustration, we assume that day of observation has a similar effect on the escalation probabilities at all levels. Finally, because observations were performed twice a day and on different individuals, we include random effects for individual and for replicate within each day (for full details, see Supplementary Appendix S1). Including individual as a random effect quantifies the variance of the behavior (in this case, the sequential probability of escalation) among individuals (see the table in Figure 1a).
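As a hedged illustration of the specification just described, the model can be written out as below. The symbols are our own notation and are assumptions, not the authors': a logit link is assumed, $\beta_n$ is the intercept of node $n$, $d_j$ is the day of trial, $u_i$ is the individual effect, and $v_{jk}$ is the effect of replicate $k$ within day $j$. The second line gives one common latent-scale form of the proportion of variance attributable to individuals, where $\pi^2/3$ is the variance of the standard logistic distribution; whether the replicate variance belongs in the denominator is a modeling choice.

```latex
\operatorname{logit}\Pr(y_{ijkn}=1) = \beta_{n} + \beta_{\mathrm{day}}\,d_{j} + u_{i} + v_{jk},
\qquad u_{i}\sim\mathcal{N}(0,\sigma^{2}_{u}),\quad v_{jk}\sim\mathcal{N}(0,\sigma^{2}_{v})

R_{\mathrm{logit}} = \frac{\sigma^{2}_{u}}{\sigma^{2}_{u} + \sigma^{2}_{v} + \pi^{2}/3}
```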

Repeatability can then be estimated as the proportion of variance explained by the individual (Nakagawa and Schielzeth).

Our second example, a subset of data from a larger experiment, investigates the role of butterfly wing eyespots in deterring potential predators. In this experiment, individual great tits P.

Each bird was tested twice on a single treatment. However, this reveals little about which behaviors differed and how, and it does not allow us to account for repeated measures of individuals. To be analyzed as an IRTree GLMM, these 5 behavioral categories can be conceptualized as a nonordered sequential decision tree (Figure 1b) determined by 4 binomial nodes: (1) the probability of showing a response (node 1); (2) the probability of responding aversively (behaviors 4 and 5) rather than showing interest (behaviors 2 and 3; node 2); (3) whether to explore, conditional on having shown interest (node 3); and (4) whether to flee, given that the response was aversive (node 4).
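The decomposition into 4 binomial nodes can be sketched as a mapping from each observed behavior to the node outcomes it implies. This is an illustrative sketch only: the text does not say which of behaviors 2 and 3 counts as "explore" or which of behaviors 4 and 5 counts as "flee", so the coding below (3 = explore, 5 = flee) is an assumption, as are the field names `bird`, `treatment`, and `behavior`.

```python
# Outcomes at nodes 1..4 for each behavior; None marks a node that was not
# reached. ASSUMPTION: behavior 3 = explore and behavior 5 = flee.
NODE_CODING = {
    1: (0, None, None, None),  # no response
    2: (1, 0, 0, None),        # interest, did not explore
    3: (1, 0, 1, None),        # interest, explored
    4: (1, 1, None, 0),        # aversive, did not flee
    5: (1, 1, None, 1),        # aversive, fled
}


def expand_to_nodes(records):
    """Turn one row per trial into one row per node reached in that trial."""
    rows = []
    for rec in records:
        outcomes = NODE_CODING[rec["behavior"]]
        for node, outcome in enumerate(outcomes, start=1):
            if outcome is None:
                continue  # structurally missing: this node was never reached
            rows.append({
                "bird": rec["bird"],
                "treatment": rec["treatment"],
                "node": node,
                "y": outcome,
            })
    return rows


# Hypothetical trial: bird "b1", eyespot treatment, observed behavior 5
for row in expand_to_nodes([{"bird": "b1", "treatment": "eyespot", "behavior": 5}]):
    print(row)
```

The long-format rows produced this way are the binary observations on which a binomial GLMM with node as a factor would be fitted.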


In this case, we are interested in the effect of the treatment (butterfly eyespots) on each of the nodes separately, because they refer to qualitatively different processes. Thus, we include treatment and node as interacting fixed factors. For clarity, we set separate intercepts for each node. As individual birds were tested more than once, we need to include a random effect for individual, which can be node specific, because individual variation may be expressed differently at each node (for full details, see Supplementary Appendix S1).
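One way to write this specification down, as a sketch under assumptions rather than the authors' exact model, is given below. The notation is ours: a logit link is assumed, $\beta_n$ is the intercept of node $n$, $\gamma_n$ is the node-specific treatment effect, $x_i$ equals 1 if bird $i$ received the eyespot treatment and 0 otherwise, and $u_{in}$ is the node-specific random effect of bird $i$, with covariance matrix $\Sigma$ carrying the among-node individual correlations.

```latex
\operatorname{logit}\Pr(y_{in}=1) = \beta_{n} + \gamma_{n}\,x_{i} + u_{in},
\qquad (u_{i1}, u_{i2}, u_{i3}, u_{i4})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma)
```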

The results are shown in Figure 1b and reveal that the presence of eyespots on a butterfly wing only has an effect on node 2 interest vs. In particular, the presence of eyespots increases the probability of showing an aversive response. The strength of the aversive response node 4: flee vs. We can also assess the individual correlations from the random-effect correlation structure (node by individual).


For example, the strong positive correlation 0. In other words, some individuals tend to be more active than others, regardless of the type of behavior that the object elicits. Behavioral data can often be conceptualized as a decision tree leading to alternative categorical outcomes. We believe this applies to a large range of phenomena that behavioral ecologists are interested in, including mate choice, social interactions, or antipredatory responses that are not easily analyzed using traditional approaches.


In Table 1, we propose a list of hypothetical examples that could be analyzed using IRTrees. We have shown how, by conceptualizing the behavioral responses as a decision tree, we can analyze such data using a GLMM framework. This provides a variety of advantages. First, by requiring the recorded responses to be organized into biologically meaningful decision structures, it stimulates the researcher to decompose behaviors into their constituent components. Second, it allows for simultaneously testing those parts, accounting for the structural dependencies caused by the trees, and therefore avoiding problems associated with performing multiple tests.

Third, it allows for complex experimental designs, such as repeated measures or hierarchical sampling designs. We have shown examples where individual random effects can affect all the nodes equally (katydid probability of deimatic display) or separately (alternative responses to butterfly eyespots). The GLMM framework allows for the incorporation of more complex random effect structures than the ones shown here, such as genetic relatedness i. This flexibility permits the measurement of important quantities such as individual repeatability (Nakagawa and Schielzeth), heritability Wilson et al.

Finally, the parametric nature of the analyses allows for easy estimation of effect sizes. Naturally, an increase in model complexity comes at the cost of requiring higher sample sizes. This is particularly important to consider in behavioral studies, where sample sizes tend to be considerably lower than poll-based studies for which IRT was originally designed.

We thus highly recommend the use of simulations tailored to assess the power of any given GLMM specification and study design where sample size is a concern (Johnson et al.). In the context of IRTs, the clinical psychology literature on patient-reported outcomes, motivated by stronger limitations in sample size than other applications, has a healthy tradition of providing simulation studies to evaluate power and sample size for a variety of models of differing complexity e.

The general message is that, as the models get more complex and the expected effects are smaller, appropriate sample sizes required to find significance grow from dozens as in our examples to a few hundred.
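A simulation-based power check of the kind recommended above can be sketched as follows. This is a deliberately simplified stand-in, not the authors' procedure: it simulates a single binary node with a logit-scale individual random intercept, and instead of refitting a GLMM it tests the treatment effect with a normal-approximation z test on per-individual proportions. All parameter values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_group(n_birds, trials, intercept, effect, sd_ind, rng):
    """Per-bird proportion of 1s for one treatment group (single node)."""
    props = []
    for _ in range(n_birds):
        # Individual random intercept on the logit scale
        eta = intercept + effect + rng.gauss(0.0, sd_ind)
        p = 1.0 / (1.0 + math.exp(-eta))
        successes = sum(rng.random() < p for _ in range(trials))
        props.append(successes / trials)
    return props


def estimate_power(n_per_group, effect, n_sims=500, trials=2,
                   intercept=0.0, sd_ind=1.0, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the z test rejects at alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_sims):
        ctrl = simulate_group(n_per_group, trials, intercept, 0.0, sd_ind, rng)
        trt = simulate_group(n_per_group, trials, intercept, effect, sd_ind, rng)
        se = math.sqrt((variance(ctrl) + variance(trt)) / n_per_group)
        if se == 0.0:
            continue  # degenerate data set, counted as non-significant
        if abs((mean(trt) - mean(ctrl)) / se) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, estimate_power(n, effect=1.0))
```

In a real application the test inside the loop would be replaced by a fit of the intended GLMM, so that the estimated power reflects the actual model and design.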


Similar conclusions are reached by studies on classical multinomial GLMMs e. Other important insights to be gained from simulations of GLMMs include (1) the estimation accuracy and bias of the random effects e. Johnson et al.

Psychology and sociology have recently seen important developments in methods that handle the difficulties of categorical data (Powers and Xie). Many challenges in these fields are common to behavioral ecology and ethology and thus provide exciting new avenues for behavioral ecologists see Nettle and Penke; Carter et al.

Individual response trees are a good example of how this exchange could be highly beneficial. We hope that our article encourages their application to behavioral data and inspires a better communication of statistical advances across disciplines. We would like to thank D. Abondano and H. Nisu for their help gathering the great tit data and D. Noble and S. Gordon for invaluable comments. Oxford University Press is a department of the University of Oxford.

It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide.

Address correspondence to A. E-mail: alopez biologie. Sebastiano De Bona. Janne K. Kate D.

Johanna Mappes. Editor-in-Chief: Leigh Simmons.


Abstract. Behavioral data are notable for presenting challenges to their statistical analysis, often due to the difficulties in measuring behavior on a quantitative scale.

Table 1. Does temperature affect the intensity? Is the strategy repeatable within individuals?

The basics of item response theory.

Power and sample size determination for the group comparison of patient-reported outcomes in Rasch family models.

Explanatory item response models: a generalized linear and nonlinear approach.

De Boeck. The estimation of item response models with the lmer function from the lme4 package in R.

Generalized linear mixed models: a practical guide for ecology and evolution.

De Bona. Predator mimicry, not conspicuousness, explains the efficacy of butterfly eyespots.

Power and sample size determination in the Rasch model: evaluation of the robustness of a numerical method to non-normality of the latent trait.

Evolutionary maintenance of sexual dimorphism in head size in the lizard Zootoca vivipara : a test of two hypotheses.

Hadfield JD.