

Robust analysis of ordinal data

About the project





Sune Karlsson



Ordered categorical data are pervasive in the social and behavioural sciences. Questionnaires and rating scales are used across disciplines and are often the only way of measuring attitudes, perceptions and knowledge. Such data, irrespective of how the categories are labelled, represent only an order, not a magnitude or a distance between categories. Hence, replacing one ordered set of category labels with another ordered set must not change the results of statistical treatments of the data. Statistical methods for ordinal data must take account of these rank-invariant properties in order to produce valid and reliable results.
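The rank-invariance requirement can be illustrated with a minimal sketch. The responses and the relabelling below are hypothetical, and Kendall's tau-a is used only as a familiar example of a statistic that depends on order alone; it is not presented as the project's own method.

```python
from itertools import combinations
from statistics import mean


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Only the sign of each difference matters, never its size.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical responses to two four-category items (illustration only).
item_a = [1, 2, 2, 3, 4, 4, 1, 3]
item_b = [1, 1, 2, 3, 3, 4, 2, 4]

# A hypothetical relabelling that preserves the order of the categories.
relabel = {1: 1, 2: 2, 3: 10, 4: 100}
item_a_relabelled = [relabel[v] for v in item_a]

# The rank-based statistic is unchanged by the relabelling ...
tau_original = kendall_tau_a(item_a, item_b)
tau_relabelled = kendall_tau_a(item_a_relabelled, item_b)

# ... whereas the arithmetic mean depends on the arbitrary labels.
mean_original = mean(item_a)
mean_relabelled = mean(item_a_relabelled)

print(tau_original == tau_relabelled)    # same value under both labellings
print(mean_original, mean_relabelled)    # the two means differ
```

Because the statistic uses only the signs of pairwise differences, any strictly increasing relabelling leaves it unchanged, while a statistic such as the mean treats the labels as magnitudes.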

Ordinal data may also arise when a variable that is, or can be, measured on a ratio or interval scale is reported in categorized form. Examples are temperature reported as low, normal or high, or income reported as the decile of the income distribution into which an individual falls. The cut points between categories may or may not be known. In these cases the data have stronger properties, and potentially more powerful methods can be used.

The measurement process itself is thus important: together with the assumptions one is willing to make, it determines which statistical methods can be used with the ordinal data.

In situations where the data are truly ordinal in the above sense, an analogy is often made with the case of ordinally categorized data. The ordinally measured variable is taken to be a representation of an unobserved, latent variable which represents the trait of interest (e.g. marketing managers' attitudes towards advertising on the Internet). To facilitate a more powerful analysis and the modelling of causal relationships, the underlying variable is then assumed to have a specific distribution. In some cases these are reasonable assumptions; in others they are very difficult to justify.
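The latent variable construction described above can be sketched as follows. The cut points are hypothetical, and the standard normal distribution stands in for the "specific distribution" that is assumed; the text does not say which distribution is used, so this is an illustration under stated assumptions rather than the project's model.

```python
from bisect import bisect_right
from statistics import NormalDist

# Hypothetical cut points dividing the latent scale into four categories.
cut_points = [-1.0, 0.0, 1.5]


def categorize(latent_value, cuts):
    """Map a latent value to an ordinal category 1..len(cuts)+1."""
    # The category is one plus the number of cut points at or below the value.
    return bisect_right(cuts, latent_value) + 1


def category_probabilities(cuts, dist=NormalDist()):
    """P(category k) = F(cut_k) - F(cut_{k-1}) under the assumed distribution F."""
    cdf = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Probabilities implied by the assumed standard normal latent variable.
probs = category_probabilities(cut_points)

# Only the category is observed, never the latent value itself.
observed = [categorize(v, cut_points) for v in (-2.0, -0.3, 0.5, 2.0)]
print(observed)
print(probs)
```

Replacing `NormalDist()` with another distribution changes the implied category probabilities, which is why the distributional assumption matters for any conclusions drawn from such a model.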

This project strives to bridge the gap between the two approaches to the analysis of ordinal data. On the one hand, the non-parametric approach based on the rank-invariant properties of the data is very appealing in its robustness and can offer powerful insights, but it is currently limited in the type of questions it can address. Part of the project is thus devoted to extending rank-based methods to the modelling of causal relationships. On the other hand, the latent variable approach is very powerful and offers convenient methods for modelling causal relationships, but at the price of strong assumptions about the underlying properties of the ordinal data. Part of the project is thus devoted to relaxing these assumptions, allowing more flexible or even arbitrary distributions for the underlying latent variable and, in the extreme, making the methods invariant to rank-preserving transformations.