Measurement error is one of the most damaging sources of nonsampling error within the total survey error. It can arise from the data collection instrument, the mode of collection, the respondent, and other sources. Editing, in which collected data are reviewed to detect inconsistencies and errors, is a major activity for resolving and treating some of these errors. An editing process should aim to give detailed information about data quality, to support future improvement of the survey, and to tidy up the data.
Editing is costly and time-consuming. New theories and methods that reduce the resources spent on editing survey data are therefore of interest. Such developments will also be necessary for treating registers and data sets of the sizes implied by the Big Data concept. One approach towards a more efficient editing process is selective editing, where only a subset of suspicious data is treated. The leading idea is to spend resources only on those observations that could affect the estimates. Selective editing is based on the calculation of “global scores”, each expressing a combined measure of an observation's importance in estimation and the suspicion of measurement error. Observations are selected for editing based on these global scores.
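The score-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not define the score functions, so the local score used here (design weight times the absolute deviation from an anticipated value, relative to an estimated total), the use of the maximum as the global score, and the threshold rule are all hypothetical choices, as are the function names.

```python
def local_score(weight, reported, anticipated, total_estimate):
    """Assumed local score: weighted deviation from an anticipated value,
    relative to the estimated total of the variable."""
    return weight * abs(reported - anticipated) / abs(total_estimate)


def global_score(local_scores):
    """Assumed global score: the largest local score over the variables."""
    return max(local_scores)


def select_for_editing(global_scores, threshold):
    """Return the indices of observations whose global score exceeds the threshold."""
    return [i for i, s in enumerate(global_scores) if s > threshold]


# Hypothetical observations: (weight, reported, anticipated) for one variable,
# with an assumed estimated total of 10000.
observations = [(10, 500, 100), (10, 105, 100), (1, 900, 100)]
scores = [global_score([local_score(w, r, a, 10000)]) for w, r, a in observations]
selected = select_for_editing(scores, threshold=0.05)
```

With these made-up numbers, only the first and third observations are flagged; the second deviates too little from its anticipated value to matter for the estimate.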
Selective editing has been developed from practical experience and lacks a theoretical framework. In particular, its effect is assessed through analyses of previously collected data sets, and there is no direct measure of the error in the estimates caused by measurement errors among unedited observations. One suggested approach to correcting for remaining measurement errors is to draw the observations selected for treatment using a sampling design. Global scores are preferably used in the sampling, so that more important and suspicious observations receive a larger probability of selection.
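One possible reading of the sampling-based approach is sketched below. The text does not name a specific design, so Poisson sampling with inclusion probabilities proportional to the global scores, and a Horvitz-Thompson estimate of the total measurement error, are assumptions made for illustration; all function names are hypothetical.

```python
import random


def inclusion_probabilities(scores, expected_n):
    """Probabilities proportional to score, capped at 1.
    Note: capping without rescaling can make the expected sample size
    smaller than expected_n."""
    total = sum(scores)
    return [min(1.0, expected_n * s / total) for s in scores]


def poisson_sample(probs, rng):
    """Select each observation independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def ht_error_total(sampled, errors, probs):
    """Horvitz-Thompson estimate of the total measurement error, where
    errors[i] is the error found by editing sampled observation i."""
    return sum(errors[i] / probs[i] for i in sampled)


probs = inclusion_probabilities([1, 1, 2, 4], expected_n=2)
sample = poisson_sample(probs, random.Random(0))  # seeded for repeatability
# Hypothetical errors found when editing observations 2 and 3.
estimate = ht_error_total([2, 3], {2: 5.0, 3: 10.0}, probs)
```

Because every observation with a positive score has a positive probability of selection, the weighted sum over the edited sample is design-unbiased for the total measurement error under this assumed design.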
This project considers a different approach to measuring the measurement errors that remain after selective editing. Editing a subset of observations yields information on the distribution of measurement errors and on how it depends on the scores used for selection. One way of using this information is to adopt a model-based approach and regress the observed measurement errors on the calculated scores. Since the scores are also available for the unedited set, the estimated model can predict the measurement errors in unedited observations. Summing these predictions yields a measure of the effect of the measurement errors remaining after selective editing, and gives valuable information on survey quality.
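The regress, predict, and sum steps can be illustrated with a minimal sketch. It assumes a simple linear model with an intercept, fitted by ordinary least squares; the text does not specify the model form, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predicted_remaining_error(edited_scores, edited_errors, unedited_scores):
    """Fit the model on the edited set, then sum the predicted
    measurement errors over the unedited set."""
    intercept, slope = fit_line(edited_scores, edited_errors)
    return sum(intercept + slope * s for s in unedited_scores)


# Hypothetical data: scores and observed errors for edited observations,
# and scores only for the unedited ones.
remaining = predicted_remaining_error([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 0.25])
```

In practice the edited observations tend to have high scores and the unedited ones low scores, so a fit of this kind extrapolates outside the observed score range; the choice of model therefore matters more than this toy example suggests.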
Adopting a model-based approach implicitly treats the measurement errors as outcomes of random trials. This project develops a theory that makes such an assumption reasonable for survey data. The theory also provides insights into the distribution of measurement errors, suggesting relevant statistical models.