Statistical Methods for Inference from Non-Probability Samples and their Application to a Health Barometer
1  Department of Statistics and Operational Research, University of Granada, Granada, 18071, Spain.
2  Faculty of Science, University of Granada, Granada, 18071, Spain.
Academic Editor: Antonio Di Crescenzo

Abstract:

We live in the information age, and every day it becomes easier to obtain information on any topic and for any population of interest. Online surveys, large-scale datasets (Big Data), and social media are modern data sources that provide large samples quickly and without substantial resources. However, estimates based on this type of data must be made with great care, since most of these datasets are non-probability samples. Non-probability samples are not drawn under a known probabilistic sampling design, so the representativeness of the sample cannot be ensured, and the resulting estimates can be biased even when the sample is large (Meng, 2018).

To obtain more reliable and accurate estimates from this type of data, different statistical techniques for bias reduction have been developed. These methods require auxiliary information, and the techniques that can be applied depend on what auxiliary information is available. If the population totals of the covariates are known, calibration can be applied. If a reference probability sample is available (in which the variable of interest is not observed), we can apply Propensity Score Adjustment (PSA) to estimate the inclusion probabilities of the units in the non-probability sample, or Statistical Matching (SM) to predict the unobserved variable of interest in the reference sample. In the same setting, the Doubly Robust (DR) estimator, which combines both approaches, can also be applied. These are only some of the best-known techniques in this field. Unfortunately, there is no gold standard in practice, and the results depend on the specific study in question.
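As an illustration of the first case (known population totals), the following is a minimal sketch of linear calibration on simulated data. It is not the implementation used in this work: the function name `linear_calibration_weights`, the toy population, and the selection mechanism are all assumptions made for the example, and linear (chi-square distance) calibration is only one of several calibration variants.

```python
import numpy as np

def linear_calibration_weights(X, d, totals):
    """Adjust base weights d so the weighted covariate totals match `totals`.

    Linear calibration: w = d * (1 + X @ lam), where lam solves
    (X' D X) lam = totals - X' d, with D = diag(d).
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    totals = np.asarray(totals, dtype=float)
    A = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(A, totals - X.T @ d)
    return d * (1.0 + X @ lam)

# Hypothetical population: the outcome y depends on age.
rng = np.random.default_rng(0)
N = 10_000
age = rng.integers(18, 90, size=N).astype(float)
y = 0.05 * age + rng.normal(0.0, 1.0, size=N)

# Hypothetical self-selection: younger people are more likely to respond,
# so the resulting sample is a non-probability sample skewed towards the young.
p_respond = 0.2 / (1.0 + np.exp(0.05 * (age - 40.0)))
in_sample = rng.random(N) < p_respond
age_s, y_s = age[in_sample], y[in_sample]
n = age_s.size

# Auxiliary information: population size and population total of age.
X = np.column_stack([np.ones(n), age_s])
totals = np.array([N, age.sum()])
d = np.full(n, N / n)  # naive base weights (no design weights exist)

w = linear_calibration_weights(X, d, totals)

naive_mean = y_s.mean()
calibrated_mean = np.sum(w * y_s) / np.sum(w)
true_mean = y.mean()
print(f"true={true_mean:.3f} naive={naive_mean:.3f} calibrated={calibrated_mean:.3f}")
```

In this toy setting the outcome is linear in the calibration covariate, which is the situation in which calibration removes most of the selection bias; with real survey data the reduction depends on how well the available covariates explain both the selection mechanism and the variable of interest.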

In this work, we study the problem of estimation from non-probability samples, review the different methods for inference from such samples, and examine their effectiveness in a real survey conducted by the Spanish Ministry of Health.

Keywords: non-probability samples; inference; survey sampling; modern data

 
 