Citation Information : Statistics in Transition New Series. Volume 21, Issue 4, Pages 103-122, DOI: https://doi.org/10.21307/stattrans-2020-033
License : (CC BY-NC-ND 4.0)
Received Date : 31-January-2020 / Accepted: 30-June-2020 / Published Online: 15-September-2020
In official statistics, interest in data integration has grown enormously, but the effect of integration procedures on statistical analysis has not yet been sufficiently studied. Data integration is not an error-free procedure, and linkage errors, such as false links and missed links, can invalidate standard estimates. Increasing attention has recently been paid to the effect of linkage errors on statistical analyses and predictions. In particular, methods have been proposed to adjust unit-level small area estimators for linkage errors when the domains are correctly specified. In this paper we compare the naïve and the adjusted unit-level estimators with area-level estimators, which are not affected by linkage errors. The comparison encourages the use of the adjusted unit-level estimator.