Better Model Selection for Poverty Targeting through Machine Learning: A Case Study in Thailand

Authors

  • Pisacha Kambuya, Division of Macroeconomic and Data Analysis, Fiscal Policy Research Institute Foundation, Thailand

Keywords:

Proxy Means Test, Poverty Targeting, Machine Learning, Variable Selection, LASSO, Random Forest

Abstract

The Proxy Means Test (PMT) is a method for targeting poor households that should benefit from social programs. Because income and expenditure are difficult to measure directly, a PMT estimates them by ordinary least squares (OLS) using a set of variables that are correlated with these welfare measures. However, variable selection for the OLS model typically relies on stepwise regression, which is time-consuming when the set of candidate variables is very large. This study therefore proposes two machine learning algorithms, the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF), to improve the PMT model in terms of variable selection and model performance, focusing on out-of-sample accuracy in targeting poor households in Thailand. The data come from Thailand's 2016 Socio-Economic Survey (SES). The results show that PMTs based on variables selected by RF reduce the number of actually poor households that are classified as non-poor (the exclusion error) and increase the poverty accuracy rate (poor households correctly targeted as poor) at the national, urban, and rural levels; however, the inclusion error remains high. PMTs based on variables selected by stepwise regression and by LASSO perform about the same as each other, and both outperform PMTs based on RF-selected variables in reducing the inclusion error. On the other hand, the exclusion error for PMTs based on RF-selected variables is significantly (about one time) lower than for PMTs using stepwise- or LASSO-selected variables. Since there is a trade-off between inclusion and exclusion errors, this study suggests that if the objective of a social welfare program is to help the truly poor, PMTs based on RF variable selection are more appropriate.
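
The targeting measures named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function name, the household figures, and the poverty line are hypothetical, and the inclusion error is assumed here to be the share of households targeted as poor that are actually non-poor, a definition the abstract does not spell out. The exclusion error and poverty accuracy follow the abstract's wording, both measured against the actually poor households.

```python
def targeting_metrics(actual_welfare, predicted_welfare, poverty_line):
    """Compare actual and PMT-predicted poverty status household by household.

    A household counts as poor when its welfare measure (income or
    expenditure) falls below the poverty line.
    """
    actual_poor = [w < poverty_line for w in actual_welfare]
    predicted_poor = [w < poverty_line for w in predicted_welfare]
    pairs = list(zip(actual_poor, predicted_poor))

    correctly_targeted = sum(a and p for a, p in pairs)      # poor, classified poor
    missed_poor = sum(a and not p for a, p in pairs)         # poor, classified non-poor
    wrongly_targeted = sum((not a) and p for a, p in pairs)  # non-poor, classified poor

    n_actual_poor = correctly_targeted + missed_poor
    n_targeted = correctly_targeted + wrongly_targeted

    return {
        # Share of actually poor households classified as non-poor.
        "exclusion_error": missed_poor / n_actual_poor if n_actual_poor else 0.0,
        # ASSUMPTION: share of targeted households that are actually non-poor.
        "inclusion_error": wrongly_targeted / n_targeted if n_targeted else 0.0,
        # Share of actually poor households correctly classified as poor.
        "poverty_accuracy": correctly_targeted / n_actual_poor if n_actual_poor else 0.0,
    }


# Hypothetical monthly expenditure per person for six households (not SES data).
actual = [1500, 2500, 4000, 1800, 6000, 2900]
predicted = [1700, 3200, 2800, 2000, 5500, 2600]
metrics = targeting_metrics(actual, predicted, poverty_line=3000)
print(metrics)
```

In this toy example one of the four poor households is missed and one of the four targeted households is non-poor, so a model that lowers one error does not necessarily lower the other.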

Published

2020-03-16

How to Cite

Kambuya, P. (2020). Better Model Selection for Poverty Targeting through Machine Learning: A Case Study in Thailand. Thailand and The World Economy, 38(1), 91–116. Retrieved from https://so05.tci-thaijo.org/index.php/TER/article/view/183260