### Semi-mixture Regression Model for Incomplete Data

#### Loc Nguyen 1*, Anum Shafiq 2

1 Advisory Board, Loc Nguyen’s Academic Network, An Giang, Vietnam

2 Department of Mathematics and Statistics, Preston University Islamabad, Islamabad, Pakistan

### Abstract

The regression expectation maximization (REM) algorithm, a variant of the expectation maximization (EM) algorithm, uses a long regression model and many short regression models in parallel to solve the problem of incomplete data. Experimental results showed that REM is resistant to incomplete data: its accuracy decreases insignificantly when the data sample is made sparse with loss ratios of up to 80%. However, REM may converge slowly when there are many independent variables. In this research, we use a mixture model to decompose REM into many partial regression models, which are then unified in the so-called semi-mixture regression model. Our proposed algorithm is called the semi-mixture regression expectation maximization (SREM) algorithm because it combines the mixture model with the REM algorithm without fully implementing the mixture model: only the mixture coefficients in SREM are estimated according to the mixture model, whereas the regression coefficients are estimated by REM. The experimental results show that SREM converges faster than REM, although the accuracy of SREM is not better than that of REM in fair tests.
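The abstract describes REM and SREM only at a high level. The Python/NumPy sketch below illustrates the two ideas under explicit assumptions and is not the authors' implementation. Everything in it is hypothetical: the function names `rem_impute`, `srem_weights` and `srem_predict`, the short models of the form x_j = b_j0 + b_j1·z, the mean initialisation of missing values, the choice of one partial model z = t_k0 + t_k1·x_k per regressor, and the Gaussian residual densities used to update the mixture coefficients.

```python
import numpy as np


def rem_impute(X, z, n_iter=100, tol=1e-8):
    """Hypothetical REM-style loop (sketch). np.nan marks a missing value.

    Long model:   z   = a0 + a1*x1 + ... + an*xn
    Short models: x_j = b_j0 + b_j1*z, one per regressor (assumed form)
    """
    X = np.array(X, dtype=float)
    z = np.array(z, dtype=float)
    mx, mz = np.isnan(X), np.isnan(z)
    # Assumed initialisation: fill gaps with column means.
    X = np.where(mx, np.nanmean(X, axis=0), X)
    z = np.where(mz, np.nanmean(z), z)
    alpha = np.zeros(X.shape[1] + 1)
    for _ in range(n_iter):
        ones = np.ones(len(z))
        # Refit the long model and all short models on the completed data.
        new_alpha, *_ = np.linalg.lstsq(np.column_stack([ones, X]), z, rcond=None)
        B = np.column_stack([ones, z])
        beta, *_ = np.linalg.lstsq(B, X, rcond=None)  # shape (2, n)
        # Re-impute missing regressors from the short models,
        # then missing responses from the long model.
        X[mx] = (B @ beta)[mx]
        z[mz] = (np.column_stack([ones, X]) @ new_alpha)[mz]
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha, X, z


def srem_weights(X, z, n_iter=500, tol=1e-10):
    """Hypothetical SREM-style step: EM on mixture coefficients only."""
    X, z = np.asarray(X, float), np.asarray(z, float)
    N, n = X.shape
    theta = np.empty((n, 2))
    preds = np.empty((N, n))
    for k in range(n):
        # Partial model k: z = t_k0 + t_k1*x_k, fitted once by least squares.
        A = np.column_stack([np.ones(N), X[:, k]])
        theta[k], *_ = np.linalg.lstsq(A, z, rcond=None)
        preds[:, k] = A @ theta[k]
    resid = z[:, None] - preds
    var = np.maximum(resid.var(axis=0), 1e-12)  # residual variance per model
    dens = np.exp(-0.5 * resid**2 / var) / np.sqrt(2 * np.pi * var)
    c = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = c * (dens + 1e-300)            # E-step: unnormalised responsibilities
        r /= r.sum(axis=1, keepdims=True)  # each row now sums to 1
        c_new = r.mean(axis=0)             # M-step: mixture coefficients only
        if np.max(np.abs(c_new - c)) < tol:
            return theta, c_new
        c = c_new
    return theta, c


def srem_predict(theta, c, x):
    """Mixture-weighted prediction: sum_k c_k * (t_k0 + t_k1*x_k)."""
    x = np.asarray(x, float)
    return float(np.sum(c * (theta[:, 0] + theta[:, 1] * x)))
```

In this sketch, `srem_weights` would be called on the data completed by `rem_impute`. The partial-model coefficients stay fixed, and only the mixture coefficients `c` are re-estimated by EM. This mirrors the abstract's statement that SREM estimates only the mixture coefficients by the mixture model.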

### Keywords

Regression Model, Mixture Regression Model, Expectation Maximization Algorithm, Incomplete Data
