CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Paper title: Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
National paper ID: JR_JHES-8-2_002
Published in 1399 (Solar Hijri calendar)
Authors:

Mahdi Roozbeh - Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran
Monireh Maanavi - Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran
Saman Babaie-Kafaki - Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran

Abstract:
Background and purpose: As science and technology advance, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with such data are estimating the coefficients and interpreting the fitted model. Classical methods are unreliable in high-dimensional problems because of the large number of predictor variables, and they are also affected by outliers and collinearity.
Methods: Many real-world data sets are now high-dimensional. To handle the high dimensionality, we used the least absolute shrinkage and selection operator (LASSO). Because the semiparametric model is flexible and widely applicable to medical data, it can also be used to model genomic data. Motivated by these points, we developed an improved robust approach for high-dimensional data sets, aimed at the analysis of gene expression and at prediction in the presence of outliers.
Results: Outliers are among the common problems in regression analysis. In regression, an outlier is a point that fails to follow the main linear pattern of the data. The ordinary least-squares estimator was found to be potentially sensitive to outliers, which motivated the investigation of robust estimators. Robust regression is among the most widely studied problems in the statistics community. In the present study, least trimmed squares (LTS) estimation was applied to overcome the outlier problem.
Conclusions: We have proposed an optimization approach for semiparametric models to combat outliers in the data set. In particular, based on a LASSO penalization scheme, we have formulated a nonlinear integer programming problem for the semiparametric model, which can be solved effectively by any evolutionary algorithm. We have also studied a real-world application related to riboflavin production. The results showed that the proposed method was reasonably efficient in comparison with the LTS method.
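
To make the combination of LTS and LASSO described in the abstract more concrete, the following is a minimal sketch and not the authors' method. LTS seeks the coefficients that minimize the sum of the h smallest squared residuals, so the most poorly fitted points are trimmed; adding an L1 penalty gives a sparse variant. The paper formulates this as a nonlinear integer program solved by an evolutionary algorithm and embeds it in a differencing-based semiparametric model. The sketch below instead assumes a purely linear model without intercept and uses simple concentration steps (refit on the h best-fitted points, repeat) with a hand-written coordinate-descent LASSO. All function names, parameter values, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by coordinate descent.
    Assumes no intercept (data are taken to be centered)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # equals y - X @ beta because beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def sparse_lts(X, y, h, lam, n_starts=5, n_csteps=10, seed=0):
    """Heuristic trimmed LASSO: keep the h observations with the smallest
    squared residuals, refit, and repeat (concentration steps)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_csteps):
            beta = lasso_cd(X[subset], y[subset], lam)
            r2 = (y - X @ beta) ** 2
            new_subset = np.sort(np.argsort(r2)[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        # Refit on the final subset and evaluate the trimmed, penalized loss.
        beta = lasso_cd(X[subset], y[subset], lam)
        r2 = (y - X @ beta) ** 2
        obj = np.sort(r2)[:h].sum() / (2 * h) + lam * np.abs(beta).sum()
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, subset
    return best_beta, best_subset

# Synthetic illustration only (not the riboflavin data): 50 observations,
# 5 predictors, and the first 5 responses shifted to act as gross outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)
outliers = np.arange(5)
y[outliers] += 100.0

beta_hat, kept = sparse_lts(X, y, h=40, lam=0.05)
print("estimated coefficients:", np.round(beta_hat, 2))
print("outliers retained:", sorted(set(outliers.tolist()) & set(kept.tolist())))
```

Concentration steps are a local heuristic and may stop at a suboptimal subset, which is one reason a global search such as the evolutionary algorithm mentioned in the abstract can be attractive. The trimming size h and the penalty lam are tuning choices that the sketch fixes arbitrarily.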

Keywords:
High-dimensional data set, Ordinary least squares method, Outliers, Robust regression

Dedicated paper page and full-text download: https://civilica.com/doc/1837034/