This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression (MMLR), which separates input datasets into subsets and constructs a local linear regression model for each of them. The proposed data analysis method is shown to be more efficient and flexible than other regression-based methods. This paper also proposes an approximate algorithm to construct MMLR models based on an $(\epsilon,\delta)$-estimator, and gives mathematical proofs of the correctness and efficiency of the MMLR algorithm, whose time complexity is linear with respect to the size of the input datasets. This paper also applies the method empirically to both synthetic and real-world datasets; the algorithm sho...
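The core idea of MMLR, splitting the input into subsets and fitting a local linear model to each, can be illustrated with the minimal sketch below. It is not the paper's algorithm: the partitioning rule (equal-frequency bins on one feature) and the names `fit_local_models`, `predict`, `split_feature` and `n_subsets` are hypothetical, and no $(\epsilon,\delta)$ sampling is performed; every subset is fitted exactly by ordinary least squares.

```python
import numpy as np

def fit_local_models(X, y, split_feature=0, n_subsets=4):
    """Hypothetical MMLR-style sketch: partition rows into n_subsets by
    quantiles of one feature, then fit OLS (with intercept) per subset.
    Not the paper's (epsilon, delta)-estimator algorithm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Interior quantile cut points define the subset boundaries.
    qs = np.linspace(0.0, 1.0, n_subsets + 1)[1:-1]
    cuts = np.quantile(X[:, split_feature], qs)
    labels = np.searchsorted(cuts, X[:, split_feature], side="right")
    models = []
    for k in range(n_subsets):
        mask = labels == k
        if not mask.any():
            raise ValueError(f"subset {k} is empty; use fewer subsets")
        # Local design matrix: intercept column plus all features.
        Xk = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(Xk, y[mask], rcond=None)
        models.append(coef)
    return cuts, models

def predict(X, cuts, models, split_feature=0):
    """Route each row to its subset's local model and evaluate it."""
    X = np.asarray(X, dtype=float)
    labels = np.searchsorted(cuts, X[:, split_feature], side="right")
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.array([Xd[i] @ models[labels[i]] for i in range(len(X))])

# Usage: piecewise-linear data with a break at x = 2.
x = np.arange(40) / 10.0
y = np.where(x < 2.0, 1.0 + 2.0 * x, 5.0 - x)
cuts, models = fit_local_models(x.reshape(-1, 1), y, n_subsets=2)
print(cuts, models)
```

Each local fit touches only its own rows, so for a fixed number of features the fitting cost grows roughly linearly with the number of rows; this mirrors, but does not reproduce, the linear-time claim above.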
Large datasets upon which classical statistical analysis cannot be performed because of the curse of...
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory vari...
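As an illustration of the subset selection problem described above, the sketch below performs exhaustive best-subset search for ordinary least squares, scoring each candidate subset with the Bayesian information criterion (BIC). The choice of BIC as the criterion and the function name `best_subset_bic` are assumptions for illustration, not the method of the cited work; the search visits all $2^p$ subsets, so it is only practical for a small number of candidate variables.

```python
import itertools
import numpy as np

def best_subset_bic(X, y):
    """Hypothetical sketch: exhaustive best-subset selection for OLS,
    scored by Gaussian BIC. Cost grows as 2**p, so keep p small."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    best_score, best_subset = np.inf, ()
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the chosen columns.
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ coef) ** 2))
            # BIC up to an additive constant: n*log(RSS/n) + (k+1)*log(n).
            score = n * np.log(rss / n) + (k + 1) * np.log(n)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score

# Usage: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
subset, score = best_subset_bic(X, y)
print(subset)
```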
This study considers the problem of building a linear prediction model when the number of candidate ...
Master's thesis in Computer science. With the advent of the era of big data, machine learning has been...
This paper presents a selective survey of recent developments in statistical inference and multiple ...
A cluster analysis method on massive multiple linear regression models was pro...
This paper describes a new robust multiple linear regression method, which is based on the segmentation...
Model specification and selection are recurring themes in econometric analysis. Both topics become c...
This paper considers the problem of online piecewise linear regression for big data applications. We...
This dissertation develops methodologies for analysis of big data and its related theoretical proper...
In data mining, regression analysis is a computational tool that predicts continuous output variable...
We consider multiple linear regression models under nonnormality. We derive modified maximum likelih...
If there are extraordinarily large data, too large to fit into a single computer or too expensive to...
The existence of massive datasets raises the need for algorithms that make efficient use of resource...
This thesis is focused on the development of computationally efficient procedures for regression mod...