In this paper, we present ScalParC (Scalable Parallel Classifier), a new parallel formulation of a decision-tree-based classification process. Like other state-of-the-art decision tree classifiers such as SPRINT, ScalParC is suited to handling large datasets. We show that the existing parallel formulation of SPRINT is unscalable, whereas ScalParC is scalable in both runtime and memory requirements. We present experimental results of classifying up to 6.4 million records on up to 128 processors of a Cray T3D to demonstrate the scalable behavior of ScalParC. A key component of ScalParC is the parallel hash table. The proposed parallel hashing paradigm can be used to parallelize other algorithms that require many concurren...
Today, due to globalization, the size of data sets is increasing; it is necessary to disc...
Abstract Data-intensive computing has received substantial attention since the arrival of the big da...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1998. Simultaneously published...
Classification is an important data mining problem. Although classification is a well-studied problem...
One of the important problems in data mining is classification. Recently there has been a lot of int...
Data mining refers to the process of finding hidden patterns inside a large dataset. While improving...
Learning decision trees against very large amounts of data is not practical on single-node computer...
Data mining is the process of discovering interesting and useful patterns and relationships in large...
Data mining is the extraction of information and its roles from a vast amount of data. This topic is...
Abstract—Decision tree construction is a well-studied data mining problem. In this paper, we focus o...
Abstract. In the fields of data mining and machine learning the amount of data available for buildin...
When running data-mining algorithms on big data platforms, a parallel, distributed framework, such a...
Classification of very large datasets is a challenging problem in data mining. It is desirable to h...
Univariate decision tree algorithms are widely used in Data Mining because (i) they are easy to lear...