Skip to main content

The Two Basic Tasks in Data Mining

We know there are so many different methods and tasks in Data Mining: Classification, Predication, Clustering, Sturctured Data Mining, Association Rule Mining, Frequent Itemset Mining, Social Networks Mining, ... ... To distinguish them would lead a boring case.



But in some phylosophical view, I think there are 2 only basic tasks in Data Mining: the first one is to distinguish different items, mainly including Classification and Clustering, their task is to discover or make use of the differences among the target dataset elements; the second one is to discover the combanations, or the patterns in the target dataset, mainly including Frequent Pattern Mining(Itemset, Sequence, Graph), their task is to find the similarity of different elements.



Most of Data Mining Tasks make use of both basic tasks.

Comments

Popular posts from this blog

A simple implementation of DTW(Dynamic Time Warping) in C#/python

DTW(Dynamic Time Warping) is a very useful tools for time series analysis. This is a very simple (but not very efficient) c# implementation of DTW, the source code is available at  https://gist.github.com/1966342  . Use the program as below: double[] x = {9,3,1,5,1,2,0,1,0,2,2,8,1,7,0,6,4,4,5}; double[] y = {1,0,5,5,0,1,0,1,0,3,3,2,8,1,0,6,4,4,5}; SimpleDTW dtw = new SimpleDTW(x,y); dtw.calculateDTW(); The python implementation is available at  https://gist.github.com/3265694  . from python-dtw import Dtw import math dtw = Dtw([1, 2, 3, 4, 6], [1, 2, 3, 5],           distance_func=lambda x, y: math.fabs(x - y)) print dtw.calculate() #calculate the distance print dtw.get_path() #calculate the mapping path

Install mysql-python with mariadb

mysql-python requires libmysqlclient-dev in ubuntu, but the installation of mariadb will have the lib with unmet dependenccies, so the error of "mysql_config not found" may occurred if you install mysql-python via pip. The case is that mariadb has a compatible package, if you have the ppa setup as in  http://downloads.mariadb.org/ . Just "sudo apt-get install libmariadbclient-dev".

PrefixSpan source code in python

The prefixspan is a key algorithm for mining sequential patterns. I have implemented the algorithm in Python. The algorithm is based on the following paper: Jian Pei, Jiawei Han, Senior Member, Behzad Mortazavi-asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal. Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach. IEEE Transactions on Knowledge and Data Engineering, 2004. or their conference paper You may download the source code at the following addresses: Link1