Skip to main content

Notes on Sequential Pattern Mining (2) -- Partial Order Pattern Mining and Contrast Mining

1.

In , the authors induce TEIRESIAS algorithms to mining combinatorial patterns with gap constraints in biological sequences. The patterns TEIRESIAS mined is similiar with the common sequential patterns, but it could contain "." the wild card which is also in the alphbel of the sequences database standing for any other item available, for example pattern "A..B" is a length-4 pattern, with two arbitrary items between the first A and the last B. Patterns "AC.B", "AADB" are all said to be more specific than pattern "A..B".

TEIRESIAS mining all the maximal patterns () with a support over a min threshold K. There some key points of TEIRESIAS algorithms:

1)The growth of the patterns

The growth of the patterns is accomplished by convolute current pattern by a short length pattern. Pattern A and pattern B are convolutable if the last L(very small) characters of pattern A is the same as the first L characters of pattern B, then by making the L characters overlaped, we get a longer length pattern candidate. For example, convolute the pattern A.BCA and the pattern BCA.D to the pattern A.BCA.D.

2.

In , the authors propose the problem of mining Distinguishing Patterns which the pattern is frequent in one collection of sequneces(the positive collection), but infrequent in another collection of sequences (the negative collection). The problem is very useful in biological sequences problems. The paper doesn't mine the maximal patterns, but the minimal patterns. There some points of the proposed algorithms:

1)The pruning method

The apriori property doesn't hold for the sequences with gap constraints, but not in every cases, because we want to mine the minimal patterns, the supersequence of a minimal pattern could be a new minimal pattern, we can also use a similar apriori property to pruning the search spaces.

2)Support calculation and gap checking

The algorithm adopts a bitset operation methods to achieve fast support calculation and gap checking.

The algorithm in the paper adopt a 2-stages sechme, first generate a set of candidates of minimal distinguishing subsequence patterns, second, minimize the candidates set, get all the real minimal distinguishing subsequence patterns.

Comments

Popular posts from this blog

A simple implementation of DTW(Dynamic Time Warping) in C#/python

DTW(Dynamic Time Warping) is a very useful tools for time series analysis. This is a very simple (but not very efficient) c# implementation of DTW, the source code is available at  https://gist.github.com/1966342  . Use the program as below: double[] x = {9,3,1,5,1,2,0,1,0,2,2,8,1,7,0,6,4,4,5}; double[] y = {1,0,5,5,0,1,0,1,0,3,3,2,8,1,0,6,4,4,5}; SimpleDTW dtw = new SimpleDTW(x,y); dtw.calculateDTW(); The python implementation is available at  https://gist.github.com/3265694  . from python-dtw import Dtw import math dtw = Dtw([1, 2, 3, 4, 6], [1, 2, 3, 5],           distance_func=lambda x, y: math.fabs(x - y)) print dtw.calculate() #calculate the distance print dtw.get_path() #calculate the mapping path

Install mysql-python with mariadb

mysql-python requires libmysqlclient-dev in ubuntu, but the installation of mariadb will have the lib with unmet dependenccies, so the error of "mysql_config not found" may occurred if you install mysql-python via pip. The case is that mariadb has a compatible package, if you have the ppa setup as in  http://downloads.mariadb.org/ . Just "sudo apt-get install libmariadbclient-dev".

The default CREATE TABLE options for Aria Engine in mariadb

The official document of mariadb does not mention the default CREATE TABLE options for tables using Aria Engine.  The default options are list as below: TRANSACTIONAL,  the default value is TRANSACTIONAL=0, i.e., non-transactional. ROW_FORMAT, the default value is ROW_FORMAT=PAGE, which may suits both transactional and non-transactional tables. PAGE_CHECKSUM,  the default value will follow aria_page_checksum system variable, which has default value ON. For the TRANSACTIONAL option, you may consider create a table as below(and ALTER the TRANSACTIONAL=1): CREATE TABLE `test_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`) ) ENGINE=Aria; If you change the ROW_FORMAT to DYNAMIC or FIXED, everything just goes fine. But if you have ALTER the table with TRANSACTION=1 and change the ROW_FORMAT to DYNAMIC or FIXED, you may got a warning: SHOW WARNINGS; +-------+------+----------------------------------------------------------+ | Level | Code | Message | +-------+--