Showing posts from August, 2007

The trick of counting support in structured data mining

Most structured data mining problems share a subtlety: there is more than one way to count the support of a pattern. Take sequence mining as an example. Suppose we have a dataset of 2 sequences:

(AB)(C)(AB)(BC)
(BC)(A)(BC)

There are 2 ways of counting the support of the pattern (A)(C):

1) Count every appearance of (A)(C), where an appearance is an itemset containing A followed by a later itemset containing C. The first sequence contains 3 appearances and the second contains 1, so the support of (A)(C) is 4 (we call this support-all).

2) Count all appearances of (A)(C) within one sequence only once. In this case the support of (A)(C) is 2 (we call this support-byseq).

So we have 2 types of support. In practical problems, we usually pick one of them to compare against the threshold for frequent patterns. The other support is still useful, because it has properties related to the chosen one. For instance, support-byseq can never exceed support-all, since every sequence that contains the pattern contributes at least one appearance. The BIDE algorithm is an example: it uses support-byseq as the threshold and uses the other support to form a pruning scheme.
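To make the two counts concrete, here is a minimal Python sketch. It assumes that an "appearance" of a pattern is a distinct choice of itemset positions, in increasing order, whose itemsets contain the pattern's elements, and that every item is a single character. The names parse_sequence, count_appearances, support_all and support_byseq are made up for this illustration and do not come from BIDE or any library. Under these assumptions, (A)(C) appears 3 times in the first sequence and once in the second.

```python
import re


def parse_sequence(text):
    """Turn a string like '(AB)(C)' into a list of itemsets (sets of 1-char items)."""
    return [set(group) for group in re.findall(r"\(([^)]*)\)", text)]


def count_appearances(sequence, pattern):
    """Count the distinct ways `pattern` can be embedded in `sequence`.

    Assumption: an appearance is a choice of strictly increasing itemset
    positions, one per pattern element, where each chosen itemset contains
    the corresponding pattern element.
    """
    # ways[k] = number of embeddings of the first k pattern elements so far
    ways = [1] + [0] * len(pattern)
    for itemset in sequence:
        # Walk right to left so one itemset serves at most one pattern element
        # within a single embedding.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] <= itemset:
                ways[k] += ways[k - 1]
    return ways[-1]


def support_all(dataset, pattern):
    """Sum of all appearances over every sequence."""
    return sum(count_appearances(seq, pattern) for seq in dataset)


def support_byseq(dataset, pattern):
    """Number of sequences that contain the pattern at least once."""
    return sum(1 for seq in dataset if count_appearances(seq, pattern) > 0)


dataset = [parse_sequence("(AB)(C)(AB)(BC)"), parse_sequence("(BC)(A)(BC)")]
pattern = parse_sequence("(A)(C)")

print(support_all(dataset, pattern))    # 4 (3 in the first sequence, 1 in the second)
print(support_byseq(dataset, pattern))  # 2
```

The counting loop is a small dynamic program, so it avoids enumerating every combination of positions. A different definition of "appearance" (for example, non-overlapping appearances only) would give a different support-all value.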