Skip to main content

Notes on algorithms(1) -- Finding Patterns in a File -- An example on Parallelized Programming

Notes on (Gregory R. Andrews).

The problem is to find a string pattern in a file, and print lines containing the patterns.

The Sequential program of the problem is as below:

string line;

read a line of input from stdin into line;

while(!EOF){

look for pattern in line;

if (pattern is in line)

write line;

read next line of input;

}

The point we can parallelize is inside the while loop, the pattern-finding step and the line-reading step. But here we use the same variable line, and we must eliminate it.

So here we induces a variable line2 to perform the reading operation, and variable line1 to perform the finding operation, another variable buffer is to communicate between line1 and line2. The program is as below:

string buffer;
bool done = false;

co #process 1
string line1;
while (true){
wait for buffer to be full or done to be true;
if (done) break;
line1 = buffer;
signal that buffer is empty;
look for pattern in line1;
if (pattern is in line1)
write line1;
}

//#process 2
string line2;
while(true){
read next line of input into line2;
if (EOF) {done = true; break;}
wait for buffer to be empty;
buffer = line2;
signal that buffer is full;
}
oc;

The program use signal - wait mechanism to ensure the use of buffer. The idea is that while reading the buffer, writing operation can't perform.

Comments

Popular posts from this blog

A simple implementation of DTW(Dynamic Time Warping) in C#/python

DTW(Dynamic Time Warping) is a very useful tools for time series analysis. This is a very simple (but not very efficient) c# implementation of DTW, the source code is available at  https://gist.github.com/1966342  . Use the program as below: double[] x = {9,3,1,5,1,2,0,1,0,2,2,8,1,7,0,6,4,4,5}; double[] y = {1,0,5,5,0,1,0,1,0,3,3,2,8,1,0,6,4,4,5}; SimpleDTW dtw = new SimpleDTW(x,y); dtw.calculateDTW(); The python implementation is available at  https://gist.github.com/3265694  . from python-dtw import Dtw import math dtw = Dtw([1, 2, 3, 4, 6], [1, 2, 3, 5],           distance_func=lambda x, y: math.fabs(x - y)) print dtw.calculate() #calculate the distance print dtw.get_path() #calculate the mapping path

Change the default user when start a docker container

When run(start) a docker container from an image, we can specify the default user by passing -u option in command line(In https://docs.docker.com/engine/reference/run/#user ). For example docker run -i -t -u ubuntu ubuntu:latest /bin/bash We can also use the USER instruction in DOCKERFILE to do the same thing(In https://docs.docker.com/engine/reference/builder/#user), note that the option in command line will override the one in the DOCKERFILE. And there is actually another way to start a container with neither DOCKERFILE nor -u option, just by a command like: docker run -i -t ubuntu:latest /bin/bash # with ubuntu as the default user This happens when your start the container from an image committed by a container with ubuntu as the default user. Or in detail: Run a container from some basic images, create ubuntu user inside it, commit the container to CUSTOM_IMAGE:1 . Run a container from CUSTOM_IMAGE:1 with "-u ubuntu" option, and commit the container to CUSTOM...

Notes on Sequential Pattern Mining (2) -- Partial Order Pattern Mining and Contrast Mining

1. In , the authors induce TEIRESIAS algorithms to mining combinatorial patterns with gap constraints in biological sequences. The patterns TEIRESIAS mined is similiar with the common sequential patterns, but it could contain "." the wild card which is also in the alphbel of the sequences database standing for any other item available, for example pattern "A..B" is a length-4 pattern, with two arbitrary items between the first A and the last B. Patterns "AC.B", "AADB" are all said to be more specific than pattern "A..B". TEIRESIAS mining all the maximal patterns () with a support over a min threshold K. There some key points of TEIRESIAS algorithms: 1)The growth of the patterns The growth of the patterns is accomplished by convolute current pattern by a short length pattern. Pattern A and pattern B are convolutable if the last L(very small) characters of pattern A is the same as the first L characters of pattern B, then ...