I have BSgenome hg19, a GTF file containing gene coordinates, and a BAM file.
This is what I want to do:
1. Search for all occurrences of my sequence, and save the coordinates in a range objects. Following is the current implementation, but the output found coordinates are based on the first bases on DNA fragments, not the first bases of each chromosome. So I need to fix this
mydnaseqset <- getSeq(hg19, gtfobject)
findPattern<- function(pat="CG", mydnaseqset)
{
spat <- DNAString(pat)
sapply(seq_len(length(dnaset)),
function(i) {
y <- dnaset[[i]]
matchPattern(spat, y)
}
)
}
2. With a ranges of patterns coordinates from step 1, I want to loop through aligned reads and count number of reads that have exact match to the pattern, at each range. This I don't know how to do it in R. I guess it requires some tricks with pileup. But how?
Some help will be much appreciated.
Vang
