I have BSgenome hg19, a GTF file containing gene coordinates, and a BAM file.
This is what I want to do:
1. Search for all occurrences of my sequence, and save the coordinates in a range objects. Following is the current implementation, but the output found coordinates are based on the first bases on DNA fragments, not the first bases of each chromosome. So I need to fix this
mydnaseqset <- getSeq(hg19, gtfobject) findPattern<- function(pat="CG", mydnaseqset) { spat <- DNAString(pat) sapply(seq_len(length(dnaset)), function(i) { y <- dnaset[[i]] matchPattern(spat, y) } ) }
2. With a ranges of patterns coordinates from step 1, I want to loop through aligned reads and count number of reads that have exact match to the pattern, at each range. This I don't know how to do it in R. I guess it requires some tricks with pileup. But how?
Some help will be much appreciated.
Vang