count coverage for exact matches to sequence pattern
1
0
Entering edit mode
Vang Le ▴ 80
@vang-le-6690
Last seen 4.0 years ago
Denmark

I have BSgenome hg19, a GTF file containing gene coordinates, and a BAM file.

This is what I want to do:

1. Search for all occurrences of my sequence, and save the coordinates in a range objects. Following is the current implementation, but the output found coordinates are based on the first bases on DNA fragments, not the first bases of each chromosome. So I need to fix this

mydnaseqset <- getSeq(hg19, gtfobject)

findPattern<- function(pat="CG", mydnaseqset)
{
  spat <- DNAString(pat)
  sapply(seq_len(length(dnaset)),
         function(i) {
           y <- dnaset[[i]]
           matchPattern(spat, y)
         }
         )
}

 

2. With a ranges of patterns coordinates from step 1, I want to loop through aligned reads and count number of reads that have exact match to the pattern, at each range. This I don't know how to do it in R. I guess it requires some tricks with pileup. But how?

Some help will be much appreciated.

Vang

 

 

genomicfeatures rsamtools • 802 views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

For your findPattern() function, I suggest this implementation:

hits <- GRanges(vmatchPattern(pat, mydnaseqset))

That assumes that gtfobject has names. You could provide some.

Then, you can map the transcript-local ranges to global ranges like this:

hits.global <- mapFromTranscripts(hits, gtfobject)

Then, load the reads from a BAM file, making sure to get the sequence:

ga <- readGAlignments(bam, param=ScanBamParam(what="seq"))

And map the pattern hits to the read sequence space:

hits.aln <- mapToAlignments(hits, ga)

Use that to extract the correponding sequences and check for match:

sum(mcols(ga)$seq[hits.aln] == pat)

Haven't tested any of this. Let me know where you hit a snag.

ADD COMMENT

Login before adding your answer.

Traffic: 594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6