Extract junction reads from bam
2
1
Entering edit mode
Helen Zhou ▴ 150
@helen-zhou-2654
Last seen 8.8 years ago
United States
Dear Sir/Madam, I have a bam file with reads from a paired-end RNA-seq experiments, and I would like to extract all reads that span a particular intron/exon junction. For example where part of a read maps to exon ABC starting at chr 1 position 66909348. I do not know where the first part of the read maps; there are multiple possible exons being spliced to exon ABC. I can extract all reads mapping to and spanning for example 50nt around the junction, saying: library(GenomicRanges) loc <-RangesList('1'=IRanges(start=6690298,end=6690398))? parameter?<- ScanBamParam(which=loc, what=extract, simpleCigar=FALSE, reverseComplement=FALSE) aln <-?readGAlignmentsFromBam("test.bam", param=parameter) However, this includes all the reads that span across this region because the ends of the read map outside the area indicated in 'loc'. How can I get only the reads where (part of) the read starts mapping at the exact intron/exon location at position?66909348? Thanks you H Zhou [[alternative HTML version deleted]]
• 4.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
If you've already loaded some alignments in your "aln" object, then you can extract the aligned regions (the exonic parts) with: exonic <- grglist(aln) Then, you want to see which read has an exonic region starting at position X (i.e., X is a 3' splice site), with the extra constraint that there must be at least one exonic region preceding (so we know there is a splice). trueForReadsSplicingToX <- any(which(start(exonic) == X) > 1) That may or may not be fast, but it is simple. Didn't test it. Michael On Mon, Aug 18, 2014 at 1:32 PM, Helen Zhou <zhou.helen at="" yahoo.com=""> wrote: > Dear Sir/Madam, > > I have a bam file with reads from a paired-end RNA-seq experiments, and I > would like to extract all reads that span a particular intron/exon > junction. For example where part of a read maps to exon ABC starting at chr > 1 position 66909348. I do not know where the first part of the read maps; > there are multiple possible exons being spliced to exon ABC. > > I can extract all reads mapping to and spanning for example 50nt around > the junction, saying: > library(GenomicRanges) > loc <-RangesList('1'=IRanges(start=6690298,end=6690398)) > parameter <- ScanBamParam(which=loc, what=extract, simpleCigar=FALSE, > reverseComplement=FALSE) > aln <- readGAlignmentsFromBam("test.bam", param=parameter) > > However, this includes all the reads that span across this region because > the ends of the read map outside the area indicated in 'loc'. How can I get > only the reads where (part of) the read starts mapping at the exact > intron/exon location at position 66909348? > > Thanks you > H Zhou > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 24 days ago
Australia/Melbourne/Olivia Newton-John …
Dear Helen, If you just want to get the number of reads mapping to that junction (not the actual reads), you can try featureCounts function in Rsubread package. It has a parameter called 'countSplitAlignmentsOnly' which allows you to count exon-spanning reads only. It is extremely fast. Best wishes, Wei On Aug 19, 2014, at 6:32 AM, Helen Zhou wrote: > Dear Sir/Madam, > > I have a bam file with reads from a paired-end RNA-seq experiments, and I would like to extract all reads that span a particular intron/exon junction. For example where part of a read maps to exon ABC starting at chr 1 position 66909348. I do not know where the first part of the read maps; there are multiple possible exons being spliced to exon ABC. > > I can extract all reads mapping to and spanning for example 50nt around the junction, saying: > library(GenomicRanges) > loc <-RangesList('1'=IRanges(start=6690298,end=6690398)) > parameter <- ScanBamParam(which=loc, what=extract, simpleCigar=FALSE, reverseComplement=FALSE) > aln <- readGAlignmentsFromBam("test.bam", param=parameter) > > However, this includes all the reads that span across this region because the ends of the read map outside the area indicated in 'loc'. How can I get only the reads where (part of) the read starts mapping at the exact intron/exon location at position 66909348? > > Thanks you > H Zhou > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT
0
Entering edit mode
Dear Wei and Michael, thank you both for your helpful comments. This was just what I was looking for. H Zhou On Monday, 18 August 2014, 15:35, Wei Shi <shi at="" wehi.edu.au=""> wrote: Dear Helen, If you just want to get the number of reads mapping to that junction (not the actual reads), you can try featureCounts function in Rsubread package. It has a parameter called 'countSplitAlignmentsOnly' which allows you to count exon-spanning reads only. It is extremely fast. Best wishes, Wei On Aug 19, 2014, at 6:32 AM, Helen Zhou wrote: > Dear Sir/Madam, > > I have a bam file with reads from a paired-end RNA-seq experiments, and I would like to extract all reads that span a particular intron/exon junction. For example where part of a read maps to exon ABC starting at chr 1 position 66909348. I do not know where the first part of the read maps; there are multiple possible exons being spliced to exon ABC. > > I can extract all reads mapping to and spanning for example 50nt around the junction, saying: > library(GenomicRanges) > loc <-RangesList('1'=IRanges(start=6690298,end=6690398)) > parameter <- ScanBamParam(which=loc, what=extract, simpleCigar=FALSE, reverseComplement=FALSE) > aln <- readGAlignmentsFromBam("test.bam", param=parameter) > > However, this includes all the reads that span across this region because the ends of the read map outside the area indicated in 'loc'. How can I get only the reads where (part of) the read starts mapping at the exact intron/exon location at position 66909348? > > Thanks you > H Zhou > ??? [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}}
ADD REPLY

Login before adding your answer.

Traffic: 524 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6