readGapppedAlignmentpairs questions

0

Entering edit mode

Lakshmanan Iyer ▴ 250

@lakshmanan-iyer-1829

Last seen 8.6 years ago

United States

Hi My apologies for multiple posting if it happens-- I sent the last mails from other accounts which may not be registered with Bioc-list -Lax Two questions: 1. Is readGapppedAlignmentPairs - the most efficient way to read a paired-end bam file with mulit-mapped reads? I am asking as it takes an enormous amount of time to process and load. 2. How does one work with coverage on GappedAlignmentPairs in the context of RNASeq? The simplest way is to consider each left and right read as separate - essentially loose the "paired" information and calculate coverage. However, if both the left and right pair reads fall within a feature of interest - say an exon, does it imply coverage of the region of the exon between the reads too >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> LLLLLLLLLL---------------------------RRRRRRRRRR ^^^^^^^^^^^^^^^^^ In the figure above, the exon is represented by ">" and L and R represents the left and right reads aligned to the exon. I am talking about the region represented by "^". Do we assume coverage for this region too? Does Coverage on GappedAlignmentPairs do this? -best -Lax Center for Neuroscience Research Tufts Univeristy School of Medicine Boston, MA [[alternative HTML version deleted]]

Coverage PROcess Coverage PROcess • 1.1k views

ADD COMMENT • link updated 11.8 years ago by Wei Shi ★ 3.6k • written 11.9 years ago by Lakshmanan Iyer ▴ 250

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi Iyer, On 06/22/2012 06:56 AM, Lakshmanan Iyer wrote: > Hi > My apologies for multiple posting if it happens-- I sent the last mails > from other accounts which may not be registered with Bioc-list > -Lax > Two questions: > > 1. Is readGapppedAlignmentPairs - the most efficient way to read a > paired-end bam file with mulit-mapped reads? > I am asking as it takes an enormous amount of time to process and load. With a recent version of Rsamtools (>= 1.9.10), last time I tried it took between 30 and 40 min to call readGapppedAlignmentPairs() on a BAM file with 70 million records (35 million pairs). Even for a fixed nb of records and using the same machine, timing could vary a lot depending on how "hard" it is to pair the records which mostly depends on the average number of records sharing the same QNAME (the lowest, the easiest). You can get that number using the new quickBamCounts() utility from Rsamtools: > quickBamCounts("AML_330-0_gsnap_filter.bam") group | nb of | nb of | mean / max of | records | unique | records per records | in group | QNAMEs | unique QNAME All records........................ A | 70446309 | 35407913 | 1.99 / 2 o template has single segment.... S | 0 | 0 | NA / NA o template has multiple segments. M | 70446309 | 35407913 | 1.99 / 2 - first segment.............. F | 35313360 | 35313360 | 1 / 1 - last segment............... L | 35132949 | 35132949 | 1 / 1 - other segment.............. O | 0 | 0 | NA / NA Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning of M. Indentation reflects this. Here the average number of records per unique QNAME is 1.99 (and the max is 2), which is ideal. I don't remember the exact amount of memory needed to load that file with 70M records though. All I remember is that you need a lot :-/ (probably > 20GB). Make sure you have enough memory and that your system is not swapping. readGapppedAlignmentPairs() is basically calling readGapppedAlignments() followed by findMateAlignment(). Each of the 2 operations are expensive and IIRC I'm not sure there are obvious/easy optimizations that could be done on readGapppedAlignments() itself. However, for findMateAlignment(), there are a few easy ones that have been on my list for a couple of months now and that I will implement ASAP. They should improve speed and reduce memory usage. > > 2. How does one work with coverage on GappedAlignmentPairs in the context > of RNASeq? > The simplest way is to consider each left and right read as separate - > essentially loose the "paired" information and calculate coverage. > However, if both the left and right pair reads fall within a feature of > interest - say an exon, does it imply coverage of the region of the exon > between the reads too > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> > LLLLLLLLLL---------------------------RRRRRRRRRR > ^^^^^^^^^^^^^^^^^ > > In the figure above, the exon is represented by ">" and L and R represents > the left and right reads aligned to the exon. > I am talking about the region represented by "^". Do we assume coverage > for this region too? > Does Coverage on GappedAlignmentPairs do this? No, coverage on a GappedAlignmentPairs does not do this, but 'coverage(range(grglist(galp)))' will do this. Cheers, H. > > -best > -Lax > Center for Neuroscience Research > Tufts Univeristy School of Medicine > Boston, MA > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 11.9 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Herve, Thanks! I have a server with 96GB of memory, so now swap issues. I just downloaded and installed the dev branch of bioconductor on my server so that I have the latest Rsamtools to work with. We are using "Star" to map the reads and it looks like it produces many multi-mapped reads and the mapping is to > 2 reads. I could see that "readGappedAlignmentPairs" was working very hard.... a lot of diagnostic messges about mates being dropped. Thanks for the tip on the coverage using grlist. -best -Lax On Fri, Jun 22, 2012 at 2:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote: > Hi Iyer, > > > On 06/22/2012 06:56 AM, Lakshmanan Iyer wrote: > >> Hi >> My apologies for multiple posting if it happens-- I sent the last mails >> from other accounts which may not be registered with Bioc-list >> -Lax >> Two questions: >> >> 1. Is readGapppedAlignmentPairs - the most efficient way to read a >> paired-end bam file with mulit-mapped reads? >> I am asking as it takes an enormous amount of time to process and load. >> > > With a recent version of Rsamtools (>= 1.9.10), last time I tried it > took between 30 and 40 min to call readGapppedAlignmentPairs() on a BAM > file with 70 million records (35 million pairs). Even for a fixed nb of > records and using the same machine, timing could vary a lot depending > on how "hard" it is to pair the records which mostly depends on the > average number of records sharing the same QNAME (the lowest, the > easiest). You can get that number using the new quickBamCounts() utility > from Rsamtools: > > > quickBamCounts("AML_330-0_**gsnap_filter.bam") > group | nb of | nb of | mean / max > of | records | unique | records per > records | in group | QNAMEs | unique QNAME > All records.......................**. A | 70446309 | 35407913 | 1.99 / 2 > o template has single segment.... S | 0 | 0 | NA / NA > o template has multiple segments. M | 70446309 | 35407913 | 1.99 / 2 > - first segment.............. F | 35313360 | 35313360 | 1 / 1 > - last segment............... L | 35132949 | 35132949 | 1 / 1 > - other segment.............. O | 0 | 0 | NA / NA > > Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning > of M. > Indentation reflects this. > > Here the average number of records per unique QNAME is 1.99 (and the max > is 2), which is ideal. > > I don't remember the exact amount of memory needed to load that file > with 70M records though. All I remember is that you need a lot :-/ > (probably > 20GB). Make sure you have enough memory and that your system > is not swapping. > > readGapppedAlignmentPairs() is basically calling readGapppedAlignments() > followed by findMateAlignment(). Each of the 2 operations are expensive > and IIRC I'm not sure there are obvious/easy optimizations that could > be done on readGapppedAlignments() itself. However, for > findMateAlignment(), there are a few easy ones that have been > on my list for a couple of months now and that I will implement ASAP. > They should improve speed and reduce memory usage. > > > >> 2. How does one work with coverage on GappedAlignmentPairs in the context >> of RNASeq? >> The simplest way is to consider each left and right read as separate - >> essentially loose the "paired" information and calculate coverage. >> However, if both the left and right pair reads fall within a feature of >> interest - say an exon, does it imply coverage of the region of the exon >> between the reads too >> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> LLLLLLLLLL--------------------** >> -------RRRRRRRRRR >> ^^^^^^^^^^^^^^^^^ >> >> In the figure above, the exon is represented by ">" and L and R represents >> the left and right reads aligned to the exon. >> I am talking about the region represented by "^". Do we assume coverage >> for this region too? >> Does Coverage on GappedAlignmentPairs do this? >> > > No, coverage on a GappedAlignmentPairs does not do this, but > 'coverage(range(grglist(galp))**)' will do this. > > Cheers, > H. > > >> -best >> -Lax >> Center for Neuroscience Research >> Tufts Univeristy School of Medicine >> Boston, MA >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]

ADD REPLY • link 11.9 years ago Lakshmanan Iyer ▴ 250

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 17 days ago

Australia/Melbourne/Olivia Newton-John …

Dear Lakshmanan, If the purpose of your analysis is to count reads falling within each feature, you may consider using the featureCounts() function in Rsubread package. It takes only about 2 minutes to summarize 10 million reads into a count table. But it only accept SAM files (you can use samtools to convert your BAM files to SAM files) and it only works on unix. See ?featureCounts() for more info. For your second question, if the pair of reads is indeed mapped as a pair, then the region between them will be covered as well if the two reads are on the same exon. But the reality is that not every read pair can be successfully mapped as pairs. You may get only one end mapped, or the two ends are mapped to two locations which have a distance much bigger than the average fragment lengths. In these cases, you don't even know what are the exons which lie between the two reads. Hope this helps. Cheers, Wei On Jun 22, 2012, at 11:56 PM, Lakshmanan Iyer wrote: > Hi > My apologies for multiple posting if it happens-- I sent the last mails > from other accounts which may not be registered with Bioc-list > -Lax > Two questions: > > 1. Is readGapppedAlignmentPairs - the most efficient way to read a > paired-end bam file with mulit-mapped reads? > I am asking as it takes an enormous amount of time to process and load. > > 2. How does one work with coverage on GappedAlignmentPairs in the context > of RNASeq? > The simplest way is to consider each left and right read as separate - > essentially loose the "paired" information and calculate coverage. > However, if both the left and right pair reads fall within a feature of > interest - say an exon, does it imply coverage of the region of the exon > between the reads too > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> > LLLLLLLLLL---------------------------RRRRRRRRRR > ^^^^^^^^^^^^^^^^^ > > In the figure above, the exon is represented by ">" and L and R represents > the left and right reads aligned to the exon. > I am talking about the region represented by "^". Do we assume coverage > for this region too? > Does Coverage on GappedAlignmentPairs do this? > > -best > -Lax > Center for Neuroscience Research > Tufts Univeristy School of Medicine > Boston, MA > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}

ADD COMMENT • link 11.8 years ago Wei Shi ★ 3.6k

0

Entering edit mode

On 06/22/2012 07:22 AM, Wei Shi wrote: > Dear Lakshmanan, > > If the purpose of your analysis is to count reads falling within each feature, you may consider using the featureCounts() function in Rsubread package. It takes only about 2 minutes to summarize 10 million reads into a count table. But it only accept SAM files (you can use samtools to convert your BAM files to SAM files) and it only works on unix. See ?featureCounts() for more info. Hi Wei -- can you clarify how you are counting reads? From a quick scan of your man page / C source code it looks like you're counting each pair of a paired end separately, and looking for a read whose start position is in an exon / gene? This elementary counting scheme (on a bam file) is just ## what features? any GRanges or GRangesList, e.g., library(TxDb.Hsapiens.UCSC.hg19.knownGene) exByGn = exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") ## what reads? GRanges from 'rname' and 'pos' param = ScanBamParam(what=c("rname", "qual"), flag=scanBamFlag(isUnmappedRead=FALSE)) with(scanBam(aBamFile, param=p2)[[1]], { GRanges(rname, IRanges(pos, width=1)) }) ## count, 'top level' of GRangesList, so counts per gene countOverlaps(exByGn, reads) This will be fast and memory friendly. ?countBam is another alternative, also memory efficient and taking this simple approach. ?summarizeOverlaps gives better counting schemes for single-end reads, and is also reasonably fast (and in devel space efficient, iterating over the bam file, and with some support for paired-end reads). ?readGappedAlignmentPairs, from the original post, tries to make sense of paired end reads, and is less memory / speed friendly (but the OP has a lot of memory). Martin > > For your second question, if the pair of reads is indeed mapped as a pair, then the region between them will be covered as well if the two reads are on the same exon. But the reality is that not every read pair can be successfully mapped as pairs. You may get only one end mapped, or the two ends are mapped to two locations which have a distance much bigger than the average fragment lengths. In these cases, you don't even know what are the exons which lie between the two reads. > > Hope this helps. > > Cheers, > Wei > > On Jun 22, 2012, at 11:56 PM, Lakshmanan Iyer wrote: > >> Hi >> My apologies for multiple posting if it happens-- I sent the last mails >> from other accounts which may not be registered with Bioc-list >> -Lax >> Two questions: >> >> 1. Is readGapppedAlignmentPairs - the most efficient way to read a >> paired-end bam file with mulit-mapped reads? >> I am asking as it takes an enormous amount of time to process and load. >> >> 2. How does one work with coverage on GappedAlignmentPairs in the context >> of RNASeq? >> The simplest way is to consider each left and right read as separate - >> essentially loose the "paired" information and calculate coverage. >> However, if both the left and right pair reads fall within a feature of >> interest - say an exon, does it imply coverage of the region of the exon >> between the reads too >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >> LLLLLLLLLL---------------------------RRRRRRRRRR >> ^^^^^^^^^^^^^^^^^ >> >> In the figure above, the exon is represented by ">" and L and R represents >> the left and right reads aligned to the exon. >> I am talking about the region represented by "^". Do we assume coverage >> for this region too? >> Does Coverage on GappedAlignmentPairs do this? >> >> -best >> -Lax >> Center for Neuroscience Research >> Tufts Univeristy School of Medicine >> Boston, MA >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:17}}

ADD REPLY • link 11.8 years ago Martin Morgan 25k

0

Entering edit mode

On 06/23/2012 04:21 AM, Martin Morgan wrote: > On 06/22/2012 07:22 AM, Wei Shi wrote: >> Dear Lakshmanan, >> >> If the purpose of your analysis is to count reads falling within each >> feature, you may consider using the featureCounts() function in >> Rsubread package. It takes only about 2 minutes to summarize 10 >> million reads into a count table. But it only accept SAM files (you >> can use samtools to convert your BAM files to SAM files) and it only >> works on unix. See ?featureCounts() for more info. > > Hi Wei -- can you clarify how you are counting reads? From a quick scan > of your man page / C source code it looks like you're counting each pair > of a paired end separately, and looking for a read whose start position > is in an exon / gene? This elementary counting scheme (on a bam file) is > just > > ## what features? any GRanges or GRangesList, e.g., > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > exByGn = exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") > > ## what reads? GRanges from 'rname' and 'pos' > param = ScanBamParam(what=c("rname", "qual"), > flag=scanBamFlag(isUnmappedRead=FALSE)) > with(scanBam(aBamFile, param=p2)[[1]], { > GRanges(rname, IRanges(pos, width=1)) > }) Oops, should have assigned that GRanges to 'reads' reads = with(scanBam(aBamFile, param=p2)[[1]], { GRanges(rname, IRanges(pos, width=1)) }) The scheme has obvious limitations, counting each end of a paired-end read separately, counting reads that overhang gene or exon boundaries (though sometimes that might be ok), counting overhanging reads that start in a range but not those that end, etc. Martin > > ## count, 'top level' of GRangesList, so counts per gene > countOverlaps(exByGn, reads) > > This will be fast and memory friendly. ?countBam is another alternative, > also memory efficient and taking this simple approach. > > ?summarizeOverlaps gives better counting schemes for single-end reads, > and is also reasonably fast (and in devel space efficient, iterating > over the bam file, and with some support for paired-end reads). > > ?readGappedAlignmentPairs, from the original post, tries to make sense > of paired end reads, and is less memory / speed friendly (but the OP has > a lot of memory). > > Martin > >> >> For your second question, if the pair of reads is indeed mapped as a >> pair, then the region between them will be covered as well if the two >> reads are on the same exon. But the reality is that not every read >> pair can be successfully mapped as pairs. You may get only one end >> mapped, or the two ends are mapped to two locations which have a >> distance much bigger than the average fragment lengths. In these >> cases, you don't even know what are the exons which lie between the >> two reads. >> >> Hope this helps. >> >> Cheers, >> Wei >> >> On Jun 22, 2012, at 11:56 PM, Lakshmanan Iyer wrote: >> >>> Hi >>> My apologies for multiple posting if it happens-- I sent the last mails >>> from other accounts which may not be registered with Bioc-list >>> -Lax >>> Two questions: >>> >>> 1. Is readGapppedAlignmentPairs - the most efficient way to read a >>> paired-end bam file with mulit-mapped reads? >>> I am asking as it takes an enormous amount of time to process and load. >>> >>> 2. How does one work with coverage on GappedAlignmentPairs in the >>> context >>> of RNASeq? >>> The simplest way is to consider each left and right read as separate - >>> essentially loose the "paired" information and calculate coverage. >>> However, if both the left and right pair reads fall within a feature of >>> interest - say an exon, does it imply coverage of the region of the exon >>> between the reads too >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> LLLLLLLLLL---------------------------RRRRRRRRRR >>> ^^^^^^^^^^^^^^^^^ >>> >>> In the figure above, the exon is represented by ">" and L and R >>> represents >>> the left and right reads aligned to the exon. >>> I am talking about the region represented by "^". Do we assume coverage >>> for this region too? >>> Does Coverage on GappedAlignmentPairs do this? >>> >>> -best >>> -Lax >>> Center for Neuroscience Research >>> Tufts Univeristy School of Medicine >>> Boston, MA >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> ______________________________________________________________________ >> The information in this email is confidential and inte...{{dropped:17}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD REPLY • link 11.8 years ago Martin Morgan 25k

0

Entering edit mode

Hi Martin, featureCounts does count each end of a paired-end read separately. But this is actually my favorite counting approach for paired-end reads, because this helps include those fragments into analysis which have only one end mapped, or both ends mapped at a distance greater than the average fragment length (such as those fragments which span two or more distant exons or contain chimeric sequences). featureCounts assigns a read to an exon when they have at least 1bp overlap, so it does count overhanging reads. By default, it uses RefSeq annotation. But it also accepts user provided annotation (users can simply provide a data frame to it, which includes gene id, chr, start and end). This function is not only fast, but also extremely memory efficient. It totally only uses a few megabytes of memory (for storing annotation). It does not read in the entire SAM file into memory. There is only one read in the memory at any time, so no matter how big the SAM file is, the memory usage is always a few megabytes. Hope this makes things clearer. Cheers, Wei On Jun 23, 2012, at 9:44 PM, Martin Morgan wrote: > On 06/23/2012 04:21 AM, Martin Morgan wrote: >> On 06/22/2012 07:22 AM, Wei Shi wrote: >>> Dear Lakshmanan, >>> >>> If the purpose of your analysis is to count reads falling within each >>> feature, you may consider using the featureCounts() function in >>> Rsubread package. It takes only about 2 minutes to summarize 10 >>> million reads into a count table. But it only accept SAM files (you >>> can use samtools to convert your BAM files to SAM files) and it only >>> works on unix. See ?featureCounts() for more info. >> >> Hi Wei -- can you clarify how you are counting reads? From a quick scan >> of your man page / C source code it looks like you're counting each pair >> of a paired end separately, and looking for a read whose start position >> is in an exon / gene? This elementary counting scheme (on a bam file) is >> just >> >> ## what features? any GRanges or GRangesList, e.g., >> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >> exByGn = exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") >> >> ## what reads? GRanges from 'rname' and 'pos' >> param = ScanBamParam(what=c("rname", "qual"), >> flag=scanBamFlag(isUnmappedRead=FALSE)) >> with(scanBam(aBamFile, param=p2)[[1]], { >> GRanges(rname, IRanges(pos, width=1)) >> }) > > Oops, should have assigned that GRanges to 'reads' > > reads = with(scanBam(aBamFile, param=p2)[[1]], { > GRanges(rname, IRanges(pos, width=1)) > }) > > The scheme has obvious limitations, counting each end of a paired- end read separately, counting reads that overhang gene or exon boundaries (though sometimes that might be ok), counting overhanging reads that start in a range but not those that end, etc. > > Martin >> >> ## count, 'top level' of GRangesList, so counts per gene >> countOverlaps(exByGn, reads) >> >> This will be fast and memory friendly. ?countBam is another alternative, >> also memory efficient and taking this simple approach. >> >> ?summarizeOverlaps gives better counting schemes for single-end reads, >> and is also reasonably fast (and in devel space efficient, iterating >> over the bam file, and with some support for paired-end reads). >> >> ?readGappedAlignmentPairs, from the original post, tries to make sense >> of paired end reads, and is less memory / speed friendly (but the OP has >> a lot of memory). >> >> Martin >> >>> >>> For your second question, if the pair of reads is indeed mapped as a >>> pair, then the region between them will be covered as well if the two >>> reads are on the same exon. But the reality is that not every read >>> pair can be successfully mapped as pairs. You may get only one end >>> mapped, or the two ends are mapped to two locations which have a >>> distance much bigger than the average fragment lengths. In these >>> cases, you don't even know what are the exons which lie between the >>> two reads. >>> >>> Hope this helps. >>> >>> Cheers, >>> Wei >>> >>> On Jun 22, 2012, at 11:56 PM, Lakshmanan Iyer wrote: >>> >>>> Hi >>>> My apologies for multiple posting if it happens-- I sent the last mails >>>> from other accounts which may not be registered with Bioc-list >>>> -Lax >>>> Two questions: >>>> >>>> 1. Is readGapppedAlignmentPairs - the most efficient way to read a >>>> paired-end bam file with mulit-mapped reads? >>>> I am asking as it takes an enormous amount of time to process and load. >>>> >>>> 2. How does one work with coverage on GappedAlignmentPairs in the >>>> context >>>> of RNASeq? >>>> The simplest way is to consider each left and right read as separate - >>>> essentially loose the "paired" information and calculate coverage. >>>> However, if both the left and right pair reads fall within a feature of >>>> interest - say an exon, does it imply coverage of the region of the exon >>>> between the reads too >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> LLLLLLLLLL---------------------------RRRRRRRRRR >>>> ^^^^^^^^^^^^^^^^^ >>>> >>>> In the figure above, the exon is represented by ">" and L and R >>>> represents >>>> the left and right reads aligned to the exon. >>>> I am talking about the region represented by "^". Do we assume coverage >>>> for this region too? >>>> Does Coverage on GappedAlignmentPairs do this? >>>> >>>> -best >>>> -Lax >>>> Center for Neuroscience Research >>>> Tufts Univeristy School of Medicine >>>> Boston, MA >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> ______________________________________________________________________ >>> The information in this email is confidential and inte...{{dropped:17}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD REPLY • link 11.8 years ago Wei Shi ★ 3.6k

Login before adding your answer.