Amplicon and exon level read counts and GC content
1
0
Entering edit mode
Yu Chuan Tai ▴ 440
@yu-chuan-tai-1534
Last seen 9.6 years ago
Hi Martin, More questions on your approaches below. If my BAM files are generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), should I set isPaired=TRUE? Do I need to worry about other input arguments for scanBamFlag() or ScanBamParam(), if I want to calculate coverage properly? Also, summarizeOverlaps() doesn't seem to handle paired-end reads. How to get around this, or it won't affect coverage calculation? Finally, is there any way to calculate base-specific coverage at any genomic locus or interval in Rsamtools? Thanks! Best, Yu Chuan > More specifically, after > > library(Rsamtools) > example(scanBam) # defines 'fl', a path to a bam file > > for a _single_ genomic range > > param = ScanBamParam(what="seq", > which=GRanges("seq1", IRanges(100, 500))) > dna = scanBam(fl, param=param)[[1]][["seq"]] > length(dna) # 365 reads overlap region > alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC > > though you'd likely want to specify several regions (vector arguments to > GRanges) and think about flags (scanBamFlag() and the flag argument to > ScanBamParam), read mapping quality, reads overlapping more than one region, > etc. (summarizeOverlaps implements several counting strategies, but it is > 'easy' to implement arbitrary approaches). > >> >> Martin >> >>> >>> Thanks for any input! >>> >>> Best, >>> Yu Chuan >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 >
Coverage Cancer Coverage Cancer • 1.6k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 days ago
United States
On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: > Hi Martin, > > More questions on your approaches below. If my BAM files are > generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), > should I set isPaired=TRUE? Do I need to worry about other input > arguments for scanBamFlag() or ScanBamParam(), if I want to > calculate coverage properly? It really depends on what you're interested in doing; see for instance the post by Herve the other day https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html > > Also, summarizeOverlaps() doesn't seem to handle paired-end reads. > How to get around this, or it won't affect coverage calculation? There is better support for paired-end reads in the 'devel' version of Biocondcutor; see http://bioconductor.org/developers/useDevel/ whether and what aspects of paired-endedness are important depends on how you are using your coverage. > > Finally, is there any way to calculate base-specific coverage at any > genomic locus or interval in Rsamtools? Thanks! I tried to answer this in your other post. Martin > > Best, Yu Chuan > >> More specifically, after >> >> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >> bam file >> >> for a _single_ genomic range >> >> param = ScanBamParam(what="seq", which=GRanges("seq1", >> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >> length(dna) # 365 reads overlap region alphabetFrequency(dna, >> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >> >> though you'd likely want to specify several regions (vector >> arguments to GRanges) and think about flags (scanBamFlag() and the >> flag argument to ScanBamParam), read mapping quality, reads >> overlapping more than one region, etc. (summarizeOverlaps >> implements several counting strategies, but it is 'easy' to >> implement arbitrary approaches). >> >>> >>> Martin >>> >>>> >>>> Thanks for any input! >>>> >>>> Best, Yu Chuan >>>> >>>> _______________________________________________ Bioconductor >>>> mailing list Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>> archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> >> >>>> >>>> -- >> Computational Biology Fred Hutchinson Cancer Research Center 1100 >> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 Telephone: 206 667-2793 >> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Hi Martin, Thanks! I will look into the links below. By 'better support for paired-end reads in the 'devel' version', which package are you referring to? Best, Yu Chuan On Thu, 7 Jun 2012, Martin Morgan wrote: > On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >> Hi Martin, >> >> More questions on your approaches below. If my BAM files are >> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >> should I set isPaired=TRUE? Do I need to worry about other input >> arguments for scanBamFlag() or ScanBamParam(), if I want to >> calculate coverage properly? > > It really depends on what you're interested in doing; see for instance the > post by Herve the other day > > https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html > >> >> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >> How to get around this, or it won't affect coverage calculation? > > There is better support for paired-end reads in the 'devel' version of > Biocondcutor; see > > http://bioconductor.org/developers/useDevel/ > > whether and what aspects of paired-endedness are important depends on how you > are using your coverage. > >> >> Finally, is there any way to calculate base-specific coverage at any >> genomic locus or interval in Rsamtools? Thanks! > > I tried to answer this in your other post. > > Martin > >> >> Best, Yu Chuan >> >>> More specifically, after >>> >>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>> bam file >>> >>> for a _single_ genomic range >>> >>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>> >>> though you'd likely want to specify several regions (vector >>> arguments to GRanges) and think about flags (scanBamFlag() and the >>> flag argument to ScanBamParam), read mapping quality, reads >>> overlapping more than one region, etc. (summarizeOverlaps >>> implements several counting strategies, but it is 'easy' to >>> implement arbitrary approaches). >>> >>>> >>>> Martin >>>> >>>>> >>>>> Thanks for any input! >>>>> >>>>> Best, Yu Chuan >>>>> >>>>> _______________________________________________ Bioconductor >>>>> mailing list Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>> archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> >>>>> >>>>> > -- >>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>> >>> Location: M1-B861 Telephone: 206 667-2793 >>> > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 >
ADD REPLY
0
Entering edit mode
On 06/07/2012 07:54 AM, Yu Chuan Tai wrote: > Hi Martin, > > Thanks! I will look into the links below. By 'better support for > paired-end reads in the 'devel' version', which package are you > referring to? Mostly GenomicRanges, e.g., readGappedAlignmentPairs, building on additional facilities in Rsamtools. Herve is responsible for this. Martin > > Best, > Yu Chuan > > On Thu, 7 Jun 2012, Martin Morgan wrote: > >> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >>> Hi Martin, >>> >>> More questions on your approaches below. If my BAM files are >>> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >>> should I set isPaired=TRUE? Do I need to worry about other input >>> arguments for scanBamFlag() or ScanBamParam(), if I want to >>> calculate coverage properly? >> >> It really depends on what you're interested in doing; see for instance >> the post by Herve the other day >> >> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html >> >>> >>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >>> How to get around this, or it won't affect coverage calculation? >> >> There is better support for paired-end reads in the 'devel' version of >> Biocondcutor; see >> >> http://bioconductor.org/developers/useDevel/ >> >> whether and what aspects of paired-endedness are important depends on >> how you are using your coverage. >> >>> >>> Finally, is there any way to calculate base-specific coverage at any >>> genomic locus or interval in Rsamtools? Thanks! >> >> I tried to answer this in your other post. >> >> Martin >> >>> >>> Best, Yu Chuan >>> >>>> More specifically, after >>>> >>>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>>> bam file >>>> >>>> for a _single_ genomic range >>>> >>>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>>> >>>> though you'd likely want to specify several regions (vector >>>> arguments to GRanges) and think about flags (scanBamFlag() and the >>>> flag argument to ScanBamParam), read mapping quality, reads >>>> overlapping more than one region, etc. (summarizeOverlaps >>>> implements several counting strategies, but it is 'easy' to >>>> implement arbitrary approaches). >>>> >>>>> >>>>> Martin >>>>> >>>>>> >>>>>> Thanks for any input! >>>>>> >>>>>> Best, Yu Chuan >>>>>> >>>>>> _______________________________________________ Bioconductor >>>>>> mailing list Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>>> archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>> >>>> >>>> >>>>>> >>>>>> >> -- >>>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: M1-B861 Telephone: 206 667-2793 >>>> >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
Hi Martin, Is it correct that the below code calculate the number of reads overlapping with an amplicon, and overlapping means at least 1 base overlap, and it doesn't have to be fully within the amplicon? In the case that a read overlaps with 2 amplicons, will it be counted twice? When I used this approach to calculate amplicon-level read counts, I found the number of read counts overlapping with all the amplicons is larger than the total number of read counts in the BAM file, and wonder if that's b/c a read could be counted more than once? I found that the code below gives more read counts than using summarizeOverlaps(). I think it's b/c the latter counts a read at most once. If I want to calculate the coverage of SNVs/INDELs outputted from samtools, is it correct that using summarizeOverlaps() will under-estimate the coverage, since a read may overlap with several SNVs/INDELs? Thanks! Yu Chuan > param = ScanBamParam(what="seq", which=GRanges("seq1",IRanges(100, 500))) > dna = scanBam(fl, param=param)[[1]][["seq"]] > length(dna) # 365 reads overlap region > alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC On Thu, 7 Jun 2012, Martin Morgan wrote: > On 06/07/2012 07:54 AM, Yu Chuan Tai wrote: >> Hi Martin, >> >> Thanks! I will look into the links below. By 'better support for >> paired-end reads in the 'devel' version', which package are you >> referring to? > > Mostly GenomicRanges, e.g., readGappedAlignmentPairs, building on additional > facilities in Rsamtools. Herve is responsible for this. > > Martin > >> >> Best, >> Yu Chuan >> >> On Thu, 7 Jun 2012, Martin Morgan wrote: >> >>> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >>>> Hi Martin, >>>> >>>> More questions on your approaches below. If my BAM files are >>>> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >>>> should I set isPaired=TRUE? Do I need to worry about other input >>>> arguments for scanBamFlag() or ScanBamParam(), if I want to >>>> calculate coverage properly? >>> >>> It really depends on what you're interested in doing; see for instance >>> the post by Herve the other day >>> >>> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html >>> >>>> >>>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >>>> How to get around this, or it won't affect coverage calculation? >>> >>> There is better support for paired-end reads in the 'devel' version of >>> Biocondcutor; see >>> >>> http://bioconductor.org/developers/useDevel/ >>> >>> whether and what aspects of paired-endedness are important depends on >>> how you are using your coverage. >>> >>>> >>>> Finally, is there any way to calculate base-specific coverage at any >>>> genomic locus or interval in Rsamtools? Thanks! >>> >>> I tried to answer this in your other post. >>> >>> Martin >>> >>>> >>>> Best, Yu Chuan >>>> >>>>> More specifically, after >>>>> >>>>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>>>> bam file >>>>> >>>>> for a _single_ genomic range >>>>> >>>>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>>>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>>>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>>>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>>>> >>>>> though you'd likely want to specify several regions (vector >>>>> arguments to GRanges) and think about flags (scanBamFlag() and the >>>>> flag argument to ScanBamParam), read mapping quality, reads >>>>> overlapping more than one region, etc. (summarizeOverlaps >>>>> implements several counting strategies, but it is 'easy' to >>>>> implement arbitrary approaches). >>>>> >>>>>> >>>>>> Martin >>>>>> >>>>>>> >>>>>>> Thanks for any input! >>>>>>> >>>>>>> Best, Yu Chuan >>>>>>> >>>>>>> _______________________________________________ Bioconductor >>>>>>> mailing list Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>>>> archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>>>> >>>>>>> >>> -- >>>>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>>>> >>>>> Location: M1-B861 Telephone: 206 667-2793 >>>>> >>> >>> >>> -- >>> Computational Biology / Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. >>> PO Box 19024 Seattle, WA 98109 >>> >>> Location: Arnold Building M1 B861 >>> Phone: (206) 667-2793 >>> > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 >
ADD REPLY
0
Entering edit mode
On 06/07/2012 11:34 PM, Yu Chuan Tai wrote: > Hi Martin, > > Is it correct that the below code calculate the number of reads overlapping > with an amplicon, and overlapping means at least 1 base overlap, and it > doesn't have to be fully within the amplicon? In the case that a read > overlaps > with 2 amplicons, will it be counted twice? When I used this approach to > calculate > amplicon-level read counts, I found the number of read counts > overlapping with > all the amplicons is larger than the total number of read counts in the > BAM file, > and wonder if that's b/c a read could be counted more than once? yes > I found that the code below gives more read counts than using > summarizeOverlaps(). > I think it's b/c the latter counts a read at most once. see ?summarizeOverlaps for how counting occurs with different modes. > > If I want to calculate the coverage of SNVs/INDELs outputted from samtools, > is it correct that using summarizeOverlaps() will under-estimate the > coverage, since > a read may overlap with several SNVs/INDELs? It is 'easy' using findOverlaps() or countOverlaps() to create counting schemes that are different from those implemented in samtools / scanBam, summarizeOverlaps, etc. Martin > > Thanks! > Yu Chuan > > > param = ScanBamParam(what="seq", which=GRanges("seq1",IRanges(100, > 500))) > > dna = scanBam(fl, param=param)[[1]][["seq"]] > > length(dna) # 365 reads overlap region > > alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC > > > > On Thu, 7 Jun 2012, Martin Morgan wrote: > >> On 06/07/2012 07:54 AM, Yu Chuan Tai wrote: >>> Hi Martin, >>> >>> Thanks! I will look into the links below. By 'better support for >>> paired-end reads in the 'devel' version', which package are you >>> referring to? >> >> Mostly GenomicRanges, e.g., readGappedAlignmentPairs, building on >> additional facilities in Rsamtools. Herve is responsible for this. >> >> Martin >> >>> >>> Best, >>> Yu Chuan >>> >>> On Thu, 7 Jun 2012, Martin Morgan wrote: >>> >>>> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >>>>> Hi Martin, >>>>> >>>>> More questions on your approaches below. If my BAM files are >>>>> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >>>>> should I set isPaired=TRUE? Do I need to worry about other input >>>>> arguments for scanBamFlag() or ScanBamParam(), if I want to >>>>> calculate coverage properly? >>>> >>>> It really depends on what you're interested in doing; see for instance >>>> the post by Herve the other day >>>> >>>> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html >>>> >>>>> >>>>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >>>>> How to get around this, or it won't affect coverage calculation? >>>> >>>> There is better support for paired-end reads in the 'devel' version of >>>> Biocondcutor; see >>>> >>>> http://bioconductor.org/developers/useDevel/ >>>> >>>> whether and what aspects of paired-endedness are important depends on >>>> how you are using your coverage. >>>> >>>>> >>>>> Finally, is there any way to calculate base-specific coverage at any >>>>> genomic locus or interval in Rsamtools? Thanks! >>>> >>>> I tried to answer this in your other post. >>>> >>>> Martin >>>> >>>>> >>>>> Best, Yu Chuan >>>>> >>>>>> More specifically, after >>>>>> >>>>>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>>>>> bam file >>>>>> >>>>>> for a _single_ genomic range >>>>>> >>>>>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>>>>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>>>>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>>>>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>>>>> >>>>>> though you'd likely want to specify several regions (vector >>>>>> arguments to GRanges) and think about flags (scanBamFlag() and the >>>>>> flag argument to ScanBamParam), read mapping quality, reads >>>>>> overlapping more than one region, etc. (summarizeOverlaps >>>>>> implements several counting strategies, but it is 'easy' to >>>>>> implement arbitrary approaches). >>>>>> >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>>> >>>>>>>> Thanks for any input! >>>>>>>> >>>>>>>> Best, Yu Chuan >>>>>>>> >>>>>>>> _______________________________________________ Bioconductor >>>>>>>> mailing list Bioconductor at r-project.org >>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>>>>> archives: >>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>> >>>>>>>> >>>> -- >>>>>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>>>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>>>>> >>>>>> Location: M1-B861 Telephone: 206 667-2793 >>>>>> >>>> >>>> >>>> -- >>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. >>>> PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: Arnold Building M1 B861 >>>> Phone: (206) 667-2793 >>>> >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
Hi Martin, A quick question. Does it matter if I don't have the strand info. for the interval that I am interested in, when I specify the input arguments for ScanBamParam()? Best, Yu Chuan On Thu, 7 Jun 2012, Martin Morgan wrote: > On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >> Hi Martin, >> >> More questions on your approaches below. If my BAM files are >> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >> should I set isPaired=TRUE? Do I need to worry about other input >> arguments for scanBamFlag() or ScanBamParam(), if I want to >> calculate coverage properly? > > It really depends on what you're interested in doing; see for instance the > post by Herve the other day > > https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html > >> >> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >> How to get around this, or it won't affect coverage calculation? > > There is better support for paired-end reads in the 'devel' version of > Biocondcutor; see > > http://bioconductor.org/developers/useDevel/ > > whether and what aspects of paired-endedness are important depends on how you > are using your coverage. > >> >> Finally, is there any way to calculate base-specific coverage at any >> genomic locus or interval in Rsamtools? Thanks! > > I tried to answer this in your other post. > > Martin > >> >> Best, Yu Chuan >> >>> More specifically, after >>> >>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>> bam file >>> >>> for a _single_ genomic range >>> >>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>> >>> though you'd likely want to specify several regions (vector >>> arguments to GRanges) and think about flags (scanBamFlag() and the >>> flag argument to ScanBamParam), read mapping quality, reads >>> overlapping more than one region, etc. (summarizeOverlaps >>> implements several counting strategies, but it is 'easy' to >>> implement arbitrary approaches). >>> >>>> >>>> Martin >>>> >>>>> >>>>> Thanks for any input! >>>>> >>>>> Best, Yu Chuan >>>>> >>>>> _______________________________________________ Bioconductor >>>>> mailing list Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>> archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> >>>>> >>>>> > -- >>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>> >>> Location: M1-B861 Telephone: 206 667-2793 >>> > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 >
ADD REPLY
0
Entering edit mode
On 06/08/2012 07:18 PM, Yu Chuan Tai wrote: > Hi Martin, > > A quick question. Does it matter if I don't have the strand info. for > the interval that I am interested in, when I specify the input arguments > for ScanBamParam()? See ?ScanBamParam which says which: A 'GRanges', 'RangesList', 'RangedData', or missing object, from which a 'IRangesList' instance will be constructed. Names of the 'IRangesList' correspond to reference sequences, and ranges to the regions on that reference sequence for which matches are desired. Because data types are coerced to 'IRangesList', 'which' does _not_ include strand information (use the 'flag' argument instead). Only records with a read overlapping the specified ranges are returned. All ranges must have ends less than or equal to 536870912. If you provide GRanges, with strand information, the strand information is ignored. If you want reads only on the plus strand, see (on the same help page) isMinusStrand: A logical(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA) strand should be returned. so ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE)) Martin > > Best, > Yu Chuan > > On Thu, 7 Jun 2012, Martin Morgan wrote: > >> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote: >>> Hi Martin, >>> >>> More questions on your approaches below. If my BAM files are >>> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(), >>> should I set isPaired=TRUE? Do I need to worry about other input >>> arguments for scanBamFlag() or ScanBamParam(), if I want to >>> calculate coverage properly? >> >> It really depends on what you're interested in doing; see for instance >> the post by Herve the other day >> >> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html >> >>> >>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads. >>> How to get around this, or it won't affect coverage calculation? >> >> There is better support for paired-end reads in the 'devel' version of >> Biocondcutor; see >> >> http://bioconductor.org/developers/useDevel/ >> >> whether and what aspects of paired-endedness are important depends on >> how you are using your coverage. >> >>> >>> Finally, is there any way to calculate base-specific coverage at any >>> genomic locus or interval in Rsamtools? Thanks! >> >> I tried to answer this in your other post. >> >> Martin >> >>> >>> Best, Yu Chuan >>> >>>> More specifically, after >>>> >>>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a >>>> bam file >>>> >>>> for a _single_ genomic range >>>> >>>> param = ScanBamParam(what="seq", which=GRanges("seq1", >>>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] >>>> length(dna) # 365 reads overlap region alphabetFrequency(dna, >>>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>>> >>>> though you'd likely want to specify several regions (vector >>>> arguments to GRanges) and think about flags (scanBamFlag() and the >>>> flag argument to ScanBamParam), read mapping quality, reads >>>> overlapping more than one region, etc. (summarizeOverlaps >>>> implements several counting strategies, but it is 'easy' to >>>> implement arbitrary approaches). >>>> >>>>> >>>>> Martin >>>>> >>>>>> >>>>>> Thanks for any input! >>>>>> >>>>>> Best, Yu Chuan >>>>>> >>>>>> _______________________________________________ Bioconductor >>>>>> mailing list Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>>>>> archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>> >>>> >>>> >>>>>> >>>>>> >> -- >>>> Computational Biology Fred Hutchinson Cancer Research Center 1100 >>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: M1-B861 Telephone: 206 667-2793 >>>> >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 755 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6