Amplicon and exon level read counts and GC content
4
0
Entering edit mode
Yu Chuan Tai ▴ 440
@yu-chuan-tai-1534
Last seen 10.2 years ago
Hi, I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon sequencing, is there any package/function which can extract amplicon- level read counts and GC content from an aligned file of BAM format? The same question to exon-level read counts and GC content. More generally, given a genomic interval, how could I extract the read count and GC content for that interval? Thanks for any input! Best, Yu Chuan
• 3.2k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: > Hi, > > I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon > sequencing, is there any package/function which can extract > amplicon-level read counts and GC content from an aligned file of BAM > format? The same question to exon-level read counts and GC content. More > generally, given a genomic interval, how could I extract the read count > and GC content for that interval? The Rsamtools package has scanBam for read input, also GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps for higher-level read counting. The requirement is generally a 'GRanges', which can be queried from TxDb packages (e.g., TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many other sources. There are vignettes in GenomicRanges, GenomicFeatures, rtracklayer, and Rsamtools packages; see http://bioconductor.org/packages/release/BiocViews.html#___Software Martin > > Thanks for any input! > > Best, > Yu Chuan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD COMMENT
0
Entering edit mode
On 05/29/2012 10:44 PM, Martin Morgan wrote: > On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >> Hi, >> >> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >> sequencing, is there any package/function which can extract >> amplicon-level read counts and GC content from an aligned file of BAM >> format? The same question to exon-level read counts and GC content. More >> generally, given a genomic interval, how could I extract the read count >> and GC content for that interval? > > The Rsamtools package has scanBam for read input, also > GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps > for higher-level read counting. The requirement is generally a > 'GRanges', which can be queried from TxDb packages (e.g., > TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many > other sources. There are vignettes in GenomicRanges, GenomicFeatures, > rtracklayer, and Rsamtools packages; see > > http://bioconductor.org/packages/release/BiocViews.html#___Software More specifically, after library(Rsamtools) example(scanBam) # defines 'fl', a path to a bam file for a _single_ genomic range param = ScanBamParam(what="seq", which=GRanges("seq1", IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]] length(dna) # 365 reads overlap region alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC though you'd likely want to specify several regions (vector arguments to GRanges) and think about flags (scanBamFlag() and the flag argument to ScanBamParam), read mapping quality, reads overlapping more than one region, etc. (summarizeOverlaps implements several counting strategies, but it is 'easy' to implement arbitrary approaches). > > Martin > >> >> Thanks for any input! >> >> Best, >> Yu Chuan >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY
0
Entering edit mode
Hi Martin, Thanks for your quick reply and great help! I will try what you suggested below and let you know if I still have any question. A quick question. Can RSamTools convert BAM to SAM? Best, Yu Chuan On Tue, 29 May 2012, Martin Morgan wrote: > On 05/29/2012 10:44 PM, Martin Morgan wrote: >> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >>> Hi, >>> >>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >>> sequencing, is there any package/function which can extract >>> amplicon-level read counts and GC content from an aligned file of BAM >>> format? The same question to exon-level read counts and GC content. More >>> generally, given a genomic interval, how could I extract the read count >>> and GC content for that interval? >> >> The Rsamtools package has scanBam for read input, also >> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps >> for higher-level read counting. The requirement is generally a >> 'GRanges', which can be queried from TxDb packages (e.g., >> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many >> other sources. There are vignettes in GenomicRanges, GenomicFeatures, >> rtracklayer, and Rsamtools packages; see >> >> http://bioconductor.org/packages/release/BiocViews.html#___Software > > More specifically, after > > library(Rsamtools) > example(scanBam) # defines 'fl', a path to a bam file > > for a _single_ genomic range > > param = ScanBamParam(what="seq", > which=GRanges("seq1", IRanges(100, 500))) > dna = scanBam(fl, param=param)[[1]][["seq"]] > length(dna) # 365 reads overlap region > alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC > > though you'd likely want to specify several regions (vector arguments to > GRanges) and think about flags (scanBamFlag() and the flag argument to > ScanBamParam), read mapping quality, reads overlapping more than one region, > etc. (summarizeOverlaps implements several counting strategies, but it is > 'easy' to implement arbitrary approaches). > >> >> Martin >> >>> >>> Thanks for any input! >>> >>> Best, >>> Yu Chuan >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 >
ADD REPLY
0
Entering edit mode
On 05/29/2012 11:12 PM, Yu Chuan Tai wrote: > Hi Martin, > > Thanks for your quick reply and great help! I will try what you > suggested below and let you know if I still have any question. A quick > question. Can RSamTools convert BAM to SAM? No; it will go the other way around (SAM --> BAM, see ?asBam) which is generally the right thing to do (smaller files, more easily 'queried' for regions of interest, etc, especially after indexing ?indexBam). Martin > > Best, > Yu Chuan > > On Tue, 29 May 2012, Martin Morgan wrote: > >> On 05/29/2012 10:44 PM, Martin Morgan wrote: >>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >>>> Hi, >>>> >>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >>>> sequencing, is there any package/function which can extract >>>> amplicon-level read counts and GC content from an aligned file of BAM >>>> format? The same question to exon-level read counts and GC content. >>>> More >>>> generally, given a genomic interval, how could I extract the read count >>>> and GC content for that interval? >>> >>> The Rsamtools package has scanBam for read input, also >>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps >>> for higher-level read counting. The requirement is generally a >>> 'GRanges', which can be queried from TxDb packages (e.g., >>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many >>> other sources. There are vignettes in GenomicRanges, GenomicFeatures, >>> rtracklayer, and Rsamtools packages; see >>> >>> http://bioconductor.org/packages/release/BiocViews.html#___Software >> >> More specifically, after >> >> library(Rsamtools) >> example(scanBam) # defines 'fl', a path to a bam file >> >> for a _single_ genomic range >> >> param = ScanBamParam(what="seq", >> which=GRanges("seq1", IRanges(100, 500))) >> dna = scanBam(fl, param=param)[[1]][["seq"]] >> length(dna) # 365 reads overlap region >> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >> >> though you'd likely want to specify several regions (vector arguments >> to GRanges) and think about flags (scanBamFlag() and the flag argument >> to ScanBamParam), read mapping quality, reads overlapping more than >> one region, etc. (summarizeOverlaps implements several counting >> strategies, but it is 'easy' to implement arbitrary approaches). >> >>> >>> Martin >>> >>>> >>>> Thanks for any input! >>>> >>>> Best, >>>> Yu Chuan >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 >> -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY
0
Entering edit mode
I see. Thanks again for your clarifications! Best, Yu Chuan On Tue, 29 May 2012, Martin Morgan wrote: > On 05/29/2012 11:12 PM, Yu Chuan Tai wrote: >> Hi Martin, >> >> Thanks for your quick reply and great help! I will try what you >> suggested below and let you know if I still have any question. A quick >> question. Can RSamTools convert BAM to SAM? > > No; it will go the other way around (SAM --> BAM, see ?asBam) which is > generally the right thing to do (smaller files, more easily 'queried' for > regions of interest, etc, especially after indexing ?indexBam). Martin > >> >> Best, >> Yu Chuan >> >> On Tue, 29 May 2012, Martin Morgan wrote: >> >>> On 05/29/2012 10:44 PM, Martin Morgan wrote: >>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >>>>> Hi, >>>>> >>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >>>>> sequencing, is there any package/function which can extract >>>>> amplicon-level read counts and GC content from an aligned file of BAM >>>>> format? The same question to exon-level read counts and GC content. >>>>> More >>>>> generally, given a genomic interval, how could I extract the read count >>>>> and GC content for that interval? >>>> >>>> The Rsamtools package has scanBam for read input, also >>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps >>>> for higher-level read counting. The requirement is generally a >>>> 'GRanges', which can be queried from TxDb packages (e.g., >>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many >>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures, >>>> rtracklayer, and Rsamtools packages; see >>>> >>>> http://bioconductor.org/packages/release/BiocViews.html#___Software >>> >>> More specifically, after >>> >>> library(Rsamtools) >>> example(scanBam) # defines 'fl', a path to a bam file >>> >>> for a _single_ genomic range >>> >>> param = ScanBamParam(what="seq", >>> which=GRanges("seq1", IRanges(100, 500))) >>> dna = scanBam(fl, param=param)[[1]][["seq"]] >>> length(dna) # 365 reads overlap region >>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>> >>> though you'd likely want to specify several regions (vector arguments >>> to GRanges) and think about flags (scanBamFlag() and the flag argument >>> to ScanBamParam), read mapping quality, reads overlapping more than >>> one region, etc. (summarizeOverlaps implements several counting >>> strategies, but it is 'easy' to implement arbitrary approaches). >>> >>>> >>>> Martin >>>> >>>>> >>>>> Thanks for any input! >>>>> >>>>> Best, >>>>> Yu Chuan >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> -- >>> Computational Biology >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>> >>> Location: M1-B861 >>> Telephone: 206 667-2793 >>> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 >
ADD REPLY
0
Entering edit mode
You may use the following command to convert BAM to SAM if you got Samtools installed on a unix computer: samtools view aligned.BAM > aligned.SAM It will be nice if Rsamtools can support this conversion as Samtools supports this conversion. The reason why featureCounts in Rsubread does not support BAM files is because we run this function straight after running the align() function for the alignment. But I think we may need to add support for this ... Wei On May 30, 2012, at 4:18 PM, Martin Morgan wrote: > On 05/29/2012 11:12 PM, Yu Chuan Tai wrote: >> Hi Martin, >> >> Thanks for your quick reply and great help! I will try what you >> suggested below and let you know if I still have any question. A quick >> question. Can RSamTools convert BAM to SAM? > > No; it will go the other way around (SAM --> BAM, see ?asBam) which is generally the right thing to do (smaller files, more easily 'queried' for regions of interest, etc, especially after indexing ?indexBam). Martin > >> >> Best, >> Yu Chuan >> >> On Tue, 29 May 2012, Martin Morgan wrote: >> >>> On 05/29/2012 10:44 PM, Martin Morgan wrote: >>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >>>>> Hi, >>>>> >>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >>>>> sequencing, is there any package/function which can extract >>>>> amplicon-level read counts and GC content from an aligned file of BAM >>>>> format? The same question to exon-level read counts and GC content. >>>>> More >>>>> generally, given a genomic interval, how could I extract the read count >>>>> and GC content for that interval? >>>> >>>> The Rsamtools package has scanBam for read input, also >>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps >>>> for higher-level read counting. The requirement is generally a >>>> 'GRanges', which can be queried from TxDb packages (e.g., >>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many >>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures, >>>> rtracklayer, and Rsamtools packages; see >>>> >>>> http://bioconductor.org/packages/release/BiocViews.html#___Software >>> >>> More specifically, after >>> >>> library(Rsamtools) >>> example(scanBam) # defines 'fl', a path to a bam file >>> >>> for a _single_ genomic range >>> >>> param = ScanBamParam(what="seq", >>> which=GRanges("seq1", IRanges(100, 500))) >>> dna = scanBam(fl, param=param)[[1]][["seq"]] >>> length(dna) # 365 reads overlap region >>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>> >>> though you'd likely want to specify several regions (vector arguments >>> to GRanges) and think about flags (scanBamFlag() and the flag argument >>> to ScanBamParam), read mapping quality, reads overlapping more than >>> one region, etc. (summarizeOverlaps implements several counting >>> strategies, but it is 'easy' to implement arbitrary approaches). >>> >>>> >>>> Martin >>>> >>>>> >>>>> Thanks for any input! >>>>> >>>>> Best, >>>>> Yu Chuan >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> -- >>> Computational Biology >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>> >>> Location: M1-B861 >>> Telephone: 206 667-2793 >>> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY
0
Entering edit mode
Hi Wei, Thanks again for your clarifications! I think it would be great if your package can support BAM files in the future, since the file size is smaller. Best, Yu Chuan On Wed, 30 May 2012, Wei Shi wrote: > You may use the following command to convert BAM to SAM if you got Samtools installed on a unix computer: > > samtools view aligned.BAM > aligned.SAM > > It will be nice if Rsamtools can support this conversion as Samtools supports this conversion. The reason why featureCounts in Rsubread does not support BAM files is because we run this function straight after running the align() function for the alignment. But I think we may need to add support for this ... > > Wei > > On May 30, 2012, at 4:18 PM, Martin Morgan wrote: > >> On 05/29/2012 11:12 PM, Yu Chuan Tai wrote: >>> Hi Martin, >>> >>> Thanks for your quick reply and great help! I will try what you >>> suggested below and let you know if I still have any question. A quick >>> question. Can RSamTools convert BAM to SAM? >> >> No; it will go the other way around (SAM --> BAM, see ?asBam) which is generally the right thing to do (smaller files, more easily 'queried' for regions of interest, etc, especially after indexing ?indexBam). Martin >> >>> >>> Best, >>> Yu Chuan >>> >>> On Tue, 29 May 2012, Martin Morgan wrote: >>> >>>> On 05/29/2012 10:44 PM, Martin Morgan wrote: >>>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote: >>>>>> Hi, >>>>>> >>>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon >>>>>> sequencing, is there any package/function which can extract >>>>>> amplicon-level read counts and GC content from an aligned file of BAM >>>>>> format? The same question to exon-level read counts and GC content. >>>>>> More >>>>>> generally, given a genomic interval, how could I extract the read count >>>>>> and GC content for that interval? >>>>> >>>>> The Rsamtools package has scanBam for read input, also >>>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps >>>>> for higher-level read counting. The requirement is generally a >>>>> 'GRanges', which can be queried from TxDb packages (e.g., >>>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many >>>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures, >>>>> rtracklayer, and Rsamtools packages; see >>>>> >>>>> http://bioconductor.org/packages/release/BiocViews.html#___Software >>>> >>>> More specifically, after >>>> >>>> library(Rsamtools) >>>> example(scanBam) # defines 'fl', a path to a bam file >>>> >>>> for a _single_ genomic range >>>> >>>> param = ScanBamParam(what="seq", >>>> which=GRanges("seq1", IRanges(100, 500))) >>>> dna = scanBam(fl, param=param)[[1]][["seq"]] >>>> length(dna) # 365 reads overlap region >>>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC >>>> >>>> though you'd likely want to specify several regions (vector arguments >>>> to GRanges) and think about flags (scanBamFlag() and the flag argument >>>> to ScanBamParam), read mapping quality, reads overlapping more than >>>> one region, etc. (summarizeOverlaps implements several counting >>>> strategies, but it is 'easy' to implement arbitrary approaches). >>>> >>>>> >>>>> Martin >>>>> >>>>>> >>>>>> Thanks for any input! >>>>>> >>>>>> Best, >>>>>> Yu Chuan >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>> >>>> >>>> -- >>>> Computational Biology >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: M1-B861 >>>> Telephone: 206 667-2793 >>>> >> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:5}}
ADD REPLY
0
Entering edit mode
@sean-davis-490
Last seen 3 months ago
United States
On Wed, May 30, 2012 at 1:17 AM, Yu Chuan Tai <yuchuan@stat.berkeley.edu>wrote: > Hi, > > I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon > sequencing, is there any package/function which can extract amplicon-level > read counts and GC content from an aligned file of BAM format? The same > question to exon-level read counts and GC content. More generally, given a > genomic interval, how could I extract the read count and GC content for > that interval? > > I wrote a small tutorial that has some of these details in it: http://watson.nci.nih.gov/~sdavis/tutorials/ See the "Accomplishing Exome Sequencing Data Tasks with Bioconductor" tutorial. Sean [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Sean, Thanks a lot for your info. I will have a look at your tutorials and try it and let you know if I need any help. Best, Yu Chuan On Wed, 30 May 2012, Sean Davis wrote: > > > On Wed, May 30, 2012 at 1:17 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: > Hi, > > I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon sequencing, is there any > package/function which can extract amplicon-level read counts and GC content ?from an aligned file of BAM > format? The same question to exon-level read counts and GC content. More generally, given a genomic > interval, how could I extract the read count and GC content for that interval? > > > I wrote a small tutorial that has some of these details in it: > > http://watson.nci.nih.gov/~sdavis/tutorials/ > > See the "Accomplishing Exome Sequencing Data Tasks with Bioconductor" tutorial. > > Sean > ? > >
ADD REPLY
0
Entering edit mode
love ▴ 150
@love-5173
Last seen 10.2 years ago
hi Yu Chuan, I wrote two convenience functions, countBamInGRanges and getGCcontent, for the exomeCopy package for these tasks. You can take a look at the first part of th e vignette here: [1]http://www.bioconductor.org/packages/release/bioc/html/exomeCopy.ht ml best, Mike Date: Tue, 29 May 2012 22:17:29 -0700 (PDT) From: Yu Chuan Tai [2]<yuchuan@stat.berkeley.edu> To: [3]bioconductor at r-project.org Subject: [BioC] Amplicon and exon level read counts and GC content Message-ID: [4]<alpine.deb.2.00.1205292216070.8774 at="" arwen.berkeley.edu=""> Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Hi, I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon sequencing, is there any package/function which can extract amplicon- level read counts and GC content from an aligned file of BAM format? The same question to exon-level read counts and GC content. More generally, given a genomic interval, how could I extract the read count and GC content for that interval? Thanks for any input! Best, Yu Chuan References 1. http://www.bioconductor.org/packages/release/bioc/html/exomeCopy.html 2. mailto:yuchuan at stat.berkeley.edu 3. mailto:bioconductor at r-project.org 4. mailto:alpine.DEB.2.00.1205292216070.8774 at arwen.Berkeley.EDU
ADD COMMENT
0
Entering edit mode
Hi Michael, Thanks for the info....looks like these functions are what I need. I will check your package out and let you know if I have any question. Best, Yu Chuan On Wed, 30 May 2012, Michael Love wrote: > > hi Yu Chuan, > > I wrote two convenience functions, countBamInGRanges and getGCcontent, for the exomeCopy package for these tasks. You c > an take a look at the first part of the vignette here: > > http://www.bioconductor.org/packages/release/bioc/html/exomeCopy.html > > best, > > Mike > > Date: Tue, 29 May 2012 22:17:29 -0700 (PDT) > From: Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> > To: bioconductor at r-project.org > Subject: [BioC] Amplicon and exon level read counts and GC content > Message-ID: <alpine.deb.2.00.1205292216070.8774 at="" arwen.berkeley.edu=""> > Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII > > Hi, > > I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon > sequencing, is there any package/function which can extract amplicon-level > read counts and GC content from an aligned file of BAM format? The same > question to exon-level read counts and GC content. More generally, given a > genomic interval, how could I extract the read count and GC content for > that interval? > > Thanks for any input! > > Best, > Yu Chuan > > >
ADD REPLY
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 months ago
Australia/Melbourne/Olivia Newton-John …
Dear Yu Chuan, You may consider using the featureCounts function in Rsubread package to count the number of reads for the features you are interested in. The built-in RefSeq annotation will allow you to get exon-level read counts. But to get amplicon-level read counts or read counts in a specific genomic interval, you'll have to provide an annotation file to this function to let it count reads for you. This function accepts SAM files as input, but I guess you can convert your BAM files to SAM files. For you information about this function, see ?featureCounts. Hope this helps. Cheers, Wei On May 30, 2012, at 3:17 PM, Yu Chuan Tai wrote: > Hi, > > I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon sequencing, is there any package/function which can extract amplicon-level read counts and GC content from an aligned file of BAM format? The same question to exon-level read counts and GC content. More generally, given a genomic interval, how could I extract the read count and GC content for that interval? > > Thanks for any input! > > Best, > Yu Chuan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT
0
Entering edit mode
Hi Wei, Thanks a lot for your quick reply. I will look into featureCounts in Rsubread. Best, Yu Chuan On Wed, 30 May 2012, Wei Shi wrote: > Dear Yu Chuan, > > You may consider using the featureCounts function in Rsubread package to count the number of reads for the features you are interested in. The built-in RefSeq annotation will allow you to get exon-level read counts. But to get amplicon-level read counts or read counts in a specific genomic interval, you'll have to provide an annotation file to this function to let it count reads for you. > > This function accepts SAM files as input, but I guess you can convert your BAM files to SAM files. For you information about this function, see ?featureCounts. > > Hope this helps. > > Cheers, > Wei > > On May 30, 2012, at 3:17 PM, Yu Chuan Tai wrote: > >> Hi, >> >> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon sequencing, is there any package/function which can extract amplicon-level read counts and GC content from an aligned file of BAM format? The same question to exon-level read counts and GC content. More generally, given a genomic interval, how could I extract the read count and GC content for that interval? >> >> Thanks for any input! >> >> Best, >> Yu Chuan >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:5}}
ADD REPLY

Login before adding your answer.

Traffic: 843 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6