yieldSize and paired end data in SummarizeOverlaps error?

0

Entering edit mode

Thomas Whisenant ▴ 10

@thomas-whisenant-5825

Last seen 9.6 years ago

Hello all, I am attempting to use summarizeOverlaps to assign counts to exons using paired end Tophat alignments. As an example, I have created a bamfilelist containing one bamfile and a feature list of exons by gene. This is followed by using summarizeOverlaps with the single end parameter set to false. bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") all.exons <- unlist(exons.by.gene) sumexp <- summarizeOverlaps( features=all.exons, reads=bfl, mode=IntersectionNotEmpty, singleEnd=FALSE, ignore.strand=TRUE, parallel=TRUE) Error: cannot allocate vector of size 277.9 Mb In addition: Warning message: In readBamGappedAlignmentPairs(bf, param = param) : 'yieldSize' set to 'NA' Even with the parallel option, this process uses all available memory which presumably leads to the vector allocation error. While I have set the yieldSize in the BamFileList, this parameter does not appear to be used. Running summarizeOverlaps with "singleEnd=TRUE" does not issue a yieldSize warning. Any ideas on how to get set the yieldSize with "singleEnd=FALSE"? Thanks, Tom R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DEXSeq_1.4.0 [2] Rsamtools_1.10.2 [3] org.Hs.eg.db_2.8.0 [4] RSQLite_0.11.2 [5] DBI_0.2-5 [6] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 [7] GenomicFeatures_1.10.1 [8] BSgenome.Hsapiens.UCSC.hg19_1.3.19 [9] BSgenome_1.26.1 [10] Biostrings_2.26.3 [11] GenomicRanges_1.10.6 [12] IRanges_1.16.5 [13] annotate_1.36.0 [14] AnnotationDbi_1.20.3 [15] Biobase_2.18.0 [16] BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 Thomas Whisenant, Ph.D. Salomon Lab, MEM-241 Department of Molecular and Experimental Medicine The Scripps Research Institute 10550 North Torrey Pines Rd. La Jolla, CA 92037 thomasw@scripps.edu<mailto:thomasw@scripps.edu> [[alternative HTML version deleted]]

BSgenome PROcess BSgenome ASSIGN BSgenome PROcess BSgenome ASSIGN • 2.1k views

ADD COMMENT • link updated 11.1 years ago by Valerie Obenchain ★ 6.8k • written 11.1 years ago by Thomas Whisenant ▴ 10

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi Tom, Currently readBamGappedAlignmentPairs() reads all data from a Bam file into memory to sort and determine proper pairs. This is a problem for large files and is the reason for the error you see below. We've been working on alternative approach that involves first sorting the Bam file by qname. Once sorted, the file can be read in chunks specified by 'yieldSize' in the BamFile object. This functionality is available in GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in progress and I have not yet implemented a check to ensure the file is sorted by qname. If you try this out, please be sure to sort your Bam file by qname first. Examples of this approach are on the summarizeOverlaps man page for the BamFile method, ?'summarizeOverlaps,GRanges,BamFileList-method' Here is a piece of the example: library(pasillaBamSubset) library("TxDb.Dmelanogaster.UCSC.dm3.ensGene") exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene") ## paired-end sorted by qname: ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname ## can be read in chunks with 'yieldSize'. fl <- untreated3_chr4() sortfl <- sortBam(fl, tempfile(), byQname=TRUE) bf2 <- BamFileList(sortfl, index=character(0), yieldSize=50000, obeyQname=TRUE) se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE) counts2 <- assays(se2)$counts ## paired-end not sorted: ## If the file is not sorted by qname, all records are read ## into memory for sorting and to determine proper pairs. ## Any 'yieldSize' set on the BamFile will be ignored. fl <- untreated3_chr4() bf3 <- BamFileList(fl) se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE) counts3 <- assays(se3)$counts Another recent feature addition is the GALignmentsList class. The goal here is to provide a container that holds any type of read, single- end, paired-end, singletons, multiple fragments etc. Again, the methods are based on the user providing a Bam file sorted by qname. Once read in, the records are split on read id (QNAME in SAM/BAM). The associated read functions are in GenomicRanges and Rsamtools. ?readGAlignmentsList (GenomicRanges) ?readBamGAlignmentsList (Rsamtools) Let me know if you run into problems. Valerie On 03/13/2013 11:07 AM, Thomas Whisenant wrote: > Hello all, > I am attempting to use summarizeOverlaps to assign counts to exons using paired end Tophat alignments. As an example, I have created a bamfilelist containing one bamfile and a feature list of exons by gene. This is followed by using summarizeOverlaps with the single end parameter set to false. > > bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) > exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") > all.exons <- unlist(exons.by.gene) > sumexp <- summarizeOverlaps( > features=all.exons, reads=bfl, > mode=IntersectionNotEmpty, > singleEnd=FALSE, > ignore.strand=TRUE, parallel=TRUE) > Error: cannot allocate vector of size 277.9 Mb > In addition: Warning message: > In readBamGappedAlignmentPairs(bf, param = param) : 'yieldSize' set to 'NA' > > Even with the parallel option, this process uses all available memory which presumably leads to the vector allocation error. While I have set the yieldSize in the BamFileList, this parameter does not appear to be used. Running summarizeOverlaps with "singleEnd=TRUE" does not issue a yieldSize warning. Any ideas on how to get set the yieldSize with "singleEnd=FALSE"? > Thanks, > Tom > > > R version 2.15.2 (2012-10-26) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] DEXSeq_1.4.0 > > [2] Rsamtools_1.10.2 > > [3] org.Hs.eg.db_2.8.0 > > [4] RSQLite_0.11.2 > > [5] DBI_0.2-5 > > [6] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 > > [7] GenomicFeatures_1.10.1 > > [8] BSgenome.Hsapiens.UCSC.hg19_1.3.19 > > [9] BSgenome_1.26.1 > > [10] Biostrings_2.26.3 > > [11] GenomicRanges_1.10.6 > > [12] IRanges_1.16.5 > > [13] annotate_1.36.0 > > [14] AnnotationDbi_1.20.3 > > [15] Biobase_2.18.0 > > [16] BiocGenerics_0.4.0 > > > > loaded via a namespace (and not attached): > > [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 > > [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 > > [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 > > [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 > > > > Thomas Whisenant, Ph.D. > Salomon Lab, MEM-241 > Department of Molecular and Experimental Medicine > The Scripps Research Institute > 10550 North Torrey Pines Rd. > La Jolla, CA 92037 > thomasw at scripps.edu<mailto:thomasw at="" scripps.edu=""> > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 11.1 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

This sounds interesting. Just want to make sure: does this ensure that all alignments for a given QNAME fall within one chunk? What happens if the yieldsize is too small to achieve that? And of course the drawback here is that a QNAME-sorted BAM is not indexable. So in practice one would need two BAMs, one sorted by POS, the other by QNAME. What about this: preprocess a POS-sorted (and thus indexable) BAM and add a flag indicating whether an alignment is the last alignment for a given QNAME. The scanner then yields all complete QNAMEs after it has scanned 'yieldSize' such flags. I realize this is more work on your part, but maybe it saves the hassle of multiple BAMs? Just an idea. Michael On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > Hi Tom, > > Currently readBamGappedAlignmentPairs() reads all data from a Bam file > into memory to sort and determine proper pairs. This is a problem for large > files and is the reason for the error you see below. > > We've been working on alternative approach that involves first sorting the > Bam file by qname. Once sorted, the file can be read in chunks specified by > 'yieldSize' in the BamFile object. This functionality is available in > GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in progress and I > have not yet implemented a check to ensure the file is sorted by qname. If > you try this out, please be sure to sort your Bam file by qname first. > > Examples of this approach are on the summarizeOverlaps man page for the > BamFile method, > > ?'summarizeOverlaps,GRanges,**BamFileList-method' > > Here is a piece of the example: > > library(pasillaBamSubset) > library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene") > exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene") > > ## paired-end sorted by qname: > ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname > ## can be read in chunks with 'yieldSize'. > fl <- untreated3_chr4() > sortfl <- sortBam(fl, tempfile(), byQname=TRUE) > bf2 <- BamFileList(sortfl, index=character(0), > yieldSize=50000, obeyQname=TRUE) > se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE) > counts2 <- assays(se2)$counts > > ## paired-end not sorted: > ## If the file is not sorted by qname, all records are read > ## into memory for sorting and to determine proper pairs. > ## Any 'yieldSize' set on the BamFile will be ignored. > fl <- untreated3_chr4() > bf3 <- BamFileList(fl) > se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE) > counts3 <- assays(se3)$counts > > > Another recent feature addition is the GALignmentsList class. The goal > here is to provide a container that holds any type of read, single- end, > paired-end, singletons, multiple fragments etc. Again, the methods are > based on the user providing a Bam file sorted by qname. Once read in, the > records are split on read id (QNAME in SAM/BAM). The associated read > functions are in GenomicRanges and Rsamtools. > > ?readGAlignmentsList (GenomicRanges) > ?readBamGAlignmentsList (Rsamtools) > > Let me know if you run into problems. > > Valerie > > > > > > > On 03/13/2013 11:07 AM, Thomas Whisenant wrote: > >> Hello all, >> I am attempting to use summarizeOverlaps to assign counts to exons using >> paired end Tophat alignments. As an example, I have created a bamfilelist >> containing one bamfile and a feature list of exons by gene. This is >> followed by using summarizeOverlaps with the single end parameter set to >> false. >> >> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) >> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene, "gene") >> all.exons <- unlist(exons.by.gene) >> sumexp <- summarizeOverlaps( >> features=all.exons, reads=bfl, >> mode=IntersectionNotEmpty, >> singleEnd=FALSE, >> ignore.strand=TRUE, parallel=TRUE) >> Error: cannot allocate vector of size 277.9 Mb >> In addition: Warning message: >> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize' set to >> 'NA' >> >> Even with the parallel option, this process uses all available memory >> which presumably leads to the vector allocation error. While I have set the >> yieldSize in the BamFileList, this parameter does not appear to be used. >> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a yieldSize >> warning. Any ideas on how to get set the yieldSize with "singleEnd=FALSE"? >> Thanks, >> Tom >> >> >> R version 2.15.2 (2012-10-26) >> >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> >> >> locale: >> >> [1] LC_COLLATE=English_United States.1252 >> >> [2] LC_CTYPE=English_United States.1252 >> >> [3] LC_MONETARY=English_United States.1252 >> >> [4] LC_NUMERIC=C >> >> [5] LC_TIME=English_United States.1252 >> >> >> >> attached base packages: >> >> [1] parallel stats graphics grDevices utils datasets methods >> >> [8] base >> >> >> >> other attached packages: >> >> [1] DEXSeq_1.4.0 >> >> [2] Rsamtools_1.10.2 >> >> [3] org.Hs.eg.db_2.8.0 >> >> [4] RSQLite_0.11.2 >> >> [5] DBI_0.2-5 >> >> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0 >> >> [7] GenomicFeatures_1.10.1 >> >> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19 >> >> [9] BSgenome_1.26.1 >> >> [10] Biostrings_2.26.3 >> >> [11] GenomicRanges_1.10.6 >> >> [12] IRanges_1.16.5 >> >> [13] annotate_1.36.0 >> >> [14] AnnotationDbi_1.20.3 >> >> [15] Biobase_2.18.0 >> >> [16] BiocGenerics_0.4.0 >> >> >> >> loaded via a namespace (and not attached): >> >> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 >> >> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 >> >> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 >> >> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 >> >> >> >> Thomas Whisenant, Ph.D. >> Salomon Lab, MEM-241 >> Department of Molecular and Experimental Medicine >> The Scripps Research Institute >> 10550 North Torrey Pines Rd. >> La Jolla, CA 92037 >> thomasw@scripps.edu<mailto:tho**masw@scripps.edu <thomasw@scripps.edu="">> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago Michael Lawrence ★ 11k

0

Entering edit mode

A while back I was struggling with how I could process large paired bam files in managable chunks. I wrote a Python script called to take a BAM file and splits in into non-overlapping "loci", or sets of read and read pairs that don't overlap reads in other loci. The result is a bunch of small bam files, one per locus. https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py The important point is that read pairs only need to be matched with other reads in the same locus, which allows reading paired alignments from a position-sorted BAM file without having to read the whole file at once. The disadvantage of this approach is that the minimum required memory is dependent on the size of the largest locus in the file, and cannot be controlled easily. However the memory usage is almost certain to be significantly less than reading the entire file at once. If you want to, you could adapt the logic for reading alignment pairs. Of course, the concept of "loci" breaks down in the presence of fusion reads, and I'm not certain how it works with spliced reads from Tophat, since I believe the splice is encoded into the CIGAR string, which might throw off the calculation of the end position. -Ryan On 03/14/2013 03:30 PM, Michael Lawrence wrote: > This sounds interesting. Just want to make sure: does this ensure that all > alignments for a given QNAME fall within one chunk? What happens if the > yieldsize is too small to achieve that? > > And of course the drawback here is that a QNAME-sorted BAM is not > indexable. So in practice one would need two BAMs, one sorted by POS, the > other by QNAME. > > What about this: preprocess a POS-sorted (and thus indexable) BAM and add a > flag indicating whether an alignment is the last alignment for a given > QNAME. The scanner then yields all complete QNAMEs after it has scanned > 'yieldSize' such flags. I realize this is more work on your part, but maybe > it saves the hassle of multiple BAMs? Just an idea. > > Michael > > > > On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain <vobencha at="" fhcrc.org="">wrote: > >> Hi Tom, >> >> Currently readBamGappedAlignmentPairs() reads all data from a Bam file >> into memory to sort and determine proper pairs. This is a problem for large >> files and is the reason for the error you see below. >> >> We've been working on alternative approach that involves first sorting the >> Bam file by qname. Once sorted, the file can be read in chunks specified by >> 'yieldSize' in the BamFile object. This functionality is available in >> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in progress and I >> have not yet implemented a check to ensure the file is sorted by qname. If >> you try this out, please be sure to sort your Bam file by qname first. >> >> Examples of this approach are on the summarizeOverlaps man page for the >> BamFile method, >> >> ?'summarizeOverlaps,GRanges,**BamFileList-method' >> >> Here is a piece of the example: >> >> library(pasillaBamSubset) >> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene") >> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene") >> >> ## paired-end sorted by qname: >> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname >> ## can be read in chunks with 'yieldSize'. >> fl <- untreated3_chr4() >> sortfl <- sortBam(fl, tempfile(), byQname=TRUE) >> bf2 <- BamFileList(sortfl, index=character(0), >> yieldSize=50000, obeyQname=TRUE) >> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE) >> counts2 <- assays(se2)$counts >> >> ## paired-end not sorted: >> ## If the file is not sorted by qname, all records are read >> ## into memory for sorting and to determine proper pairs. >> ## Any 'yieldSize' set on the BamFile will be ignored. >> fl <- untreated3_chr4() >> bf3 <- BamFileList(fl) >> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE) >> counts3 <- assays(se3)$counts >> >> >> Another recent feature addition is the GALignmentsList class. The goal >> here is to provide a container that holds any type of read, single- end, >> paired-end, singletons, multiple fragments etc. Again, the methods are >> based on the user providing a Bam file sorted by qname. Once read in, the >> records are split on read id (QNAME in SAM/BAM). The associated read >> functions are in GenomicRanges and Rsamtools. >> >> ?readGAlignmentsList (GenomicRanges) >> ?readBamGAlignmentsList (Rsamtools) >> >> Let me know if you run into problems. >> >> Valerie >> >> >> >> >> >> >> On 03/13/2013 11:07 AM, Thomas Whisenant wrote: >> >>> Hello all, >>> I am attempting to use summarizeOverlaps to assign counts to exons using >>> paired end Tophat alignments. As an example, I have created a bamfilelist >>> containing one bamfile and a feature list of exons by gene. This is >>> followed by using summarizeOverlaps with the single end parameter set to >>> false. >>> >>> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) >>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene, "gene") >>> all.exons <- unlist(exons.by.gene) >>> sumexp <- summarizeOverlaps( >>> features=all.exons, reads=bfl, >>> mode=IntersectionNotEmpty, >>> singleEnd=FALSE, >>> ignore.strand=TRUE, parallel=TRUE) >>> Error: cannot allocate vector of size 277.9 Mb >>> In addition: Warning message: >>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize' set to >>> 'NA' >>> >>> Even with the parallel option, this process uses all available memory >>> which presumably leads to the vector allocation error. While I have set the >>> yieldSize in the BamFileList, this parameter does not appear to be used. >>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a yieldSize >>> warning. Any ideas on how to get set the yieldSize with "singleEnd=FALSE"? >>> Thanks, >>> Tom >>> >>> >>> R version 2.15.2 (2012-10-26) >>> >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> >>> >>> locale: >>> >>> [1] LC_COLLATE=English_United States.1252 >>> >>> [2] LC_CTYPE=English_United States.1252 >>> >>> [3] LC_MONETARY=English_United States.1252 >>> >>> [4] LC_NUMERIC=C >>> >>> [5] LC_TIME=English_United States.1252 >>> >>> >>> >>> attached base packages: >>> >>> [1] parallel stats graphics grDevices utils datasets methods >>> >>> [8] base >>> >>> >>> >>> other attached packages: >>> >>> [1] DEXSeq_1.4.0 >>> >>> [2] Rsamtools_1.10.2 >>> >>> [3] org.Hs.eg.db_2.8.0 >>> >>> [4] RSQLite_0.11.2 >>> >>> [5] DBI_0.2-5 >>> >>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0 >>> >>> [7] GenomicFeatures_1.10.1 >>> >>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19 >>> >>> [9] BSgenome_1.26.1 >>> >>> [10] Biostrings_2.26.3 >>> >>> [11] GenomicRanges_1.10.6 >>> >>> [12] IRanges_1.16.5 >>> >>> [13] annotate_1.36.0 >>> >>> [14] AnnotationDbi_1.20.3 >>> >>> [15] Biobase_2.18.0 >>> >>> [16] BiocGenerics_0.4.0 >>> >>> >>> >>> loaded via a namespace (and not attached): >>> >>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 >>> >>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 >>> >>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 >>> >>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 >>> >>> >>> >>> Thomas Whisenant, Ph.D. >>> Salomon Lab, MEM-241 >>> Department of Molecular and Experimental Medicine >>> The Scripps Research Institute >>> 10550 North Torrey Pines Rd. >>> La Jolla, CA 92037 >>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="" <thomasw="" at="" scripps.edu="">> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Hi Michael and Ryan, Yes, all QNAME records fall within one chunk. When yieldSize is too small, scanBam() will return more than the yieldSize up to the break/difference in QNAME. library(pasillaBamSubset) fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE) bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE) scn <- scanBam(bf, obeyQname=TRUE) > scn[[1]][c("qname", "seq")] $qname [1] "SRR031714.65" "SRR031714.65" "SRR031714.185" "SRR031714.185" $seq A DNAStringSet instance of length 4 width seq [1] 37 CTGAACCGTCGAAGTATTAGCTTGCGGAACAGATCCA [2] 37 GTTTGGGCTCGAGGGATTGGGCGATTGAGTGGCTATT [3] 37 GGTGGATACTCGGTTTTCGCGGGAGTTGGTGAACGCA [4] 37 AGGTGCTCGTGCCCGTGTTGCTCTAACTGGACTGACC Combining this with position indexing is the next step. Interesting idea to create a 'last QNAME' flag. I'll look into that and also the 'non-overlapping loci' approach. Thanks Ryan for sending the code link. Valerie On 03/14/2013 03:57 PM, Ryan C. Thompson wrote: > A while back I was struggling with how I could process large paired bam > files in managable chunks. I wrote a Python script called to take a BAM > file and splits in into non-overlapping "loci", or sets of read and read > pairs that don't overlap reads in other loci. The result is a bunch of > small bam files, one per locus. > > https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py > > The important point is that read pairs only need to be matched with > other reads in the same locus, which allows reading paired alignments > from a position-sorted BAM file without having to read the whole file at > once. The disadvantage of this approach is that the minimum required > memory is dependent on the size of the largest locus in the file, and > cannot be controlled easily. However the memory usage is almost certain > to be significantly less than reading the entire file at once. > > If you want to, you could adapt the logic for reading alignment pairs. > > Of course, the concept of "loci" breaks down in the presence of fusion > reads, and I'm not certain how it works with spliced reads from Tophat, > since I believe the splice is encoded into the CIGAR string, which might > throw off the calculation of the end position. > > -Ryan > > On 03/14/2013 03:30 PM, Michael Lawrence wrote: >> This sounds interesting. Just want to make sure: does this ensure that >> all >> alignments for a given QNAME fall within one chunk? What happens if the >> yieldsize is too small to achieve that? >> >> And of course the drawback here is that a QNAME-sorted BAM is not >> indexable. So in practice one would need two BAMs, one sorted by POS, the >> other by QNAME. >> >> What about this: preprocess a POS-sorted (and thus indexable) BAM and >> add a >> flag indicating whether an alignment is the last alignment for a given >> QNAME. The scanner then yields all complete QNAMEs after it has scanned >> 'yieldSize' such flags. I realize this is more work on your part, but >> maybe >> it saves the hassle of multiple BAMs? Just an idea. >> >> Michael >> >> >> >> On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain >> <vobencha at="" fhcrc.org="">wrote: >> >>> Hi Tom, >>> >>> Currently readBamGappedAlignmentPairs() reads all data from a Bam file >>> into memory to sort and determine proper pairs. This is a problem for >>> large >>> files and is the reason for the error you see below. >>> >>> We've been working on alternative approach that involves first >>> sorting the >>> Bam file by qname. Once sorted, the file can be read in chunks >>> specified by >>> 'yieldSize' in the BamFile object. This functionality is available in >>> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in progress >>> and I >>> have not yet implemented a check to ensure the file is sorted by >>> qname. If >>> you try this out, please be sure to sort your Bam file by qname first. >>> >>> Examples of this approach are on the summarizeOverlaps man page for the >>> BamFile method, >>> >>> ?'summarizeOverlaps,GRanges,**BamFileList-method' >>> >>> Here is a piece of the example: >>> >>> library(pasillaBamSubset) >>> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene") >>> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene") >>> >>> ## paired-end sorted by qname: >>> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname >>> ## can be read in chunks with 'yieldSize'. >>> fl <- untreated3_chr4() >>> sortfl <- sortBam(fl, tempfile(), byQname=TRUE) >>> bf2 <- BamFileList(sortfl, index=character(0), >>> yieldSize=50000, obeyQname=TRUE) >>> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE) >>> counts2 <- assays(se2)$counts >>> >>> ## paired-end not sorted: >>> ## If the file is not sorted by qname, all records are read >>> ## into memory for sorting and to determine proper pairs. >>> ## Any 'yieldSize' set on the BamFile will be ignored. >>> fl <- untreated3_chr4() >>> bf3 <- BamFileList(fl) >>> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE) >>> counts3 <- assays(se3)$counts >>> >>> >>> Another recent feature addition is the GALignmentsList class. The goal >>> here is to provide a container that holds any type of read, single-end, >>> paired-end, singletons, multiple fragments etc. Again, the methods are >>> based on the user providing a Bam file sorted by qname. Once read in, >>> the >>> records are split on read id (QNAME in SAM/BAM). The associated read >>> functions are in GenomicRanges and Rsamtools. >>> >>> ?readGAlignmentsList (GenomicRanges) >>> ?readBamGAlignmentsList (Rsamtools) >>> >>> Let me know if you run into problems. >>> >>> Valerie >>> >>> >>> >>> >>> >>> >>> On 03/13/2013 11:07 AM, Thomas Whisenant wrote: >>> >>>> Hello all, >>>> I am attempting to use summarizeOverlaps to assign counts to exons >>>> using >>>> paired end Tophat alignments. As an example, I have created a >>>> bamfilelist >>>> containing one bamfile and a feature list of exons by gene. This is >>>> followed by using summarizeOverlaps with the single end parameter >>>> set to >>>> false. >>>> >>>> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) >>>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene, "gene") >>>> all.exons <- unlist(exons.by.gene) >>>> sumexp <- summarizeOverlaps( >>>> features=all.exons, reads=bfl, >>>> mode=IntersectionNotEmpty, >>>> singleEnd=FALSE, >>>> ignore.strand=TRUE, parallel=TRUE) >>>> Error: cannot allocate vector of size 277.9 Mb >>>> In addition: Warning message: >>>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize' >>>> set to >>>> 'NA' >>>> >>>> Even with the parallel option, this process uses all available memory >>>> which presumably leads to the vector allocation error. While I have >>>> set the >>>> yieldSize in the BamFileList, this parameter does not appear to be >>>> used. >>>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a >>>> yieldSize >>>> warning. Any ideas on how to get set the yieldSize with >>>> "singleEnd=FALSE"? >>>> Thanks, >>>> Tom >>>> >>>> >>>> R version 2.15.2 (2012-10-26) >>>> >>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>> >>>> >>>> >>>> locale: >>>> >>>> [1] LC_COLLATE=English_United States.1252 >>>> >>>> [2] LC_CTYPE=English_United States.1252 >>>> >>>> [3] LC_MONETARY=English_United States.1252 >>>> >>>> [4] LC_NUMERIC=C >>>> >>>> [5] LC_TIME=English_United States.1252 >>>> >>>> >>>> >>>> attached base packages: >>>> >>>> [1] parallel stats graphics grDevices utils datasets methods >>>> >>>> [8] base >>>> >>>> >>>> >>>> other attached packages: >>>> >>>> [1] DEXSeq_1.4.0 >>>> >>>> [2] Rsamtools_1.10.2 >>>> >>>> [3] org.Hs.eg.db_2.8.0 >>>> >>>> [4] RSQLite_0.11.2 >>>> >>>> [5] DBI_0.2-5 >>>> >>>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0 >>>> >>>> [7] GenomicFeatures_1.10.1 >>>> >>>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19 >>>> >>>> [9] BSgenome_1.26.1 >>>> >>>> [10] Biostrings_2.26.3 >>>> >>>> [11] GenomicRanges_1.10.6 >>>> >>>> [12] IRanges_1.16.5 >>>> >>>> [13] annotate_1.36.0 >>>> >>>> [14] AnnotationDbi_1.20.3 >>>> >>>> [15] Biobase_2.18.0 >>>> >>>> [16] BiocGenerics_0.4.0 >>>> >>>> >>>> >>>> loaded via a namespace (and not attached): >>>> >>>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 >>>> >>>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 >>>> >>>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 >>>> >>>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 >>>> >>>> >>>> >>>> Thomas Whisenant, Ph.D. >>>> Salomon Lab, MEM-241 >>>> Department of Molecular and Experimental Medicine >>>> The Scripps Research Institute >>>> 10550 North Torrey Pines Rd. >>>> La Jolla, CA 92037 >>>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="" <thomasw="" at="" scripps.edu="">> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> >>>> Search the archives: http://news.gmane.org/gmane.** >>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor=""> >>>> >>>> >>>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 11.1 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

No problem. Feel free to contact me if you have any questions about the code. On Fri 15 Mar 2013 09:43:40 AM PDT, Valerie Obenchain wrote: > Hi Michael and Ryan, > > Yes, all QNAME records fall within one chunk. When yieldSize is too > small, scanBam() will return more than the yieldSize up to the > break/difference in QNAME. > > library(pasillaBamSubset) > fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE) > bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE) > scn <- scanBam(bf, obeyQname=TRUE) > > scn[[1]][c("qname", "seq")] > $qname > [1] "SRR031714.65" "SRR031714.65" "SRR031714.185" "SRR031714.185" > > $seq > A DNAStringSet instance of length 4 > width seq > [1] 37 CTGAACCGTCGAAGTATTAGCTTGCGGAACAGATCCA > [2] 37 GTTTGGGCTCGAGGGATTGGGCGATTGAGTGGCTATT > [3] 37 GGTGGATACTCGGTTTTCGCGGGAGTTGGTGAACGCA > [4] 37 AGGTGCTCGTGCCCGTGTTGCTCTAACTGGACTGACC > > > Combining this with position indexing is the next step. Interesting > idea to create a 'last QNAME' flag. I'll look into that and also the > 'non-overlapping loci' approach. Thanks Ryan for sending the code link. > > > Valerie > > On 03/14/2013 03:57 PM, Ryan C. Thompson wrote: >> A while back I was struggling with how I could process large paired bam >> files in managable chunks. I wrote a Python script called to take a BAM >> file and splits in into non-overlapping "loci", or sets of read and read >> pairs that don't overlap reads in other loci. The result is a bunch of >> small bam files, one per locus. >> >> https://raw.github.com/DarwinAwardWinner/splitloci/master/splitloci.py >> >> The important point is that read pairs only need to be matched with >> other reads in the same locus, which allows reading paired alignments >> from a position-sorted BAM file without having to read the whole file at >> once. The disadvantage of this approach is that the minimum required >> memory is dependent on the size of the largest locus in the file, and >> cannot be controlled easily. However the memory usage is almost certain >> to be significantly less than reading the entire file at once. >> >> If you want to, you could adapt the logic for reading alignment pairs. >> >> Of course, the concept of "loci" breaks down in the presence of fusion >> reads, and I'm not certain how it works with spliced reads from Tophat, >> since I believe the splice is encoded into the CIGAR string, which might >> throw off the calculation of the end position. >> >> -Ryan >> >> On 03/14/2013 03:30 PM, Michael Lawrence wrote: >>> This sounds interesting. Just want to make sure: does this ensure that >>> all >>> alignments for a given QNAME fall within one chunk? What happens if the >>> yieldsize is too small to achieve that? >>> >>> And of course the drawback here is that a QNAME-sorted BAM is not >>> indexable. So in practice one would need two BAMs, one sorted by >>> POS, the >>> other by QNAME. >>> >>> What about this: preprocess a POS-sorted (and thus indexable) BAM and >>> add a >>> flag indicating whether an alignment is the last alignment for a given >>> QNAME. The scanner then yields all complete QNAMEs after it has scanned >>> 'yieldSize' such flags. I realize this is more work on your part, but >>> maybe >>> it saves the hassle of multiple BAMs? Just an idea. >>> >>> Michael >>> >>> >>> >>> On Thu, Mar 14, 2013 at 2:49 PM, Valerie Obenchain >>> <vobencha at="" fhcrc.org="">wrote: >>> >>>> Hi Tom, >>>> >>>> Currently readBamGappedAlignmentPairs() reads all data from a Bam file >>>> into memory to sort and determine proper pairs. This is a problem for >>>> large >>>> files and is the reason for the error you see below. >>>> >>>> We've been working on alternative approach that involves first >>>> sorting the >>>> Bam file by qname. Once sorted, the file can be read in chunks >>>> specified by >>>> 'yieldSize' in the BamFile object. This functionality is available in >>>> GenomicRanges 1.11.38 and Rsamtools 1.11.25. It is a work in progress >>>> and I >>>> have not yet implemented a check to ensure the file is sorted by >>>> qname. If >>>> you try this out, please be sure to sort your Bam file by qname first. >>>> >>>> Examples of this approach are on the summarizeOverlaps man page for >>>> the >>>> BamFile method, >>>> >>>> ?'summarizeOverlaps,GRanges,**BamFileList-method' >>>> >>>> Here is a piece of the example: >>>> >>>> library(pasillaBamSubset) >>>> library("TxDb.Dmelanogaster.**UCSC.dm3.ensGene") >>>> exbygene <- exonsBy(TxDb.Dmelanogaster.**UCSC.dm3.ensGene, "gene") >>>> >>>> ## paired-end sorted by qname: >>>> ## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname >>>> ## can be read in chunks with 'yieldSize'. >>>> fl <- untreated3_chr4() >>>> sortfl <- sortBam(fl, tempfile(), byQname=TRUE) >>>> bf2 <- BamFileList(sortfl, index=character(0), >>>> yieldSize=50000, obeyQname=TRUE) >>>> se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE) >>>> counts2 <- assays(se2)$counts >>>> >>>> ## paired-end not sorted: >>>> ## If the file is not sorted by qname, all records are read >>>> ## into memory for sorting and to determine proper pairs. >>>> ## Any 'yieldSize' set on the BamFile will be ignored. >>>> fl <- untreated3_chr4() >>>> bf3 <- BamFileList(fl) >>>> se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE) >>>> counts3 <- assays(se3)$counts >>>> >>>> >>>> Another recent feature addition is the GALignmentsList class. The goal >>>> here is to provide a container that holds any type of read, >>>> single-end, >>>> paired-end, singletons, multiple fragments etc. Again, the methods are >>>> based on the user providing a Bam file sorted by qname. Once read in, >>>> the >>>> records are split on read id (QNAME in SAM/BAM). The associated read >>>> functions are in GenomicRanges and Rsamtools. >>>> >>>> ?readGAlignmentsList (GenomicRanges) >>>> ?readBamGAlignmentsList (Rsamtools) >>>> >>>> Let me know if you run into problems. >>>> >>>> Valerie >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 03/13/2013 11:07 AM, Thomas Whisenant wrote: >>>> >>>>> Hello all, >>>>> I am attempting to use summarizeOverlaps to assign counts to exons >>>>> using >>>>> paired end Tophat alignments. As an example, I have created a >>>>> bamfilelist >>>>> containing one bamfile and a feature list of exons by gene. This is >>>>> followed by using summarizeOverlaps with the single end parameter >>>>> set to >>>>> false. >>>>> >>>>> bfl <- BamFileList("EP04.bam", index="EP04.bam", yieldSize=1000000) >>>>> exons.by.gene <- exonsBy(TxDb.Hsapiens.UCSC.**hg19.knownGene, "gene") >>>>> all.exons <- unlist(exons.by.gene) >>>>> sumexp <- summarizeOverlaps( >>>>> features=all.exons, reads=bfl, >>>>> mode=IntersectionNotEmpty, >>>>> singleEnd=FALSE, >>>>> ignore.strand=TRUE, parallel=TRUE) >>>>> Error: cannot allocate vector of size 277.9 Mb >>>>> In addition: Warning message: >>>>> In readBamGappedAlignmentPairs(**bf, param = param) : 'yieldSize' >>>>> set to >>>>> 'NA' >>>>> >>>>> Even with the parallel option, this process uses all available memory >>>>> which presumably leads to the vector allocation error. While I have >>>>> set the >>>>> yieldSize in the BamFileList, this parameter does not appear to be >>>>> used. >>>>> Running summarizeOverlaps with "singleEnd=TRUE" does not issue a >>>>> yieldSize >>>>> warning. Any ideas on how to get set the yieldSize with >>>>> "singleEnd=FALSE"? >>>>> Thanks, >>>>> Tom >>>>> >>>>> >>>>> R version 2.15.2 (2012-10-26) >>>>> >>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>> >>>>> >>>>> >>>>> locale: >>>>> >>>>> [1] LC_COLLATE=English_United States.1252 >>>>> >>>>> [2] LC_CTYPE=English_United States.1252 >>>>> >>>>> [3] LC_MONETARY=English_United States.1252 >>>>> >>>>> [4] LC_NUMERIC=C >>>>> >>>>> [5] LC_TIME=English_United States.1252 >>>>> >>>>> >>>>> >>>>> attached base packages: >>>>> >>>>> [1] parallel stats graphics grDevices utils datasets >>>>> methods >>>>> >>>>> [8] base >>>>> >>>>> >>>>> >>>>> other attached packages: >>>>> >>>>> [1] DEXSeq_1.4.0 >>>>> >>>>> [2] Rsamtools_1.10.2 >>>>> >>>>> [3] org.Hs.eg.db_2.8.0 >>>>> >>>>> [4] RSQLite_0.11.2 >>>>> >>>>> [5] DBI_0.2-5 >>>>> >>>>> [6] TxDb.Hsapiens.UCSC.hg19.**knownGene_2.8.0 >>>>> >>>>> [7] GenomicFeatures_1.10.1 >>>>> >>>>> [8] BSgenome.Hsapiens.UCSC.hg19_1.**3.19 >>>>> >>>>> [9] BSgenome_1.26.1 >>>>> >>>>> [10] Biostrings_2.26.3 >>>>> >>>>> [11] GenomicRanges_1.10.6 >>>>> >>>>> [12] IRanges_1.16.5 >>>>> >>>>> [13] annotate_1.36.0 >>>>> >>>>> [14] AnnotationDbi_1.20.3 >>>>> >>>>> [15] Biobase_2.18.0 >>>>> >>>>> [16] BiocGenerics_0.4.0 >>>>> >>>>> >>>>> >>>>> loaded via a namespace (and not attached): >>>>> >>>>> [1] biomaRt_2.14.0 bitops_1.0-5 hwriter_1.3 >>>>> >>>>> [4] RCurl_1.95-3 rtracklayer_1.18.2 statmod_1.4.17 >>>>> >>>>> [7] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 >>>>> >>>>> [10] XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 >>>>> >>>>> >>>>> >>>>> Thomas Whisenant, Ph.D. >>>>> Salomon Lab, MEM-241 >>>>> Department of Molecular and Experimental Medicine >>>>> The Scripps Research Institute >>>>> 10550 North Torrey Pines Rd. >>>>> La Jolla, CA 92037 >>>>> thomasw at scripps.edu<mailto:tho**masw at="" scripps.edu="">>>>> <thomasw at="" scripps.edu="">> >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> >>>>> >>>>> Search the archives: http://news.gmane.org/gmane.** >>>>> science.biology.informatics.**conductor<http: news.gmane.org="" gm="" ane.science.biology.informatics.conductor=""> >>>>> >>>>> >>>>> >>>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> >>>> >>>> Search the archives: http://news.gmane.org/gmane.** >>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor=""> >>>> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.