endoapply over GRangesList fails when an empty GRanges object is returned (Bug?)
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States
Hi all, I think this is the wrong behavior .. an endoapply(GRangesList, function(...)) will fail if the embedded function returns and empty GRanges object as one of its operations. Calling GRangesList(lapply(GRangesList, function(...))) will work. So too would an endoapply over an IRangesList that does the same thing ... In the code below -- the result that's meant to be stored in "fail" is what I'm talking about. The other two code blocks is just to show you that it works if you do an end-around of the endoapply(GRangesList, ...). I think I would expect the result that comes back in "good" down below to be the same thing that "fail" should be. gr <- GRanges(seqnames='chr1', strand='+', ranges=IRanges(start=c(sample(18:22, 10, replace=TRUE), sample(100:108, 10, replace=TRUE)), width=sample(18:20, 20, replace=TRUE))) strand(gr)[2:5] <- '-' grl <- split(gr, strand(gr)) itake <- IRanges(start=100, end=150) gtake <- GRanges(seqnames='chr1', ranges=itake) ## This will fail, don't think it should fail <- endoapply(grl, function(x) { subsetByOverlaps(x, gtake) }) ## This works, and ultimately is what the previous one should be good <- GRangesList(lapply(grl, function(x) { take <- GRanges(seqnames='chr1', ranges=itake) subsetByOverlaps(x, gtake) })) ## Test IRangesList: this works. irl <- IRangesList(lapply(grl, ranges)) check <- endoapply(irl, function(x) { subsetByOverlaps(x, itake) }) Thanks, -steve R version 2.12.0 Under development (unstable) (2010-08-18 r52771) Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_2.4.1 AnnotationDbi_1.11.4 [3] Biobase_2.9.0 BSgenome.Hsapiens.UCSC.hg18_1.3.16 [5] ggplot2_0.8.8 proto_0.3-8 [7] reshape_0.8.3 doMC_1.2.1 [9] multicore_0.1-3 foreach_1.3.0 [11] codetools_0.2-2 iterators_1.0.3 [13] GenomeGraphs_1.9.0 biomaRt_2.5.1 [15] bitops_1.0-4.1 Rsamtools_1.1.12 [17] data.table_1.5 plyr_1.1 [19] BSgenome_1.17.6 Biostrings_2.17.28 [21] RSQLite_0.9-2 DBI_0.2-5 [23] GenomicFeatures_1.1.11 GenomicRanges_1.1.20 [25] IRanges_1.7.19 loaded via a namespace (and not attached): [1] RCurl_1.4-3 rtracklayer_1.9.7 tools_2.12.0 XML_3.1-1 -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Cancer Cancer • 2.0k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States
On Tue, Aug 24, 2010 at 2:21 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi all, > > I think this is the wrong behavior .. an endoapply(GRangesList, > function(...)) will fail if the embedded function returns and empty > GRanges object as one of its operations. > > Calling GRangesList(lapply(GRangesList, function(...))) will work. > So too would an endoapply over an IRangesList that does the same thing ... > > In the code below -- the result that's meant to be stored in "fail" is > what I'm talking about. The other two code blocks is just to show you > that it works if you do an end-around of the endoapply(GRangesList, > ...). > > I think I would expect the result that comes back in "good" down below > to be the same thing that "fail" should be. > > > gr <- GRanges(seqnames='chr1', strand='+', > ranges=IRanges(start=c(sample(18:22, 10, replace=TRUE), > sample(100:108, 10, replace=TRUE)), > width=sample(18:20, 20, replace=TRUE))) > strand(gr)[2:5] <- '-' > grl <- split(gr, strand(gr)) > > itake <- IRanges(start=100, end=150) > gtake <- GRanges(seqnames='chr1', ranges=itake) > > ## This will fail, don't think it should > fail <- endoapply(grl, function(x) { > subsetByOverlaps(x, gtake) > }) > > ## This works, and ultimately is what the previous one should be > good <- GRangesList(lapply(grl, function(x) { > take <- GRanges(seqnames='chr1', ranges=itake) > subsetByOverlaps(x, gtake) > })) > > ## Test IRangesList: this works. > irl <- IRangesList(lapply(grl, ranges)) > I assume someone else will take care of the bug, but wanted to mention that you could replace the above command with: irl <- seqapply(grl, ranges) endoapply is a stricter version of seqapply that enforces that the return value is the same type as the first argument. check <- endoapply(irl, function(x) { > subsetByOverlaps(x, itake) > }) > > Thanks, > -steve > > R version 2.12.0 Under development (unstable) (2010-08-18 r52771) > Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] grid stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] org.Hs.eg.db_2.4.1 AnnotationDbi_1.11.4 > [3] Biobase_2.9.0 BSgenome.Hsapiens.UCSC.hg18_1.3.16 > [5] ggplot2_0.8.8 proto_0.3-8 > [7] reshape_0.8.3 doMC_1.2.1 > [9] multicore_0.1-3 foreach_1.3.0 > [11] codetools_0.2-2 iterators_1.0.3 > [13] GenomeGraphs_1.9.0 biomaRt_2.5.1 > [15] bitops_1.0-4.1 Rsamtools_1.1.12 > [17] data.table_1.5 plyr_1.1 > [19] BSgenome_1.17.6 Biostrings_2.17.28 > [21] RSQLite_0.9-2 DBI_0.2-5 > [23] GenomicFeatures_1.1.11 GenomicRanges_1.1.20 > [25] IRanges_1.7.19 > > loaded via a namespace (and not attached): > [1] RCurl_1.4-3 rtracklayer_1.9.7 tools_2.12.0 XML_3.1-1 > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.mskc="" c.org="" %7elianos="" contact=""> > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Steve, Thanks for reporting this issue. It turns out that the endoapply surfaced the error, but didn't cause it. The problem was upstream in the split,GRanges-method where there was a bug when splitting by a factor or 'factor' Rle that didn't have all levels present. (To check this try validObject(grl) on your example below.) This issue has been corrected in BioC 2.6 (GenomicRanges 1.0.8) and BioC 2.7 (GenomicRanges 1.1.21), which should be available for download from bioconductor.org within 36 hours. Cheers, Patrick On 8/24/10 2:36 PM, Michael Lawrence wrote: > On Tue, Aug 24, 2010 at 2:21 PM, Steve Lianoglou< > mailinglist.honeypot at gmail.com> wrote: > >> Hi all, >> >> I think this is the wrong behavior .. an endoapply(GRangesList, >> function(...)) will fail if the embedded function returns and empty >> GRanges object as one of its operations. >> >> Calling GRangesList(lapply(GRangesList, function(...))) will work. >> So too would an endoapply over an IRangesList that does the same thing ... >> >> In the code below -- the result that's meant to be stored in "fail" is >> what I'm talking about. The other two code blocks is just to show you >> that it works if you do an end-around of the endoapply(GRangesList, >> ...). >> >> I think I would expect the result that comes back in "good" down below >> to be the same thing that "fail" should be. >> >> >> gr<- GRanges(seqnames='chr1', strand='+', >> ranges=IRanges(start=c(sample(18:22, 10, replace=TRUE), >> sample(100:108, 10, replace=TRUE)), >> width=sample(18:20, 20, replace=TRUE))) >> strand(gr)[2:5]<- '-' >> grl<- split(gr, strand(gr)) >> >> itake<- IRanges(start=100, end=150) >> gtake<- GRanges(seqnames='chr1', ranges=itake) >> >> ## This will fail, don't think it should >> fail<- endoapply(grl, function(x) { >> subsetByOverlaps(x, gtake) >> }) >> >> ## This works, and ultimately is what the previous one should be >> good<- GRangesList(lapply(grl, function(x) { >> take<- GRanges(seqnames='chr1', ranges=itake) >> subsetByOverlaps(x, gtake) >> })) >> >> ## Test IRangesList: this works. >> irl<- IRangesList(lapply(grl, ranges)) >> > I assume someone else will take care of the bug, but wanted to mention that > you could replace the above command with: > > irl<- seqapply(grl, ranges) > > endoapply is a stricter version of seqapply that enforces that the return > value is the same type as the first argument. > > check<- endoapply(irl, function(x) { >> subsetByOverlaps(x, itake) >> }) >> >> Thanks, >> -steve >> >> R version 2.12.0 Under development (unstable) (2010-08-18 r52771) >> Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] grid stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] org.Hs.eg.db_2.4.1 AnnotationDbi_1.11.4 >> [3] Biobase_2.9.0 BSgenome.Hsapiens.UCSC.hg18_1.3.16 >> [5] ggplot2_0.8.8 proto_0.3-8 >> [7] reshape_0.8.3 doMC_1.2.1 >> [9] multicore_0.1-3 foreach_1.3.0 >> [11] codetools_0.2-2 iterators_1.0.3 >> [13] GenomeGraphs_1.9.0 biomaRt_2.5.1 >> [15] bitops_1.0-4.1 Rsamtools_1.1.12 >> [17] data.table_1.5 plyr_1.1 >> [19] BSgenome_1.17.6 Biostrings_2.17.28 >> [21] RSQLite_0.9-2 DBI_0.2-5 >> [23] GenomicFeatures_1.1.11 GenomicRanges_1.1.20 >> [25] IRanges_1.7.19 >> >> loaded via a namespace (and not attached): >> [1] RCurl_1.4-3 rtracklayer_1.9.7 tools_2.12.0 XML_3.1-1 >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.msk="" cc.org="" %7elianos="" contact=""> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 351 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6