GRanges, strange behavior when chromosomes don't match
1
0
Entering edit mode
@cei-abreu-goodger-4433
Last seen 9.1 years ago
Mexico
Hello all, When you have two GRanges objects which don't have the same seqnames factor, you can get very unexpected behavior. Would it be possible to get an error/warning when this is attempted? I don't know how many functions are affected, perhaps the functions mentioned below can be modified to produce a more obvious result... Many thanks, Cei Code example / sessionInfo: library(GenomicRanges) gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), ranges = IRanges(1:4, width = 10), strand = c("-", "+", "+", "+")) gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), ranges = IRanges(1:4, width = 10), strand = c("-", "+", "+", "+")) union(gr1,gr2) # gives expected result, all 6 ranges intersect(gr1,gr2) # includes ranges from the missing gr2 chromosomes setdiff(gr1,gr2) # includes 3 copies of the ranges from the missing gr2 chromosomes sessionInfo() R version 2.11.0 (2010-04-22) i386-apple-darwin9.8.0 locale: [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] GenomicRanges_1.0.5 IRanges_1.6.8 Biobase_2.8.0 loaded via a namespace (and not attached): [1] tools_2.11.0
• 1.4k views
ADD COMMENT
0
Entering edit mode
Patrick Aboyoun ★ 1.6k
@patrick-aboyoun-6734
Last seen 9.5 years ago
United States
Cei, Thanks for the bug report. I am looking into the issue now and will have a patch out shortly. Cheers, Patrick On 7/8/10 9:22 AM, Cei Abreu-Goodger wrote: > Hello all, > > When you have two GRanges objects which don't have the same seqnames > factor, you can get very unexpected behavior. > > Would it be possible to get an error/warning when this is attempted? I > don't know how many functions are affected, perhaps the functions > mentioned below can be modified to produce a more obvious result... > > Many thanks, > > Cei > > > Code example / sessionInfo: > > library(GenomicRanges) > gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), > ranges = IRanges(1:4, width = 10), > strand = c("-", "+", "+", "+")) > gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), > ranges = IRanges(1:4, width = 10), > strand = c("-", "+", "+", "+")) > > union(gr1,gr2) > # gives expected result, all 6 ranges > > intersect(gr1,gr2) > # includes ranges from the missing gr2 chromosomes > > setdiff(gr1,gr2) > # includes 3 copies of the ranges from the missing gr2 chromosomes > > > sessionInfo() > R version 2.11.0 (2010-04-22) > i386-apple-darwin9.8.0 > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] stats graphics grDevices datasets utils methods base > > other attached packages: > [1] GenomicRanges_1.0.5 IRanges_1.6.8 Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] tools_2.11.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Cei, I checked in a patch to BioC 2.6 (GenomicRanges 1.0.6) and BioC 2.7 (GenomicRanges 1.1.16) that fixes the issue you found in the intersect and setdiff methods for GRanges objects when the two objects don't have the same sequence names. These new packages will be available on bioconductor.org within the next 36 hours. The intersect and setdiff method for GRanges now form the union of sequence names for both objects before performing their operations. The output for the examples you sent are now > library(GenomicRanges) > gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), + ranges = IRanges(1:4, width = 10), + strand = c("-", "+", "+", "+")) > gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), + ranges = IRanges(1:4, width = 10), + strand = c("-", "+", "+", "+")) > intersect(gr1,gr2) GRanges with 2 ranges and 0 elementMetadata values seqnames ranges strand | <rle> <iranges> <rle> | [1] chr1 [1, 10] - | [2] chr2 [2, 11] + | seqlengths chr1 chr2 chr3 chr4 chr5 chr6 NA NA NA NA NA NA > setdiff(gr1,gr2) GRanges with 2 ranges and 0 elementMetadata values seqnames ranges strand | <rle> <iranges> <rle> | [1] chr3 [3, 12] + | [2] chr4 [4, 13] + | seqlengths chr1 chr2 chr3 chr4 chr5 chr6 NA NA NA NA NA NA > sessionInfo() R version 2.12.0 Under development (unstable) (2010-07-01 r52425) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicRanges_1.1.16 IRanges_1.7.10 Cheers, Patrick On 7/8/10 4:55 PM, Patrick Aboyoun wrote: > Cei, > Thanks for the bug report. I am looking into the issue now and will > have a patch out shortly. > > > Cheers, > Patrick > > > On 7/8/10 9:22 AM, Cei Abreu-Goodger wrote: >> Hello all, >> >> When you have two GRanges objects which don't have the same seqnames >> factor, you can get very unexpected behavior. >> >> Would it be possible to get an error/warning when this is attempted? >> I don't know how many functions are affected, perhaps the functions >> mentioned below can be modified to produce a more obvious result... >> >> Many thanks, >> >> Cei >> >> >> Code example / sessionInfo: >> >> library(GenomicRanges) >> gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), >> ranges = IRanges(1:4, width = 10), >> strand = c("-", "+", "+", "+")) >> gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), >> ranges = IRanges(1:4, width = 10), >> strand = c("-", "+", "+", "+")) >> >> union(gr1,gr2) >> # gives expected result, all 6 ranges >> >> intersect(gr1,gr2) >> # includes ranges from the missing gr2 chromosomes >> >> setdiff(gr1,gr2) >> # includes 3 copies of the ranges from the missing gr2 chromosomes >> >> >> sessionInfo() >> R version 2.11.0 (2010-04-22) >> i386-apple-darwin9.8.0 >> >> locale: >> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices datasets utils methods base >> >> other attached packages: >> [1] GenomicRanges_1.0.5 IRanges_1.6.8 Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] tools_2.11.0 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Thanks Patrick! Patrick Aboyoun wrote: > Cei, > I checked in a patch to BioC 2.6 (GenomicRanges 1.0.6) and BioC 2.7 > (GenomicRanges 1.1.16) that fixes the issue you found in the intersect > and setdiff methods for GRanges objects when the two objects don't have > the same sequence names. These new packages will be available on > bioconductor.org within the next 36 hours. > > The intersect and setdiff method for GRanges now form the union of > sequence names for both objects before performing their operations. The > output for the examples you sent are now > > > library(GenomicRanges) > > gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), > + ranges = IRanges(1:4, width = 10), > + strand = c("-", "+", "+", "+")) > > gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), > + ranges = IRanges(1:4, width = 10), > + strand = c("-", "+", "+", "+")) > > > intersect(gr1,gr2) > GRanges with 2 ranges and 0 elementMetadata values > seqnames ranges strand | > <rle> <iranges> <rle> | > [1] chr1 [1, 10] - | > [2] chr2 [2, 11] + | > > seqlengths > chr1 chr2 chr3 chr4 chr5 chr6 > NA NA NA NA NA NA > > > setdiff(gr1,gr2) > GRanges with 2 ranges and 0 elementMetadata values > seqnames ranges strand | > <rle> <iranges> <rle> | > [1] chr3 [3, 12] + | > [2] chr4 [4, 13] + | > > seqlengths > chr1 chr2 chr3 chr4 chr5 chr6 > NA NA NA NA NA NA > > > sessionInfo() > R version 2.12.0 Under development (unstable) (2010-07-01 r52425) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicRanges_1.1.16 IRanges_1.7.10 > > > Cheers, > Patrick > > > > On 7/8/10 4:55 PM, Patrick Aboyoun wrote: >> Cei, >> Thanks for the bug report. I am looking into the issue now and will >> have a patch out shortly. >> >> >> Cheers, >> Patrick >> >> >> On 7/8/10 9:22 AM, Cei Abreu-Goodger wrote: >>> Hello all, >>> >>> When you have two GRanges objects which don't have the same seqnames >>> factor, you can get very unexpected behavior. >>> >>> Would it be possible to get an error/warning when this is attempted? >>> I don't know how many functions are affected, perhaps the functions >>> mentioned below can be modified to produce a more obvious result... >>> >>> Many thanks, >>> >>> Cei >>> >>> >>> Code example / sessionInfo: >>> >>> library(GenomicRanges) >>> gr1 <- GRanges(seqnames = c("chr1", "chr2", "chr3", "chr4"), >>> ranges = IRanges(1:4, width = 10), >>> strand = c("-", "+", "+", "+")) >>> gr2 <- GRanges(seqnames = c("chr1", "chr2", "chr5", "chr6"), >>> ranges = IRanges(1:4, width = 10), >>> strand = c("-", "+", "+", "+")) >>> >>> union(gr1,gr2) >>> # gives expected result, all 6 ranges >>> >>> intersect(gr1,gr2) >>> # includes ranges from the missing gr2 chromosomes >>> >>> setdiff(gr1,gr2) >>> # includes 3 copies of the ranges from the missing gr2 chromosomes >>> >>> >>> sessionInfo() >>> R version 2.11.0 (2010-04-22) >>> i386-apple-darwin9.8.0 >>> >>> locale: >>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices datasets utils methods base >>> >>> other attached packages: >>> [1] GenomicRanges_1.0.5 IRanges_1.6.8 Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] tools_2.11.0 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6