Question

rtracklayer import.bed pipe inconsistency


Nathan Sheffield ▴ 20

@nathan-sheffield-4827

Last seen 9.6 years ago

Hi,

I am having trouble with importing a bed file after running it through pipe() in R. Maybe it's a bug in rtracklayer's import.bed? Or maybe I'm missing a setting, can anyone help with this?

I have a bed file ("code25.bed") with 4 lines:

    chr17 60212869 60218774
    chr1 108503808 108508915
    chr8 86373506 86380637
    chr8 99303546 99307608

I can read it into R with read.table like so:

    > read.table("Aug5/codeBed/code25.bed")
         V1        V2        V3
    1 chr17  60212869  60218774
    2  chr1 108503808 108508915
    3  chr8  86373506  86380637
    4  chr8  99303546  99307608

I want to use rtracklayer to import to get a genomicRanges object, so I try with import.bed, which also works:

    > import.bed("Aug5/codeBed/code25.bed")
    RangedData with 4 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3        chr8 [ 86373507,  86380637] |
    4        chr8 [ 99303547,  99307608] |

Now, I want this to work on bed files with more than 3 columns, just in case. I can do this with a commandline pipe using cut like so:

    > read.table(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
         V1        V2        V3
    1 chr17  60212869  60218774
    2  chr1 108503808 108508915
    3  chr8  86373506  86380637
    4  chr8  99303546  99307608

So this gives the exact same output as the first read.table above. However, when I try to pass this pipe to import.bed, something strange happens:

    > import.bed(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
    RangedData with 5 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3       chr17 [ 60212870,  60218774] |
    4        chr8 [ 86373507,  86380637] |
    5        chr8 [ 99303547,  99307608] |

Not sure why, but it has duplicated one of the regions and now has 5, instead of 4. This is a problem with import.bed combined with pipe, and has nothing to do with cut:

    > import.bed(pipe("cat Aug5/codeBed/code25.bed"))
    RangedData with 5 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3       chr17 [ 60212870,  60218774] |
    4        chr8 [ 86373507,  86380637] |
    5        chr8 [ 99303547,  99307608] |

any ideas?

-Nathan Sheffield
Duke University, Computational Biology Program

sessionInfo follows:

    R version 2.12.0 (2010-10-15)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=no_NO.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] rtracklayer_1.10.6  RCurl_1.4-3  bitops_1.0-4.1
    [4] GenomicRanges_1.2.1 IRanges_1.8.7

    loaded via a namespace (and not attached):
    [1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.0 XML_3.2-0
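For readers who only need the first three columns of a BED-like file and want to sidestep both pipe() and import.bed, here is a minimal base-R sketch. It is not rtracklayer's method: read_bed3 is a hypothetical helper, the file is a temporary stand-in for code25.bed, and the fourth column is made up purely to show extra columns being dropped. It assumes a tab-separated file with no track or browser header lines. The + 1 shift mirrors the 0-based to 1-based start conversion visible in the import.bed output above.

```r
# Temporary stand-in for "Aug5/codeBed/code25.bed". The fourth column is a
# made-up, non-standard extra column added only for illustration.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr17\t60212869\t60218774\textraA",
  "chr1\t108503808\t108508915\textraB",
  "chr8\t86373506\t86380637\textraC",
  "chr8\t99303546\t99307608\textraD"
), bed_path)

# Hypothetical helper: keep only the first three columns of a BED-like file.
# Assumes tab-separated input without track/browser header lines.
read_bed3 <- function(path) {
  bed <- read.table(path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)[, 1:3]
  names(bed) <- c("chrom", "chromStart", "chromEnd")
  # BED starts are 0-based; add 1 to get 1-based inclusive starts.
  data.frame(space = bed$chrom,
             start = bed$chromStart + 1L,
             end   = bed$chromEnd,
             stringsAsFactors = FALSE)
}

regions <- read_bed3(bed_path)
print(regions)
```

The resulting data frame can then be handed to whatever range constructor your installed Bioconductor version provides.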

rtracklayer GenomicRanges • 857 views

updated 12.7 years ago by Michael Lawrence ★ 11k • written 12.7 years ago by Nathan Sheffield ▴ 20

score 0 · Answer 1 · 2011-08-29


Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

I can't reproduce this:

    > import(pipe("cat ~/tmp/pipe-test.bed"), format="bed")
    RangedData with 4 rows and 0 value columns across 3 spaces
         space                 ranges |
      <factor>              <IRanges> |
    1     chr1 [108503809, 108508915] |
    2    chr17 [ 60212870,  60218774] |
    3     chr8 [ 86373507,  86380637] |
    4     chr8 [ 99303547,  99307608] |

    > sessionInfo()
    R version 2.14.0 Under development (unstable) (--)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] rtracklayer_1.13.12 RCurl_1.5-0 bitops_1.0-4.1

    loaded via a namespace (and not attached):
    [1] BSgenome_1.21.3 Biostrings_2.21.6 GenomicRanges_1.5.21
    [4] IRanges_1.11.16 XML_3.2-0 zlibbioc_0.1.6

I don't remember this being an issue in the past, but who knows. My only recommendation is to upgrade your R and rtracklayer.

Michael
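Before or after upgrading, one way to narrow down where a duplicated row comes from is to check whether the pipe itself delivers the same lines as the file. The sketch below uses only base R and a temporary stand-in file. It assumes a Unix-like system where the cat command is available, and it does not exercise rtracklayer at all.

```r
# Temporary stand-in for the four-line BED file from the question.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr17\t60212869\t60218774",
  "chr1\t108503808\t108508915",
  "chr8\t86373506\t86380637",
  "chr8\t99303546\t99307608"
), bed_path)

# Read the file directly.
direct <- readLines(bed_path)

# Read the same file through a pipe (assumes `cat` exists on this system).
con <- pipe(paste("cat", shQuote(bed_path)))
piped <- readLines(con)
close(con)

# If these agree and nothing is duplicated, the pipe is delivering the data
# intact, which points at the importer rather than the connection.
print(identical(direct, piped))
print(anyDuplicated(piped))
```

If both checks pass on your machine, the extra row is most likely introduced by the installed import.bed, which is consistent with the advice to upgrade.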

written 12.7 years ago by Michael Lawrence ★ 11k


And also, just btw,

What do you mean by a BED file with more than three columns? rtracklayer can read those in just fine, unless they are non-standard columns, in which case you really don't have a BED file anyway.

With newer versions of rtracklayer, one can specify the colnames argument to select only the desired BED columns. Passing character() would give you your desired result.

Michael
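The colnames suggestion might look like the following sketch. It assumes a version of rtracklayer recent enough to support colnames for BED import, and it is guarded so that it only runs when the package is installed. The file is a temporary stand-in, and the name column (region1 to region4) is made up for illustration. In current releases the result should be a GRanges rather than the RangedData shown in this thread.

```r
# Temporary BED4 file; the name column is made up for illustration.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr17\t60212869\t60218774\tregion1",
  "chr1\t108503808\t108508915\tregion2",
  "chr8\t86373506\t86380637\tregion3",
  "chr8\t99303546\t99307608\tregion4"
), bed_path)

n_regions <- NA_integer_
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  # colnames = character() asks for no columns beyond chrom/start/end.
  regions <- rtracklayer::import(bed_path, format = "bed",
                                 colnames = character())
  print(regions)
  n_regions <- length(regions)
} else {
  message("rtracklayer is not installed; skipping the import example")
}
```

As noted above, this only applies to standard BED columns. A file with non-standard extra columns may still need to be trimmed before import.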

written 12.7 years ago by Michael Lawrence ★ 11k


Thanks, I figured it might just come down to updating.

And yes, I did mean nonstandard columns -- even though it's not "really" a BED file, it's still helpful to be able to import just the first 3 columns so that my script can handle any type of BED-like file, standard or not. The ability to select columns will be helpful in the future, thanks.

-Nathan

written 12.7 years ago by Nathan Sheffield ▴ 20