Question: rtracklayer import.bed pipe inconsistency
7.8 years ago by
Nathan Sheffield wrote:
Hi,

I am having trouble with importing a bed file after running it through pipe() in R. Maybe it's a bug in rtracklayer's import.bed? Or maybe I'm missing a setting, can anyone help with this?

I have a bed file ("code25.bed") with 4 lines:

    chr17 60212869 60218774
    chr1 108503808 108508915
    chr8 86373506 86380637
    chr8 99303546 99307608

I can read it into R with read.table like so:

    > read.table("Aug5/codeBed/code25.bed")
         V1        V2        V3
    1 chr17  60212869  60218774
    2  chr1 108503808 108508915
    3  chr8  86373506  86380637
    4  chr8  99303546  99307608

I want to use rtracklayer to import to get a genomicRanges object, so I try with import.bed, which also works:

    > import.bed("Aug5/codeBed/code25.bed")
    RangedData with 4 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3        chr8 [ 86373507,  86380637] |
    4        chr8 [ 99303547,  99307608] |

Now, I want this to work on bed files with more than 3 columns, just in case. I can do this with a commandline pipe using cut like so:

    > read.table(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
         V1        V2        V3
    1 chr17  60212869  60218774
    2  chr1 108503808 108508915
    3  chr8  86373506  86380637
    4  chr8  99303546  99307608

So this gives the exact same output as the first read.table above. However, when I try to pass this pipe to import.bed, something strange happens:

    > import.bed(pipe(paste("cut -f1,2,3 ", "Aug5/codeBed/code25.bed")))
    RangedData with 5 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3       chr17 [ 60212870,  60218774] |
    4        chr8 [ 86373507,  86380637] |
    5        chr8 [ 99303547,  99307608] |

Not sure why, but it has duplicated one of the regions and now has 5, instead of 4. This is a problem with import.bed combined with pipe, and has nothing to do with cut:

    > import.bed(pipe("cat Aug5/codeBed/code25.bed"))
    RangedData with 5 rows and 0 value columns across 3 spaces
            space                 ranges |
      <character>              <IRanges> |
    1        chr1 [108503809, 108508915] |
    2       chr17 [ 60212870,  60218774] |
    3       chr17 [ 60212870,  60218774] |
    4        chr8 [ 86373507,  86380637] |
    5        chr8 [ 99303547,  99307608] |

any ideas?

-Nathan Sheffield
Duke University, Computational Biology Program

sessionInfo follows:

    R version 2.12.0 (2010-10-15)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=no_NO.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] rtracklayer_1.10.6  RCurl_1.4-3   bitops_1.0-4.1
    [4] GenomicRanges_1.2.1 IRanges_1.8.7

    loaded via a namespace (and not attached):
    [1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.0 XML_3.2-0
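One way to narrow this down is to check whether the pipe itself delivers the same rows as the file, independently of import.bed. The sketch below uses only base R and assumes a Unix-like shell where `cat` is available; `bed_path` is a hypothetical temporary stand-in for code25.bed, filled with the four intervals listed above.

```r
# Write the four intervals from the question to a temporary BED file.
bed_lines <- c(
  "chr17\t60212869\t60218774",
  "chr1\t108503808\t108508915",
  "chr8\t86373506\t86380637",
  "chr8\t99303546\t99307608"
)
bed_path <- tempfile(fileext = ".bed")  # hypothetical stand-in for code25.bed
writeLines(bed_lines, bed_path)

# Read the file directly and through a pipe (assumes `cat` exists on the system).
direct <- read.table(bed_path, sep = "\t", stringsAsFactors = FALSE)
piped <- read.table(pipe(paste("cat", shQuote(bed_path))),
                    sep = "\t", stringsAsFactors = FALSE)

# If the pipe is not at fault, both reads agree and no row is repeated.
same_rows <- identical(direct, piped)
n_dup <- sum(duplicated(piped))
print(same_rows)
print(n_dup)
```

If both reads agree here, the extra row is more likely introduced by the importer than by the pipe connection.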
rtracklayer genomicranges • 557 views
Answer: rtracklayer import.bed pipe inconsistency
7.8 years ago by
Michael Lawrence wrote:
I can't reproduce this:

    > import(pipe("cat ~/tmp/pipe-test.bed"), format="bed")
    RangedData with 4 rows and 0 value columns across 3 spaces
         space                 ranges |
      <factor>              <IRanges> |
    1     chr1 [108503809, 108508915] |
    2    chr17 [ 60212870,  60218774] |
    3     chr8 [ 86373507,  86380637] |
    4     chr8 [ 99303547,  99307608] |

    > sessionInfo()
    R version 2.14.0 Under development (unstable) (--)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] rtracklayer_1.13.12 RCurl_1.5-0 bitops_1.0-4.1

    loaded via a namespace (and not attached):
    [1] BSgenome_1.21.3 Biostrings_2.21.6 GenomicRanges_1.5.21
    [4] IRanges_1.11.16 XML_3.2-0         zlibbioc_0.1.6

I don't remember this being an issue in the past, but who knows. My only recommendation is to upgrade your R and rtracklayer.

Michael

On Mon, Aug 29, 2011 at 8:56 AM, Nathan Sheffield <nathan.sheffield@duke.edu> wrote:
> Hi,
>
> I am having trouble with importing a bed file after running it through
> pipe() in R. Maybe it's a bug in rtracklayer's import.bed ? Or maybe I'm
> missing a setting, can anyone help with this?
> [...]
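Before upgrading, it can help to see how the installed rtracklayer compares with the version in the sessionInfo above. This is a minimal sketch in base R; the reference version 1.13.12 is only the one where the problem could not be reproduced, not a confirmed fix version, and the variable names are illustrative.

```r
pkg <- "rtracklayer"
reference <- "1.13.12"  # version from the answer's sessionInfo; NOT a known fix version

# The asker's version (1.10.6) is older than the reference version.
asker_is_older <- package_version("1.10.6") < package_version(reference)

# system.file() returns "" when the package is not installed, without loading it.
if (nzchar(system.file(package = pkg))) {
  installed <- utils::packageVersion(pkg)
  at_least_reference <- installed >= reference
  message(pkg, " ", as.character(installed),
          if (at_least_reference) " is at least " else " is older than ",
          reference)
} else {
  at_least_reference <- NA
  message(pkg, " is not installed")
}
```

Comparing against the version from a working session is only a heuristic; it says nothing about which release changed the behaviour.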
And also, just btw,

What do you mean by a BED file with more than three columns? rtracklayer can read those in just fine, unless they are non-standard columns, in which case you really don't have a BED file anyway.

With newer versions of rtracklayer, one can specify the colnames argument to select only the desired BED columns. Passing character() would give you your desired result.

Michael

On Mon, Aug 29, 2011 at 4:02 PM, Michael Lawrence <michafla@gene.com> wrote:
> I can't reproduce this:
> [...]
written 7.8 years ago by Michael Lawrence
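The colnames suggestion can be sketched as follows. This assumes a version of rtracklayer recent enough to accept the colnames argument, as described in the reply; the temporary file and its fourth "name" column are hypothetical and exist only for illustration. The import call is skipped when the package is not installed.

```r
# Hypothetical 4-column BED file: the three coordinates plus a name column.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr17\t60212869\t60218774\tregionA",
  "chr1\t108503808\t108508915\tregionB",
  "chr8\t86373506\t86380637\tregionC",
  "chr8\t99303546\t99307608\tregionD"
), bed_path)

gr <- NULL
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  # colnames = character() requests no columns beyond chrom, start and end.
  gr <- rtracklayer::import(bed_path, format = "bed", colnames = character())
  print(gr)
}
```

This keeps column selection inside the importer, so no external cut or pipe is needed for files with standard BED columns.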
Thanks, I figured it might just come down to updating.

And yes, I did mean nonstandard columns -- even though it's not "really" a BED file, it's still helpful to be able to import just the first 3 columns so that my script can handle any type of BED-like file, standard or not. The ability to select columns will be helpful in the future, thanks.

-Nathan

On 08/30/2011 01:05 AM, Michael Lawrence wrote:
> And also, just btw,
>
> What do you mean by a BED file more than three columns? rtracklayer can
> read those in just fine, unless they are non-standard columns, in which
> case you really don't have a BED file anyway.
> [...]
written 7.8 years ago by Nathan Sheffield
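For BED-like files with nonstandard extra columns, one option that avoids both the pipe and the importer is to parse the first three fields in base R. The helper below, `read_bed3`, is a hypothetical name and only a sketch: it assumes whitespace-separated fields, skips comment lines and UCSC-style "track"/"browser" header lines, and adds 1 to the start so the coordinates match the 1-based ranges that import.bed printed in the question.

```r
read_bed3 <- function(path) {
  lines <- readLines(path)
  # Drop blank lines, comments, and "track"/"browser" header lines.
  skip <- grepl("^\\s*$", lines) | grepl("^(#|track\\s|browser\\s)", lines)
  fields <- strsplit(lines[!skip], "[ \t]+")
  if (any(lengths(fields) < 3)) {
    stop("every data line needs at least 3 fields")
  }
  data.frame(
    chrom = vapply(fields, `[`, character(1), 1),
    # BED starts are 0-based; add 1 to get 1-based closed intervals.
    start = as.integer(vapply(fields, `[`, character(1), 2)) + 1L,
    end = as.integer(vapply(fields, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical BED-like input with a header line and nonstandard extra columns.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "track name=example",
  "chr17\t60212869\t60218774\tfoo\t0.5",
  "chr1\t108503808\t108508915\tbar\t1.2",
  "chr8\t86373506\t86380637\tbaz\t3.4",
  "chr8\t99303546\t99307608\tqux\t0.1"
), bed_path)

bed3 <- read_bed3(bed_path)
print(bed3)
```

If GenomicRanges is installed, a data frame with chrom, start and end columns like this one can be passed to GenomicRanges::makeGRangesFromDataFrame() to obtain a ranges object.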