trimTails function in ShortRead package gives different results on the same input
Zhenyu Xu ▴ 10
@zhenyu-xu-5556
Last seen 9.6 years ago
Hi ShortRead package developers,

I tried to use the trimTails function to trim low-quality bases from reads produced by a 454 sequencing machine. However, I get different results when I run the same command several times, starting from the same ShortReadQ object and using the same trimming parameters. I observe this on CentOS Linux machines (6.2 and 6.3). I also tried it on my own Mac, where the results are identical across runs. So the problem seems to be restricted to CentOS Linux (I am not sure whether other Linux platforms are affected). The data set (~11 MB) can be downloaded at http://dl.dropbox.com/u/68829208/454reads.rds.

Best,
Zhenyu

Please see the following session:

    wget http://dl.dropbox.com/u/68829208/454reads.rds
    R

    R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
    Copyright (C) 2012 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > library(ShortRead)
    Loading required package: BiocGenerics

    Attaching package: ‘BiocGenerics’

    The following object(s) are masked from ‘package:stats’:

        xtabs

    The following object(s) are masked from ‘package:base’:

        anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
        get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
        pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
        rownames, sapply, setdiff, table, tapply, union, unique

    Loading required package: IRanges
    Loading required package: GenomicRanges
    Loading required package: Biostrings
    Loading required package: lattice
    Loading required package: Rsamtools
    Loading required package: latticeExtra
    Loading required package: RColorBrewer
    > readsSub <- readRDS("454reads.rds")
    > readsSub
    class: ShortReadQ
    length: 5460 reads; width: 5..424 cycles
    > trimTails(readsSub, 20, "5", successive=TRUE)
    class: ShortReadQ
    length: 5460 reads; width: 3..416 cycles
    > trimTails(readsSub, 20, "5", successive=TRUE)
    class: ShortReadQ
    length: 5460 reads; width: 3..416 cycles
    > trimTails(readsSub, 20, "5", successive=TRUE)
    class: ShortReadQ
    length: 5460 reads; width: 4..424 cycles
    > trimTails(readsSub, 20, "5", successive=TRUE)
    class: ShortReadQ
    length: 5460 reads; width: 5..416 cycles
    > trimTails(readsSub, 20, "5", successive=TRUE)
    class: ShortReadQ
    length: 5460 reads; width: 4..424 cycles
    > x = trimTails(readsSub, 20, "5", successive=TRUE)
    > y = trimTails(readsSub, 20, "5", successive=TRUE)
    > sum(width(x)!=width(y))
    [1] 1325
    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] ShortRead_1.14.4    latticeExtra_0.6-19 RColorBrewer_1.0-5
    [4] Rsamtools_1.8.5     lattice_0.20-6      Biostrings_2.24.1
    [7] GenomicRanges_1.8.9 IRanges_1.14.4      BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
    [1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1    hwriter_1.3    stats4_2.15.1
    [6] zlibbioc_1.2.0
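For anyone who wants to run the same comparison on another platform, the repeated-call check in the session above can be wrapped in a small helper. This is only a sketch in base R: `is_reproducible` is a hypothetical name, not part of ShortRead, and it simply calls a function several times and compares each result with the first using `identical()`.

```r
# Hypothetical helper (not part of ShortRead): call `f` n times and
# report whether every result is identical to the first one.
is_reproducible <- function(f, n = 5) {
  first <- f()
  for (i in seq_len(n - 1)) {
    if (!identical(f(), first)) {
      return(FALSE)
    }
  }
  TRUE
}

# Toy check with a deterministic function.
is_reproducible(function() nchar(c("ACGT", "AC")))

# With ShortRead loaded and readsSub read from 454reads.rds, the check
# from the post could be written as (assumption: same call as above):
# is_reproducible(function()
#   width(trimTails(readsSub, 20, "5", successive=TRUE)))
```

Comparing `width()` vectors rather than the full objects mirrors the `sum(width(x)!=width(y))` test in the session; a `FALSE` result would indicate the behaviour described in this post.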
Sequencing ShortRead • 1.1k views
@martin-morgan-1513
Last seen 3 days ago
United States
On 10/15/2012 5:34 AM, Zhenyu Xu wrote:
> Hi ShortRead package developer,
>
> I tried to use the function trimTails to trim some bad quality bases from reads coming out of 454 sequencing machine. However I got different results if I run the command several times starting from the same ShortReadQ object and same trimming parameter. This is observed in centos linux machine (6.2 and 6.3). I also tried this with my own mac machine, but the results are identical. So seems the problem only restrict to centos linux machine (Not sure other linux platform has this problem or not). the data sets(~11Mb) can be downloaded at http://dl.dropbox.com/u/68829208/454reads.rds.

Thank you for the bug report, data, and reproducible example. This has been fixed in ShortRead 1.16.1 and in the devel branch, and should be available via biocLite after about 10am Seattle time, tomorrow. The problem was only with successive=TRUE.

Martin

--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
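To tell whether an installation predates the fix, the installed version can be compared with 1.16.1 using base R's `package_version()`. The following is a minimal sketch; `needs_update` and `fixed_version` are hypothetical names introduced here for illustration.

```r
# Version containing the fix, as stated in the reply above.
fixed_version <- "1.16.1"

# Hypothetical helper: TRUE if `installed` is older than the fixed release.
needs_update <- function(installed, fixed = fixed_version) {
  package_version(installed) < package_version(fixed)
}

needs_update("1.14.4")  # version shown in the question's sessionInfo(): TRUE
needs_update("1.16.1")  # FALSE

# With ShortRead installed, the same check on the live package would be:
# needs_update(as.character(packageVersion("ShortRead")))
```

If the check returns `TRUE`, calls with successive=TRUE may still show the behaviour reported in the question.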