ShortRead qa() error: Error in density.default(qscore) : 'x' contains missing values
1
0
Entering edit mode
Sam McInturf ▴ 300
@sam-mcinturf-5291
Last seen 8.6 years ago
United States
Hello everyone, I am using ShortRead to do some pre-processing for 100bp Illumina reads. I used qa() to read to do an upfront analysis, and then used trimEnds() to trim my reads by quality, and now I would like to redo the qa() to see a 'before and after' type of thing. After running it I get this error: > qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), names(fls)[i]), fls) Error in density.default(qscore) : 'x' contains missing values Calls: lapply ... .qa_qdensity -> density -> density -> density.default Execution halted I ran the first 100 reads of one of the files and it worked just fine. My scripts look like this: To trim the reads the first few lines are just parsing to get a nice looking name for the 'newfls' oldfls <- c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L005_R1_001.fas tq", "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fastq ", "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fastq ") flsSplit <- strsplit(oldfls, "/") FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse = "/"))) readFiles <- unlist(lapply(flsSplit, function(x) x[7])) newfls <- paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readFile s, "_"), function(x) paste(x[1], "Clean", x[3], sep = "_"))),sep ="") newfls <- paste(newfls, ".fastq", sep = "") reads <- readFastq(oldfls[1]) treads <- trimEnds(reads, Cutoff) rm(reads) writeFastq(treads, file = newfls[1]) rm(treads) and then the same thing for the following two reads. And then to check quality again, parsing up front for pretty names, and then a qa() call. Note that this script worked just fine to process the reads before they were library(ShortRead) fls <- list.files(pattern = ".fastq") flsSplit <- strsplit(fls, split = "_") Lib <- lapply(flsSplit, function(x) x[1]) Lane <- lapply(flsSplit, function(x) x[3]) names(fls) <- paste(Lib, Lane, sep = "_") names(fls) <- gsub(".fastq", "", names(fls)) outname <- paste("QA_", Lib[1], ".rda", sep = "") qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), names(fls)[i]), fls) QA <- do.call(rbind, qas) save(QA, file = outname) Does anyone have an idea of what is going on? > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] Biobase_2.20.0 bitops_1.0-5 grid_3.0.0 hwriter_1.3 stats4_3.0.0 [6] zlibbioc_1.6.0 -- Sam McInturf [[alternative HTML version deleted]]
PROcess ShortRead PROcess ShortRead • 3.1k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 days ago
United States
On 11/19/2013 08:11 PM, Sam McInturf wrote: > Hello everyone, > I am using ShortRead to do some pre-processing for 100bp Illumina reads. > I used qa() to read to do an upfront analysis, and then used trimEnds() to I'd guess that trimEnds actually trims the ends of some reads to zero length; these are retained by default, whereas probably you'd like to drop them. I'll work on a bug fix into the devel branch, but a workaround might be to... > trim my reads by quality, and now I would like to redo the qa() to see a > 'before and after' type of thing. After running it I get this error: > >> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), > names(fls)[i]), fls) only call qa on reads with width != 0, rfq = readFastq(fls[i]); qa(rfq[width(rfq) != 0]) If this is for 10's of millions of reads a better strategy might be to run qa on the files directly (this might not help in this particular situation) qa(directory_for_fastq, pattern_defining_files, type="fastq") or rfq = yield(FastqSampler(fls[i])); qa(rfq[width(rfq) != 0]) which will draw a random sample rather than read all reads into memory. Martin > Error in density.default(qscore) : 'x' contains missing values > Calls: lapply ... .qa_qdensity -> density -> density -> density.default > Execution halted > > I ran the first 100 reads of one of the files and it worked just fine. > > My scripts look like this: > To trim the reads > the first few lines are just parsing to get a nice looking name for the > 'newfls' > > oldfls <- > c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L005_R1_001.f astq", > "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fas tq", > "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fas tq") > flsSplit <- strsplit(oldfls, "/") > FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse = > "/"))) > readFiles <- unlist(lapply(flsSplit, function(x) x[7])) > newfls <- > paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readFi les, > "_"), > function(x) paste(x[1], "Clean", x[3], > sep = "_"))),sep ="") > newfls <- paste(newfls, ".fastq", sep = "") > > reads <- readFastq(oldfls[1]) > treads <- trimEnds(reads, Cutoff) > rm(reads) > writeFastq(treads, file = newfls[1]) > rm(treads) > > and then the same thing for the following two reads. > > > And then to check quality > again, parsing up front for pretty names, and then a qa() call. > Note that this script worked just fine to process the reads before they were > > library(ShortRead) > fls <- list.files(pattern = ".fastq") > flsSplit <- strsplit(fls, split = "_") > Lib <- lapply(flsSplit, function(x) x[1]) > Lane <- lapply(flsSplit, function(x) x[3]) > names(fls) <- paste(Lib, Lane, sep = "_") > names(fls) <- gsub(".fastq", "", names(fls)) > outname <- paste("QA_", Lib[1], ".rda", sep = "") > > qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), > names(fls)[i]), fls) > QA <- do.call(rbind, qas) > save(QA, file = outname) > > Does anyone have an idea of what is going on? > > > > >> sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 > [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 > [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] Biobase_2.20.0 bitops_1.0-5 grid_3.0.0 hwriter_1.3 > stats4_3.0.0 > [6] zlibbioc_1.6.0 > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
D'oh! What I forgot to put in the first script is a line like: treads <- treads[width(treads) > 25] because Tophat doesn't like the reads less than 25. In turn that would also remove reads of length zero, hence why I have never seen this error. Thank you so kindly On Tue, Nov 19, 2013 at 10:34 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 11/19/2013 08:11 PM, Sam McInturf wrote: > >> Hello everyone, >> I am using ShortRead to do some pre-processing for 100bp Illumina >> reads. >> I used qa() to read to do an upfront analysis, and then used trimEnds() >> to >> > > I'd guess that trimEnds actually trims the ends of some reads to zero > length; these are retained by default, whereas probably you'd like to drop > them. I'll work on a bug fix into the devel branch, but a workaround might > be to... > > > trim my reads by quality, and now I would like to redo the qa() to see a >> 'before and after' type of thing. After running it I get this error: >> >> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), >>> >> names(fls)[i]), fls) >> > > only call qa on reads with width != 0, rfq = readFastq(fls[i]); > qa(rfq[width(rfq) != 0]) > > If this is for 10's of millions of reads a better strategy might be to run > qa on the files directly (this might not help in this particular situation) > > qa(directory_for_fastq, pattern_defining_files, type="fastq") > > or rfq = yield(FastqSampler(fls[i])); qa(rfq[width(rfq) != 0]) > > which will draw a random sample rather than read all reads into memory. > > Martin > > > Error in density.default(qscore) : 'x' contains missing values >> Calls: lapply ... .qa_qdensity -> density -> density -> density.default >> Execution halted >> >> I ran the first 100 reads of one of the files and it worked just fine. >> >> My scripts look like this: >> To trim the reads >> the first few lines are just parsing to get a nice looking name for the >> 'newfls' >> >> oldfls <- >> c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_ >> L005_R1_001.fastq", >> "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fa stq", >> "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fa stq") >> flsSplit <- strsplit(oldfls, "/") >> FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse = >> "/"))) >> readFiles <- unlist(lapply(flsSplit, function(x) x[7])) >> newfls <- >> paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readF iles, >> "_"), >> function(x) paste(x[1], "Clean", x[3], >> sep = "_"))),sep ="") >> newfls <- paste(newfls, ".fastq", sep = "") >> >> reads <- readFastq(oldfls[1]) >> treads <- trimEnds(reads, Cutoff) >> rm(reads) >> writeFastq(treads, file = newfls[1]) >> rm(treads) >> >> and then the same thing for the following two reads. >> >> >> And then to check quality >> again, parsing up front for pretty names, and then a qa() call. >> Note that this script worked just fine to process the reads before they >> were >> >> library(ShortRead) >> fls <- list.files(pattern = ".fastq") >> flsSplit <- strsplit(fls, split = "_") >> Lib <- lapply(flsSplit, function(x) x[1]) >> Lane <- lapply(flsSplit, function(x) x[3]) >> names(fls) <- paste(Lib, Lane, sep = "_") >> names(fls) <- gsub(".fastq", "", names(fls)) >> outname <- paste("QA_", Lib[1], ".rda", sep = "") >> >> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), >> names(fls)[i]), fls) >> QA <- do.call(rbind, qas) >> save(QA, file = outname) >> >> Does anyone have an idea of what is going on? >> >> >> >> >> sessionInfo() >>> >> R version 3.0.0 (2013-04-03) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >> [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 >> >> loaded via a namespace (and not attached): >> [1] Biobase_2.20.0 bitops_1.0-5 grid_3.0.0 hwriter_1.3 >> stats4_3.0.0 >> [6] zlibbioc_1.6.0 >> >> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > -- Sam McInturf [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 943 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6