Question

ShortRead qa() error: Error in density.default(qscore) : 'x' contains missing values

0

Entering edit mode

Sam McInturf ▴ 300

@sam-mcinturf-5291

Last seen 8.6 years ago

United States

Hello everyone, I am using ShortRead to do some pre-processing for 100bp Illumina reads. I used qa() to read to do an upfront analysis, and then used trimEnds() to trim my reads by quality, and now I would like to redo the qa() to see a 'before and after' type of thing. After running it I get this error: > qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), names(fls)[i]), fls) Error in density.default(qscore) : 'x' contains missing values Calls: lapply ... .qa_qdensity -> density -> density -> density.default Execution halted I ran the first 100 reads of one of the files and it worked just fine. My scripts look like this: To trim the reads the first few lines are just parsing to get a nice looking name for the 'newfls' oldfls <- c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L005_R1_001.fas tq", "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fastq ", "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fastq ") flsSplit <- strsplit(oldfls, "/") FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse = "/"))) readFiles <- unlist(lapply(flsSplit, function(x) x[7])) newfls <- paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readFile s, "_"), function(x) paste(x[1], "Clean", x[3], sep = "_"))),sep ="") newfls <- paste(newfls, ".fastq", sep = "") reads <- readFastq(oldfls[1]) treads <- trimEnds(reads, Cutoff) rm(reads) writeFastq(treads, file = newfls[1]) rm(treads) and then the same thing for the following two reads. And then to check quality again, parsing up front for pretty names, and then a qa() call. Note that this script worked just fine to process the reads before they were library(ShortRead) fls <- list.files(pattern = ".fastq") flsSplit <- strsplit(fls, split = "_") Lib <- lapply(flsSplit, function(x) x[1]) Lane <- lapply(flsSplit, function(x) x[3]) names(fls) <- paste(Lib, Lane, sep = "_") names(fls) <- gsub(".fastq", "", names(fls)) outname <- paste("QA_", Lib[1], ".rda", sep = "") qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), names(fls)[i]), fls) QA <- do.call(rbind, qas) save(QA, file = outname) Does anyone have an idea of what is going on? > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] Biobase_2.20.0 bitops_1.0-5 grid_3.0.0 hwriter_1.3 stats4_3.0.0 [6] zlibbioc_1.6.0 -- Sam McInturf [[alternative HTML version deleted]]

PROcess ShortRead PROcess ShortRead • 3.1k views

ADD COMMENT • link updated 10.4 years ago by Martin Morgan 25k • written 10.4 years ago by Sam McInturf ▴ 300

score 0 · Answer 1 · 2013-11-19

On 11/19/2013 08:11 PM, Sam McInturf wrote: > Hello everyone, > I am using ShortRead to do some pre-processing for 100bp Illumina reads. > I used qa() to read to do an upfront analysis, and then used trimEnds() to I'd guess that trimEnds actually trims the ends of some reads to zero length; these are retained by default, whereas probably you'd like to drop them. I'll work on a bug fix into the devel branch, but a workaround might be to... > trim my reads by quality, and now I would like to redo the qa() to see a > 'before and after' type of thing. After running it I get this error: > >> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), > names(fls)[i]), fls) only call qa on reads with width != 0, rfq = readFastq(fls[i]); qa(rfq[width(rfq) != 0]) If this is for 10's of millions of reads a better strategy might be to run qa on the files directly (this might not help in this particular situation) qa(directory_for_fastq, pattern_defining_files, type="fastq") or rfq = yield(FastqSampler(fls[i])); qa(rfq[width(rfq) != 0]) which will draw a random sample rather than read all reads into memory. Martin > Error in density.default(qscore) : 'x' contains missing values > Calls: lapply ... .qa_qdensity -> density -> density -> density.default > Execution halted > > I ran the first 100 reads of one of the files and it worked just fine. > > My scripts look like this: > To trim the reads > the first few lines are just parsing to get a nice looking name for the > 'newfls' > > oldfls <- > c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L005_R1_001.f astq", > "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fas tq", > "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fas tq") > flsSplit <- strsplit(oldfls, "/") > FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse = > "/"))) > readFiles <- unlist(lapply(flsSplit, function(x) x[7])) > newfls <- > paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readFi les, > "_"), > function(x) paste(x[1], "Clean", x[3], > sep = "_"))),sep ="") > newfls <- paste(newfls, ".fastq", sep = "") > > reads <- readFastq(oldfls[1]) > treads <- trimEnds(reads, Cutoff) > rm(reads) > writeFastq(treads, file = newfls[1]) > rm(treads) > > and then the same thing for the following two reads. > > > And then to check quality > again, parsing up front for pretty names, and then a qa() call. > Note that this script worked just fine to process the reads before they were > > library(ShortRead) > fls <- list.files(pattern = ".fastq") > flsSplit <- strsplit(fls, split = "_") > Lib <- lapply(flsSplit, function(x) x[1]) > Lane <- lapply(flsSplit, function(x) x[3]) > names(fls) <- paste(Lib, Lane, sep = "_") > names(fls) <- gsub(".fastq", "", names(fls)) > outname <- paste("QA_", Lib[1], ".rda", sep = "") > > qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]), > names(fls)[i]), fls) > QA <- do.call(rbind, qas) > save(QA, file = outname) > > Does anyone have an idea of what is going on? > > > > >> sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 > [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 > [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] Biobase_2.20.0 bitops_1.0-5 grid_3.0.0 hwriter_1.3 > stats4_3.0.0 > [6] zlibbioc_1.6.0 > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793