Illumina QC using ShortRead
@davide-cittaro-3332
Last seen 9.6 years ago
Hi all,

In order to produce a QC report we currently use the ShortRead qa pipeline, like this:

    qual <- qa(PATH_TO_GERALD_DIRECTORY, pattern=".*export.txt", type=c("SolexaExport"))

We produce ELAND alignments only for this purpose, and that takes a long time. The QA step itself also takes longer and longer as the throughput of the Illumina machines increases. We have tried running it on BAM files (our default, since we only use bwa), but we can't get a report with the same sections. What is the fastest way to get a decent QC report without the Illumina pipeline?

Thanks,
d

Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro@ifom-ieo-campus.it
PROcess • 946 views
@martin-morgan-1513
Last seen 4 days ago
United States
On 10/07/2010 10:32 AM, Davide Cittaro wrote:
> We've been trying to run with BAM files (which is the default for us,
> as we use bwa only), but we can't get the same report with same
> sections... Which is the fastest way to have decent QC without the
> Illumina pipeline?

Hi Davide,

Use readAligned with type="BAM", and then do a qa report on the result. Do this separately for each file, and combine the qa objects with rbind. Here's one version:

    library(ShortRead)
    fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
    fls <- c(fl, fl, fl)

    qa <- do.call(rbind, Map(function(fl, id, ...) {
        aln <- readAligned(fl, type="BAM", ...)
        qa(aln, id)
    }, fls, paste(basename(fls), seq_along(fls), sep="-")))

Consider using ScanBamParam() (from Rsamtools). This is more directly supported in the devel version of ShortRead.

This may not give precisely the same qa report sections; which information are you looking for?

Martin

Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
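For readers who want to go from the combined qa object to the HTML report itself, here is a minimal sketch. It assumes ShortRead and Rsamtools are installed and that readAligned with type="BAM" is available in the installed ShortRead version. The "lane-N" ids, the variable names, and the temporary destination directory are illustrative choices, not part of the original answer.

```r
library(ShortRead)

# Example BAM shipped with Rsamtools; substitute your own lane files.
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
fls <- c(fl, fl)
ids <- paste("lane", seq_along(fls), sep="-")   # illustrative lane labels

# One qa object per file, so only one file is held in memory at a time.
qaList <- Map(function(fl, id) {
    aln <- readAligned(fl, type="BAM")
    qa(aln, id)
}, fls, ids)

# Combine the per-file summaries (names dropped so rbind sees plain arguments).
qaAll <- do.call(rbind, unname(qaList))

# Write the HTML report; report() returns the path to the main page.
dest <- tempfile("qa_report_")
rpt <- report(qaAll, dest=dest)
rpt
```

Opening the returned path with browseURL(rpt) should display the report. Which sections appear depends on the ShortRead version and on what the BAM records contain.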
Hi Martin,

On Oct 7, 2010, at 7:44 PM, Martin Morgan wrote:
> library(ShortRead)
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> fls <- c(fl, fl, fl)
>
> qa <- do.call(rbind, Map(function(fl, id, ...) {
>     aln <- readAligned(fl, type="BAM", ...)
>     qa(aln, id)
> }, fls, paste(basename(fls), seq_along(fls), sep="-")))

I'm not familiar with R errors, but what about this?

    Error: Input/Output
      'readAligned' failed to parse files
      dirPath: './100927_s_1.bam'
      pattern: ''
      type: 'BAM'
      error: INTEGER() can only be applied to a 'integer', not a 'special'

Also, note that I've loaded the files like this:

    fls <- list.files(".", pattern="100927_", full.names=TRUE)

d
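One thing that may be worth ruling out, although the error above does name a .bam path and so this may well not be the cause: a bare prefix pattern in list.files() matches every file that shares the prefix, including index or export files. The sketch below uses only base R, and the file names in the scratch directory are hypothetical stand-ins.

```r
# Demo in a scratch directory; these file names are made up for illustration.
demo_dir <- tempfile("lane_files_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
    c("100927_s_1.bam", "100927_s_1.bam.bai", "100927_s_1_export.txt"))))

# A bare prefix pattern picks up everything that shares the prefix.
all_hits <- list.files(demo_dir, pattern="100927_", full.names=TRUE)

# Anchoring the pattern keeps only names that end in ".bam".
bam_only <- list.files(demo_dir, pattern="^100927_.*\\.bam$", full.names=TRUE)

basename(all_hits)
basename(bam_only)
```

Passing only the anchored listing to readAligned means each element of fls is a BAM file, which makes any remaining error easier to attribute to the file contents.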
On 10/08/2010 01:42 AM, Davide Cittaro wrote:
> I'm not familiar with R errors but what about this?
>
> Error: Input/Output
>   'readAligned' failed to parse files
>   dirPath: './100927_s_1.bam'
>   pattern: ''
>   type: 'BAM'
>   error: INTEGER() can only be applied to a 'integer', not a 'special'

It is difficult to know the details, but this likely involves either incorrect arguments passed to a C-level function in Rsamtools, or a corrupt or otherwise unexpected BAM file. Can you try to read the BAM file directly, using

    param = ScanBamParam(simpleCigar = TRUE,
        reverseComplement = TRUE,
        what = ShortRead:::.readAligned_bamWhat())
    res = scanBam('./100927_s_1.bam', param=param)

I think this will fail, and then traceback() might provide useful (to me, anyway) output. Also, please provide the output of sessionInfo(). Here's mine:

    > sessionInfo()
    R version 2.11.1 Patched (2010-08-30 r52862)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] ShortRead_1.6.2     Rsamtools_1.0.8     lattice_0.19-11
    [4] Biostrings_2.16.9   GenomicRanges_1.0.9 IRanges_1.6.18

    loaded via a namespace (and not attached):
    [1] Biobase_2.8.0 grid_2.11.1   hwriter_1.2

Thanks,
Martin
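The direct-read check suggested above can be broken into smaller steps, which may help narrow down whether the header, the record count, or the full scan is what fails. This is a sketch only. It assumes Rsamtools is installed, it runs against the ex1.bam example file rather than the failing file, and the explicit field list stands in for the internal ShortRead helper and is an assumption. Wrapping the call in tryCatch() hides the traceback(), so run scanBam() unwrapped when the traceback is what you need to report.

```r
library(Rsamtools)

# Example BAM shipped with Rsamtools; replace with the file that fails.
bam <- system.file("extdata", "ex1.bam", package="Rsamtools")

# 1. Can the header be parsed at all? List the reference sequence names.
hdr <- scanBamHeader(bam)
names(hdr[[1]]$targets)

# 2. How many records does the file report?
cnt <- countBam(bam)
cnt$records

# 3. Attempt a full read with an explicit field list (an assumed subset of
#    scanBamWhat()), capturing the error message instead of stopping.
param <- ScanBamParam(simpleCigar=TRUE, reverseComplement=TRUE,
                      what=c("qname", "rname", "strand", "pos", "seq", "qual"))
res <- tryCatch(scanBam(bam, param=param),
                error=function(e) conditionMessage(e))

if (is.character(res)) {
    message("scanBam failed: ", res)
} else {
    str(res[[1]], max.level=1)
}

# Version information to include when reporting the problem.
sessionInfo()
```

If steps 1 and 2 succeed but step 3 fails, dropping fields from `what` one at a time is a simple way to see which field triggers the error.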