Hi,
Is it possible to use the ShortRead::qa2() function from a ShortReadQ object rather than a fastq file? I'm doing various trimming and manipulation using ShortRead so since the data is in memory, I don't want to write it out to read it back in again. This is possible with qa() but for qa2 I can't quite get my head around the various QAData classes and how they all fit together. What would the alternative to QAFastqSource
be? Below is what I'm trying to do:
> dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147") > fls <- dir(dirPath, "fastq.gz", full=TRUE) > x1 <- qa(fls[1]) > head(x1[['perCycle']]$baseCall) Cycle Base Count lane 1 1 A 2248 ERR127302_1_subset.fastq.gz 2 1 C 10283 ERR127302_1_subset.fastq.gz 3 1 G 3105 ERR127302_1_subset.fastq.gz 4 1 T 4344 ERR127302_1_subset.fastq.gz 15 1 N 20 ERR127302_1_subset.fastq.gz 19 2 A 4965 ERR127302_1_subset.fastq.gz > head(x1[['perCycle']]$quality) Cycle Quality Score Count lane 4 1 # 2 5 ERR127302_1_subset.fastq.gz 6 1 % 4 2 ERR127302_1_subset.fastq.gz 7 1 & 5 9 ERR127302_1_subset.fastq.gz 8 1 ' 6 6 ERR127302_1_subset.fastq.gz 13 1 , 11 3 ERR127302_1_subset.fastq.gz 14 1 - 12 6 ERR127302_1_subset.fastq.gz > rfq <- yield(FastqSampler(fls[1], 1000000)) > x2 <- qa(rfq, lane='test') > head(x2[['perCycle']]$baseCall) Cycle Base Count lane 1 1 A 2248 test 2 1 C 10283 test 3 1 G 3105 test 4 1 T 4344 test 15 1 N 20 test 19 2 A 4965 test > head(x2[['perCycle']]$quality) Cycle Quality Score Count lane 4 1 # 2 5 test 6 1 % 4 2 test 7 1 & 5 9 test 8 1 ' 6 6 test 13 1 , 11 3 test 14 1 - 12 6 test > ShortRead:::.plotCycleBaseCall(x2[['perCycle']]$baseCall) > ShortRead:::.plotCycleQuality(x2[['perCycle']]$quality) Using qa2: > coll3 <- QACollate(QAFastqSource(fls[1]), + QANucleotideByCycle(), + QAQualityByCycle()) > x3 <- qa2(coll3, BPPARAM=SerialParam(), verbose=TRUE) qa2,QACollate-method qa2,QACollate1-method qa2,QAFastqSource-method qa2,FastqSampler-method qa2,QANucleotideByCycle-method qa2,QAQualityByCycle-method flag,QASource-method flag,ANY-method flag,ANY-method > x3@listData$QANucleotideByCycle@values DataFrame with 335 rows and 4 columns Id Cycle Base Count <factor> <integer> <factor> <integer> 1 1 1 A 2248 2 1 1 C 10283 3 1 1 G 3105 4 1 1 T 4344 5 1 1 N 20 ... ... ... ... ... 331 1 72 A 4428 332 1 72 C 5376 333 1 72 G 5721 334 1 72 T 4467 335 1 72 N 8 > x3@listData$QAQualityByCycle@values DataFrame with 2628 rows and 5 columns Id Cycle Quality Score Count <factor> <integer> <factor> <integer> <integer> 1 1 1 # 2 5 2 1 1 % 4 2 3 1 1 & 5 9 4 1 1 ' 6 6 5 1 1 , 11 3 ... ... ... ... ... ... 2624 1 72 E 36 1181 2625 1 72 F 37 763 2626 1 72 G 38 1672 2627 1 72 H 39 1673 2628 1 72 I 40 1398 > coll4 <- QACollate(QAData(rfq), + QANucleotideByCycle(), + QAQualityByCycle()) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘QACollate’ for signature ‘"QAData"’
Thanks,
Phil
Great thanks for the clarification and hints Martin.