Search
Question: ShortRead quality assessment functions (qa and qa2) from ShortReadQ object
0
gravatar for phil.chapman
19 months ago by
phil.chapman120
United Kingdom
phil.chapman120 wrote:

Hi,

Is it possible to use the ShortRead::qa2() function from a ShortReadQ object rather than a fastq file?  I'm doing various trimming and manipulation using ShortRead so since the data is in memory, I don't want to write it out to read it back in again.  This is possible with qa() but for  qa2 I can't quite get my head around the various QAData classes and how they all fit together.  What would the alternative to QAFastqSource be?  Below is what I'm trying to do:

> dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
> fls <- dir(dirPath, "fastq.gz", full=TRUE)
> x1 <- qa(fls[1])
> head(x1[['perCycle']]$baseCall)
   Cycle Base Count                        lane
1      1    A  2248 ERR127302_1_subset.fastq.gz
2      1    C 10283 ERR127302_1_subset.fastq.gz
3      1    G  3105 ERR127302_1_subset.fastq.gz
4      1    T  4344 ERR127302_1_subset.fastq.gz
15     1    N    20 ERR127302_1_subset.fastq.gz
19     2    A  4965 ERR127302_1_subset.fastq.gz
> head(x1[['perCycle']]$quality)
   Cycle Quality Score Count                        lane
4      1       #     2     5 ERR127302_1_subset.fastq.gz
6      1       %     4     2 ERR127302_1_subset.fastq.gz
7      1       &     5     9 ERR127302_1_subset.fastq.gz
8      1       '     6     6 ERR127302_1_subset.fastq.gz
13     1       ,    11     3 ERR127302_1_subset.fastq.gz
14     1       -    12     6 ERR127302_1_subset.fastq.gz
> rfq <- yield(FastqSampler(fls[1], 1000000))
> x2 <- qa(rfq, lane='test')
> head(x2[['perCycle']]$baseCall)
   Cycle Base Count lane
1      1    A  2248 test
2      1    C 10283 test
3      1    G  3105 test
4      1    T  4344 test
15     1    N    20 test
19     2    A  4965 test
> head(x2[['perCycle']]$quality)
   Cycle Quality Score Count lane
4      1       #     2     5 test
6      1       %     4     2 test
7      1       &     5     9 test
8      1       '     6     6 test
13     1       ,    11     3 test
14     1       -    12     6 test
> ShortRead:::.plotCycleBaseCall(x2[['perCycle']]$baseCall)
> ShortRead:::.plotCycleQuality(x2[['perCycle']]$quality)

Using qa2:
> coll3 <- QACollate(QAFastqSource(fls[1]),
+ QANucleotideByCycle(),
+ QAQualityByCycle())
> x3 <- qa2(coll3, BPPARAM=SerialParam(), verbose=TRUE)
qa2,QACollate-method
qa2,QACollate1-method
qa2,QAFastqSource-method
qa2,FastqSampler-method
qa2,QANucleotideByCycle-method
qa2,QAQualityByCycle-method
flag,QASource-method
flag,ANY-method
flag,ANY-method
> x3@listData$QANucleotideByCycle@values
DataFrame with 335 rows and 4 columns
          Id     Cycle     Base     Count
    <factor> <integer> <factor> <integer>
1          1         1        A      2248
2          1         1        C     10283
3          1         1        G      3105
4          1         1        T      4344
5          1         1        N        20
...      ...       ...      ...       ...
331        1        72        A      4428
332        1        72        C      5376
333        1        72        G      5721
334        1        72        T      4467
335        1        72        N         8
> x3@listData$QAQualityByCycle@values
DataFrame with 2628 rows and 5 columns
           Id     Cycle  Quality     Score     Count
     <factor> <integer> <factor> <integer> <integer>
1           1         1        #         2         5
2           1         1        %         4         2
3           1         1        &         5         9
4           1         1        '         6         6
5           1         1        ,        11         3
...       ...       ...      ...       ...       ...
2624        1        72        E        36      1181
2625        1        72        F        37       763
2626        1        72        G        38      1672
2627        1        72        H        39      1673
2628        1        72        I        40      1398
> coll4 <- QACollate(QAData(rfq),
+ QANucleotideByCycle(),
+ QAQualityByCycle())
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘QACollate’ for signature ‘"QAData"’

 

Thanks,

Phil

ADD COMMENTlink modified 19 months ago by Martin Morgan ♦♦ 20k • written 19 months ago by phil.chapman120
1
gravatar for Martin Morgan
19 months ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Unfortunately, qa2 was really designed with file-based input in mind, sorry.

Instead of direct slot access, note that QA extends 'SimpleList', which is list-like so

> names(x3)
[1] "QANucleotideByCycle" "QAQualityByCycle"   
> x3[["QANucleotideByCycle"]]
class: QANucleotideByCycle
html template: /home/mtmorgan/R/x86_64-pc-linux-gnu-.../QANucleotideByCycle.html
useFilter: TRUE; addFilter: TRUE

and that

> methods(class=class(x3[["QANucleotideByCycle"]]))
[1] qa2      rbind    report   show     values<- values  
see '?methods' for accessing help and source code

so

> values(x3[["QANucleotideByCycle"]])
DataFrame with 335 rows and 4 columns
          Id     Cycle     Base     Count
    <factor> <integer> <factor> <integer>
1          1         1        A      2248
2          1         1        C     10283
3          1         1        G      3105
4          1         1        T      4344
5          1         1        N        20
...      ...       ...      ...       ...
331        1        72        A      4428
332        1        72        C      5376
333        1        72        G      5721
334        1        72        T      4467
335        1        72        N         8

 

 

 

ADD COMMENTlink written 19 months ago by Martin Morgan ♦♦ 20k

Great thanks for the clarification and hints Martin.

ADD REPLYlink written 19 months ago by phil.chapman120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 377 users visited in the last hour