ShortRead quality assessment functions (qa and qa2) from ShortReadQ object
1
0
Entering edit mode
phil.chapman ▴ 150
@philchapman-8324
Last seen 7.6 years ago
United Kingdom

Hi,

Is it possible to use the ShortRead::qa2() function from a ShortReadQ object rather than a fastq file?  I'm doing various trimming and manipulation using ShortRead so since the data is in memory, I don't want to write it out to read it back in again.  This is possible with qa() but for  qa2 I can't quite get my head around the various QAData classes and how they all fit together.  What would the alternative to QAFastqSource be?  Below is what I'm trying to do:

> dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
> fls <- dir(dirPath, "fastq.gz", full=TRUE)
> x1 <- qa(fls[1])
> head(x1[['perCycle']]$baseCall)
   Cycle Base Count                        lane
1      1    A  2248 ERR127302_1_subset.fastq.gz
2      1    C 10283 ERR127302_1_subset.fastq.gz
3      1    G  3105 ERR127302_1_subset.fastq.gz
4      1    T  4344 ERR127302_1_subset.fastq.gz
15     1    N    20 ERR127302_1_subset.fastq.gz
19     2    A  4965 ERR127302_1_subset.fastq.gz
> head(x1[['perCycle']]$quality)
   Cycle Quality Score Count                        lane
4      1       #     2     5 ERR127302_1_subset.fastq.gz
6      1       %     4     2 ERR127302_1_subset.fastq.gz
7      1       &     5     9 ERR127302_1_subset.fastq.gz
8      1       '     6     6 ERR127302_1_subset.fastq.gz
13     1       ,    11     3 ERR127302_1_subset.fastq.gz
14     1       -    12     6 ERR127302_1_subset.fastq.gz
> rfq <- yield(FastqSampler(fls[1], 1000000))
> x2 <- qa(rfq, lane='test')
> head(x2[['perCycle']]$baseCall)
   Cycle Base Count lane
1      1    A  2248 test
2      1    C 10283 test
3      1    G  3105 test
4      1    T  4344 test
15     1    N    20 test
19     2    A  4965 test
> head(x2[['perCycle']]$quality)
   Cycle Quality Score Count lane
4      1       #     2     5 test
6      1       %     4     2 test
7      1       &     5     9 test
8      1       '     6     6 test
13     1       ,    11     3 test
14     1       -    12     6 test
> ShortRead:::.plotCycleBaseCall(x2[['perCycle']]$baseCall)
> ShortRead:::.plotCycleQuality(x2[['perCycle']]$quality)

Using qa2:
> coll3 <- QACollate(QAFastqSource(fls[1]),
+ QANucleotideByCycle(),
+ QAQualityByCycle())
> x3 <- qa2(coll3, BPPARAM=SerialParam(), verbose=TRUE)
qa2,QACollate-method
qa2,QACollate1-method
qa2,QAFastqSource-method
qa2,FastqSampler-method
qa2,QANucleotideByCycle-method
qa2,QAQualityByCycle-method
flag,QASource-method
flag,ANY-method
flag,ANY-method
> x3@listData$QANucleotideByCycle@values
DataFrame with 335 rows and 4 columns
          Id     Cycle     Base     Count
    <factor> <integer> <factor> <integer>
1          1         1        A      2248
2          1         1        C     10283
3          1         1        G      3105
4          1         1        T      4344
5          1         1        N        20
...      ...       ...      ...       ...
331        1        72        A      4428
332        1        72        C      5376
333        1        72        G      5721
334        1        72        T      4467
335        1        72        N         8
> x3@listData$QAQualityByCycle@values
DataFrame with 2628 rows and 5 columns
           Id     Cycle  Quality     Score     Count
     <factor> <integer> <factor> <integer> <integer>
1           1         1        #         2         5
2           1         1        %         4         2
3           1         1        &         5         9
4           1         1        '         6         6
5           1         1        ,        11         3
...       ...       ...      ...       ...       ...
2624        1        72        E        36      1181
2625        1        72        F        37       763
2626        1        72        G        38      1672
2627        1        72        H        39      1673
2628        1        72        I        40      1398
> coll4 <- QACollate(QAData(rfq),
+ QANucleotideByCycle(),
+ QAQualityByCycle())
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘QACollate’ for signature ‘"QAData"’

 

Thanks,

Phil

shortread • 1.7k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 5 days ago
United States

Unfortunately, qa2 was really designed with file-based input in mind, sorry.

Instead of direct slot access, note that QA extends 'SimpleList', which is list-like so

> names(x3)
[1] "QANucleotideByCycle" "QAQualityByCycle"   
> x3[["QANucleotideByCycle"]]
class: QANucleotideByCycle
html template: /home/mtmorgan/R/x86_64-pc-linux-gnu-.../QANucleotideByCycle.html
useFilter: TRUE; addFilter: TRUE

and that

> methods(class=class(x3[["QANucleotideByCycle"]]))
[1] qa2      rbind    report   show     values<- values  
see '?methods' for accessing help and source code

so

> values(x3[["QANucleotideByCycle"]])
DataFrame with 335 rows and 4 columns
          Id     Cycle     Base     Count
    <factor> <integer> <factor> <integer>
1          1         1        A      2248
2          1         1        C     10283
3          1         1        G      3105
4          1         1        T      4344
5          1         1        N        20
...      ...       ...      ...       ...
331        1        72        A      4428
332        1        72        C      5376
333        1        72        G      5721
334        1        72        T      4467
335        1        72        N         8

 

 

 

ADD COMMENT
0
Entering edit mode

Great thanks for the clarification and hints Martin.

ADD REPLY

Login before adding your answer.

Traffic: 940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6