Question: Q score filter to each base in QualityScaledDNAStringSet
0
gravatar for XIA.PAN
2.7 years ago by
XIA.PAN20
XIA.PAN20 wrote:

Dear all,

I have a QualityScaledDNAStingSet and want to filter out reads that have bases below a quality threshold (say 20 for instance). 

I can get the Q score one by one using as.integer(DNA_seqs@quality[1])

Is there a function to get the Q score of each object in the QualityScaledDNAStringSet and filter them with a threshold?

Thank you!

qualitycontrol • 447 views
ADD COMMENTlink modified 2.7 years ago by Mike Smith4.0k • written 2.7 years ago by XIA.PAN20
Answer: Q score filter to each base in QualityScaledDNAStringSet
2
gravatar for Mike Smith
2.7 years ago by
Mike Smith4.0k
EMBL Heidelberg / de.NBI
Mike Smith4.0k wrote:

You can do something like this.  First we'll create an example QualityScaledDNAStringSet using the example in the help page

library(Biostrings)
x1 <- DNAStringSet(c("TTGA", "CTCN"))
q1 <- PhredQuality(c("*+,-", "6789"))
qx1 <- QualityScaledDNAStringSet(x1, q1)

Then we'll create a list containing the integer versions of the quality scores.  This list has one entry for each sequence in our StringSet

quals_list <- as(quality(qx1), "IntegerList")
> quals_list
IntegerList of length 2
[[1]] 9 10 11 12
[[2]] 21 22 23 24

Now we apply a function to each of these vectors of quality scores.  In this case we are going to check if all of the entries in each vector are greater than 20, and return TRUE or FALSE.  You could put whatever function you want in here based on your criteria of 'good quality'.

good_quality <- sapply(quals_list, FUN = function(x) { 
        return(all(x >= 20))
    })

Finally we subset by this set of TRUE/FALSE values to keep only the good ones.

qx1_good <- qx1[good_quality]
> qx1_good
  A QualityScaledDNAStringSet instance containing:

  A DNAStringSet instance of length 1
    width seq
[1]     4 CTCN

  A PhredQuality instance of length 1
    width seq
[1]     4 6789

 

 

ADD COMMENTlink written 2.7 years ago by Mike Smith4.0k
1

It's much more efficient to work on the *List with single function calls

> all(as(quality(qx1), "IntegerList") >= 20)
[1] FALSE  TRUE

 

ADD REPLYlink written 2.7 years ago by Martin Morgan ♦♦ 24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 414 users visited in the last hour