shortread quality
1
0
Entering edit mode
David ▴ 860
@david-3335
Last seen 6.0 years ago
Hi, I'm reading a fastq file from the solexa sequencer. I would like to know how many reads have a phred score (>Q29). The thing is that i get the densities so i don't really know how many reads from the total pass that filter. It's probaly easy for you so any hint would be helpful library("ShortRead") fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" path = getwd() sp <- SolexaPath(path,dataPath=path,analysisPath=path) # Read fastq File and save report fq <- readFastq(sp, fqpattern) qaSummary <- qa(fq,fqpattern) save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) report(qaSummary,dest="report") #Quality idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , qaSummary[["readQualityScore"]][idx,"density"]) a #reads with a quality >Q29 #How to get the total number ? or percent compared to the total number of reads ? thanks
• 2.3k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 6 weeks ago
United States
Hi, On 06/06/2012 08:00 AM, David martin wrote: > Hi, > I'm reading a fastq file from the solexa sequencer. > I would like to know how many reads have a phred score (>Q29). The thing If you mean the average base quality score, then fq <- readFastq(sp, fqpattern) score <- alphabetScore(fq) gives the sum of the base quality scores for each read, so is a vector as long as the length of the reads. The average is aveScore <- score / width(fq) and then you're in the realm of familiar R again, e.g., hist(aveScore) table(aveScore > 29) etc. Hope that heps, Martin > is that i get the densities so i don't really know how many reads from > the total pass that filter. It's probaly easy for you so any hint would > be helpful > > library("ShortRead") > fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" > > path = getwd() > sp <- SolexaPath(path,dataPath=path,analysisPath=path) > > # Read fastq File and save report > fq <- readFastq(sp, fqpattern) > qaSummary <- qa(fq,fqpattern) > save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) > report(qaSummary,dest="report") > > #Quality > > idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) > a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , > qaSummary[["readQualityScore"]][idx,"density"]) > a #reads with a quality >Q29 > > #How to get the total number ? or percent compared to the total number > of reads ? > > thanks > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
On 06/06/2012 12:11 PM, Martin Morgan wrote: > Hi, > > On 06/06/2012 08:00 AM, David martin wrote: >> Hi, >> I'm reading a fastq file from the solexa sequencer. >> I would like to know how many reads have a phred score (>Q29). The thing > > If you mean the average base quality score, then > > fq <- readFastq(sp, fqpattern) > score <- alphabetScore(fq) > > gives the sum of the base quality scores for each read, so is a vector > as long as the length of the reads. The average is > > aveScore <- score / width(fq) > > and then you're in the realm of familiar R again, e.g., > > hist(aveScore) > table(aveScore > 29) > > etc. > > Hope that heps, I guess the qa object already gets you further, as you've indicated df <- qaSummary[["readQualityScore"]] the 'density' column (apparently not really a density) could be turned into a cumulative density cdensity <- cumsum(df$density) / sum(df$density) and then look up the cumulative density nearest the quality that you're interested in cdensity[findInterval(29, df$quality)] You'd want to do these steps separately for each lane, if there were several in df. Martin > > Martin > > > >> is that i get the densities so i don't really know how many reads from >> the total pass that filter. It's probaly easy for you so any hint would >> be helpful >> >> library("ShortRead") >> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" >> >> path = getwd() >> sp <- SolexaPath(path,dataPath=path,analysisPath=path) >> >> # Read fastq File and save report >> fq <- readFastq(sp, fqpattern) >> qaSummary <- qa(fq,fqpattern) >> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) >> report(qaSummary,dest="report") >> >> #Quality >> >> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) >> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , >> qaSummary[["readQualityScore"]][idx,"density"]) >> a #reads with a quality >Q29 >> >> #How to get the total number ? or percent compared to the total number >> of reads ? >> >> thanks >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
thanks Martin, That's exactly what i wanted to get !!! On 06/06/2012 10:34 PM, Martin Morgan wrote: > On 06/06/2012 12:11 PM, Martin Morgan wrote: >> Hi, >> >> On 06/06/2012 08:00 AM, David martin wrote: >>> Hi, >>> I'm reading a fastq file from the solexa sequencer. >>> I would like to know how many reads have a phred score (>Q29). The thing >> >> If you mean the average base quality score, then >> >> fq <- readFastq(sp, fqpattern) >> score <- alphabetScore(fq) >> >> gives the sum of the base quality scores for each read, so is a vector >> as long as the length of the reads. The average is >> >> aveScore <- score / width(fq) >> >> and then you're in the realm of familiar R again, e.g., >> >> hist(aveScore) >> table(aveScore > 29) >> >> etc. >> >> Hope that heps, > > I guess the qa object already gets you further, as you've indicated > > df <- qaSummary[["readQualityScore"]] > > the 'density' column (apparently not really a density) could be turned > into a cumulative density > > cdensity <- cumsum(df$density) / sum(df$density) > > and then look up the cumulative density nearest the quality that you're > interested in > > cdensity[findInterval(29, df$quality)] > > You'd want to do these steps separately for each lane, if there were > several in df. > > Martin > > > >> >> Martin >> >> >> >>> is that i get the densities so i don't really know how many reads from >>> the total pass that filter. It's probaly easy for you so any hint would >>> be helpful >>> >>> library("ShortRead") >>> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt" >>> >>> path = getwd() >>> sp <- SolexaPath(path,dataPath=path,analysisPath=path) >>> >>> # Read fastq File and save report >>> fq <- readFastq(sp, fqpattern) >>> qaSummary <- qa(fq,fqpattern) >>> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" ))) >>> report(qaSummary,dest="report") >>> >>> #Quality >>> >>> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29) >>> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] , >>> qaSummary[["readQualityScore"]][idx,"density"]) >>> a #reads with a quality >Q29 >>> >>> #How to get the total number ? or percent compared to the total number >>> of reads ? >>> >>> thanks >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >
ADD REPLY

Login before adding your answer.

Traffic: 718 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6