Search
Question: ShortRead for sequencing data quality assessment
0
gravatar for heyi xiao
7.2 years ago by
heyi xiao360
United States
heyi xiao360 wrote:
Dear Martin, I used the qa function ShortRead package for the solexa sequencing data quality assessment. And I got some result in a FastqQA object below. > qa1 class: FastqQA(9) QA elements (access with qa[["elt"]]):   readCounts: data.frame(1 3)   baseCalls: data.frame(1 5)   readQualityScore: data.frame(512 4)   baseQuality: data.frame(94 3)   alignQuality: data.frame(1 3)   frequentSequences: data.frame(50 4)   sequenceDistribution: data.frame(5 4)   perCycle: list(2)     baseCall: data.frame(157 4)     quality: data.frame(1070 5)   perTile: list(2)     readCounts: data.frame(0 4) medianReadQualityScore: data.frame(0 4)  But a few things are not that clear to me. What is readQualityScore, and what are the quality and density columns in it? I have 1000 reads in my example file, but there are only 512 rows. I try to generated report as suggested by the vignette, but failed in the follow ways:  > rpt <- report(qa1)      # Create report Error in X11(paste("jpeg::", quality, ":", filename, sep = ""), width,  :   unable to start device JPEG In addition: Warning message: In jpeg(file.path(imgDir, jpegFile), ...) :   no jpeg support in this version of R > rpt <- report(qa1, type='pdf')      # Create report Error: UserArgumentMismatch   'report, type="pdf"' not implemented for class 'FastqQA'  I am working on a Lunix server remotely, I don’t really have X11 access. Unfortunately, the pdf method is not supported somehow. Is there any way that I can still generate the report? Heyi [[alternative HTML version deleted]]
ADD COMMENTlink modified 7.2 years ago • written 7.2 years ago by heyi xiao360
0
gravatar for Valerie Obenchain
7.2 years ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:
Hi Heyi, The read quality tells us about the quality of each short read in the general sense of "sum of quality scores over one short read" / "width of the short read". The density is a measurement of the dispersed mass of the distribution of the input. The input to the density calculation is the normalized quality score I mention above. See ?density for information on the algorithms used and why the length of the readQualityScore dataframe is 512. What version of R and ShortRead are you using? It is helpful if you provide this information by calling sessionInfo() at the end of your code when submitting a question. In R-devel (2.12) with ShortRead 1.7.24 this piece of code woks for me. Please let me know if you can run it. # perform qa on a fastq file provided with ShortRead exptPath <- system.file("extdata", package = "ShortRead") sp <- SolexaPath(exptPath) res <- qa(dirPath=analysisPath(sp), pattern="s_1_sequence.txt", type="fastq") > res class: FastqQA(9) QA elements (access with qa[["elt"]]): readCounts: data.frame(1 3) baseCalls: data.frame(1 5) readQualityScore: data.frame(512 4) baseQuality: data.frame(94 3) alignQuality: data.frame(1 3) frequentSequences: data.frame(50 4) sequenceDistribution: data.frame(3 4) perCycle: list(2) baseCall: data.frame(141 4) quality: data.frame(341 5) perTile: list(2) readCounts: data.frame(0 4) medianReadQualityScore: data.frame(0 4) > report(res) [1] "/tmp/RtmpCq1M5h/file8edbdab/index.html" Valerie On 09/24/2010 01:04 PM, heyi xiao wrote: > > > > Dear > Martin, > > I used > the qa function ShortRead package for the solexa sequencing data quality assessment. > And I got some result in a FastqQA object below. > > >> > qa1 > > class: > FastqQA(9) > > QA > elements (access with qa[["elt"]]): > >  readCounts: data.frame(1 3) > >  baseCalls: data.frame(1 5) > >  readQualityScore: data.frame(512 4) > >  baseQuality: data.frame(94 3) > >  alignQuality: data.frame(1 3) > >  frequentSequences: data.frame(50 4) > >  sequenceDistribution: data.frame(5 4) > >  perCycle: list(2) > >    baseCall: data.frame(157 4) > >    quality: data.frame(1070 5) > >  perTile: list(2) > >    readCounts: data.frame(0 4) > > medianReadQualityScore: data.frame(0 4) > >  > > But > a few things are not that clear to me. What is readQualityScore, and what are > the quality and density columns in it? I have 1000 reads in my example file, > but there are only 512 rows. I try to generated report as suggested by the > vignette, but failed in the follow ways: > >  > > >> > rpt <- report(qa1)     # Create > report > > Error > in X11(paste("jpeg::", quality, ":", filename, sep = > ""), width, : > >  unable to start device JPEG > > In > addition: Warning message: > > In > jpeg(file.path(imgDir, jpegFile), ...) : > >  no jpeg support in this version of R > > >> > rpt <- report(qa1, type='pdf')     # > Create report > > Error: > UserArgumentMismatch > >  'report, type="pdf"' not > implemented for class 'FastqQA' > >  > > I > am working on a Lunix server remotely, I donâEUR^(TM)t really have X11 access. > Unfortunately, the pdf method is not supported somehow. Is there any way that I > can still generate the report? > > Heyi > > > > > > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD COMMENTlink written 7.2 years ago by Valerie Obenchain ♦♦ 6.4k
Hi Heyi, I talked to Martin about this and he had some ideas about the jpeg error. If you ask R about capabilities(), does it say jepeg=FALSE? If it does, there are a couple of options. (1) recompile R w/jpeg support enabled : This would involve making sure the appropriate library was installed. See the R Installation and Administration manual at http://www.r-project.org/ (2) Save your qa object with save(qa, file='somefile.rda'). Move the .rda file back to your local R machine and load it into an R session with load('somefile.rda'). All objects in the file should then be visible and you can run report() on the qa object. Hopefully one of these options will work for you. Valerie On 09/24/10 17:40, Valerie Obenchain wrote: > Hi Heyi, > > The read quality tells us about the quality of each short read in the > general sense of > "sum of quality scores over one short read" / "width of the short > read". The > density is a measurement of the dispersed mass of the distribution of > the input. > The input to the density calculation is the normalized quality score I > mention above. > See ?density for information on the algorithms used and why the length > of the > readQualityScore dataframe is 512. > > What version of R and ShortRead are you using? It is helpful if you > provide this information > by calling sessionInfo() at the end of your code when submitting a question. > > In R-devel (2.12) with ShortRead 1.7.24 this piece of code woks for me. > Please > let me know if you can run it. > > # perform qa on a fastq file provided with ShortRead > exptPath<- system.file("extdata", package = "ShortRead") > sp<- SolexaPath(exptPath) > res<- qa(dirPath=analysisPath(sp), pattern="s_1_sequence.txt", > type="fastq") > > >> res >> > class: FastqQA(9) > QA elements (access with qa[["elt"]]): > readCounts: data.frame(1 3) > baseCalls: data.frame(1 5) > readQualityScore: data.frame(512 4) > baseQuality: data.frame(94 3) > alignQuality: data.frame(1 3) > frequentSequences: data.frame(50 4) > sequenceDistribution: data.frame(3 4) > perCycle: list(2) > baseCall: data.frame(141 4) > quality: data.frame(341 5) > perTile: list(2) > readCounts: data.frame(0 4) > medianReadQualityScore: data.frame(0 4) > > > >> report(res) >> > [1] "/tmp/RtmpCq1M5h/file8edbdab/index.html" > > > Valerie > > > > On 09/24/2010 01:04 PM, heyi xiao wrote: > >> >> >> Dear >> Martin, >> >> I used >> the qa function ShortRead package for the solexa sequencing data quality assessment. >> And I got some result in a FastqQA object below. >> >> >> >>> >>> >> qa1 >> >> class: >> FastqQA(9) >> >> QA >> elements (access with qa[["elt"]]): >> >> ? readCounts: data.frame(1 3) >> >> ? baseCalls: data.frame(1 5) >> >> ? readQualityScore: data.frame(512 4) >> >> ? baseQuality: data.frame(94 3) >> >> ? alignQuality: data.frame(1 3) >> >> ? frequentSequences: data.frame(50 4) >> >> ? sequenceDistribution: data.frame(5 4) >> >> ? perCycle: list(2) >> >> ? ? ? baseCall: data.frame(157 4) >> >> ? ? ? quality: data.frame(1070 5) >> >> ? perTile: list(2) >> >> ? ? ? readCounts: data.frame(0 4) >> >> medianReadQualityScore: data.frame(0 4) >> >> ? >> >> But >> a few things are not that clear to me. What is readQualityScore, and what are >> the quality and density columns in it? I have 1000 reads in my example file, >> but there are only 512 rows. I try to generated report as suggested by the >> vignette, but failed in the follow ways: >> >> ? >> >> >> >>> >>> >> rpt<- report(qa1)? ? ? ? ? # Create >> report >> >> Error >> in X11(paste("jpeg::", quality, ":", filename, sep = >> ""), width,? : >> >> ? unable to start device JPEG >> >> In >> addition: Warning message: >> >> In >> jpeg(file.path(imgDir, jpegFile), ...) : >> >> ? no jpeg support in this version of R >> >> >> >>> >>> >> rpt<- report(qa1, type='pdf')? ? ? ? ? # >> Create report >> >> Error: >> UserArgumentMismatch >> >> ? 'report, type="pdf"' not >> implemented for class 'FastqQA' >> >> ? >> >> I >> am working on a Lunix server remotely, I don?EUR^(TM)t really have X11 access. >> Unfortunately, the pdf method is not supported somehow. Is there any way that I >> can still generate the report? >> >> Heyi >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 7.2 years ago by Valerie Obenchain ♦♦ 6.4k
0
gravatar for heyi xiao
7.2 years ago by
heyi xiao360
United States
heyi xiao360 wrote:
Thanks Valerie, Now I understand the readQualityScore. I ran your example code, the first few steps worked well till the report generation step. I got the same error as in my own example. > report(res) Error in X11(paste("jpeg::", quality, ":", filename, sep = ""), width, :   unable to start device JPEG In addition: Warning message: In jpeg(file.path(imgDir, jpegFile), ...) :   no jpeg support in this version of R It seems to me that I will have to ungrade to R 2.12. If I ungrade, the bioconductor I get will be the devel version using biocLite(), right? Below is my current session info. Thanks again for the help! > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-unknown-linux-gnu locale:  [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US  [4] LC_COLLATE=en_US     LC_MONETARY=C        LC_MESSAGES=en_US  [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C attached base packages: [1] stats     graphics  grDevices utils     datasets  methods   base other attached packages:  [1] CSAMA10_0.0.3          chipseq_0.4.1 GenomicFeatures_1.0.10  [4] biomaRt_2.4.0          BSgenome_1.16.5 EatonEtAlChIPseq_0.0.1  [7] rtracklayer_1.8.1      RCurl_1.4-3            bitops_1.0-4.1 [10] ShortRead_1.6.2        Rsamtools_1.0.8        lattice_0.18-8 [13] Biostrings_2.16.9      GenomicRanges_1.0.9    IRanges_1.6.17 loaded via a namespace (and not attached): [1] Biobase_2.8.0 DBI_0.2-5     grid_2.11.1   hwriter_1.2 RSQLite_0.9-2 [6] tools_2.11.1  XML_3.1-1 [[alternative HTML version deleted]]
ADD COMMENTlink written 7.2 years ago by heyi xiao360
On 09/25/2010 08:22 AM, heyi xiao wrote: > Thanks Valerie, > Now I understand the readQualityScore. I ran your example code, the > first few steps worked well till the report generation step. I got the > same error as in my own example. > > > report(res) > Error in X11(paste("jpeg::", quality, ":", filename, sep = ""), width, : > unable to start device JPEG > In addition: Warning message: > In jpeg(file.path(imgDir, jpegFile), ...) : > no jpeg support in this version of R > > It seems to me that I will have to ungrade to R 2.12. If I ungrade, > the bioconductor I get will be the devel version using biocLite(), right? > > Below is my current session info. Thanks again for the help! > Yes, if you install R 2.12 and biocLite("ShortRead") you will get the most recent devel version of ShortRead. However, your problem appears to be with report generation only and not creating the qa() object so using the devel version will not help us here. I previously thought you were having problems generating a qa() object which is why I provided the example code. Sorry for the misunderstanding. The report generation requires jpeg capabilities and you don't appear to have this on the remote machine that you are running on. I think the quickest fix would be to save the qa() object as an .rda file then move this file to your local R machine. (This .rda file is small even if your data are large.) You can then load() the .rda into an R session on your local machine and generate the report there. It is much more likely that you will have jpeg support on your local machine. I sent an email yesterday (9/25) where I mention another option of recompiling R with jpeg support enabled. Hopefully the .rda file option works for you. Let me know how it goes. Valerie > > > sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US > [4] LC_COLLATE=en_US LC_MONETARY=C LC_MESSAGES=en_US > [7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] CSAMA10_0.0.3 chipseq_0.4.1 GenomicFeatures_1.0.10 > [4] biomaRt_2.4.0 BSgenome_1.16.5 EatonEtAlChIPseq_0.0.1 > [7] rtracklayer_1.8.1 RCurl_1.4-3 bitops_1.0-4.1 > [10] ShortRead_1.6.2 Rsamtools_1.0.8 lattice_0.18-8 > [13] Biostrings_2.16.9 GenomicRanges_1.0.9 IRanges_1.6.17 > > loaded via a namespace (and not attached): > [1] Biobase_2.8.0 DBI_0.2-5 grid_2.11.1 hwriter_1.2 RSQLite_0.9-2 > [6] tools_2.11.1 XML_3.1-1 > > [[alternative HTML version deleted]]
ADD REPLYlink written 7.2 years ago by Valerie Obenchain ♦♦ 6.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 334 users visited in the last hour