Search
Question: a possible bug in the shortread packge
0
gravatar for wang peter
3.5 years ago by
wang peter2.0k
wang peter2.0k wrote:
the coding can works well on many data. but when it works on 12 lines, i met such a problem how can the function tell the score if 33 or 64 system? library(ShortRead); reads <- readFastq(fastqfile); seqs <- sread(reads); score_sys = data.class(quality(reads)); cat("the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") the output is: the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality but it is really the FastqQuality=Phred+33 @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACAT CTGTGTGCTCGAGCAAGGAATGAGTGGAAAG + BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFF FBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCT CTACCTGCATTAGGTGACTGCTGAGAGCTCC + BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFF FFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGT TTGTGACGAGAGAACTGGCTACAGCTCCAGG + BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIF FFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute for Plant Research Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839@cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253 [[alternative HTML version deleted]]
ADD COMMENTlink modified 3.5 years ago by Martin Morgan ♦♦ 20k • written 3.5 years ago by wang peter2.0k
0
gravatar for Martin Morgan
3.5 years ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:
On 05/14/2014 11:17 AM, Wang Peter wrote: > the coding can works well on many data. > but when it works on 12 lines, i met such a problem > > how can the function tell the score if 33 or 64 system? > > library(ShortRead); > reads <- readFastq(fastqfile); > seqs <- sread(reads); > score_sys = data.class(quality(reads)); > cat("the quality score system > (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") > > > the output is: > the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is > SFastqQuality > but it is really the FastqQuality=Phred+33 > > @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG > GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGAC ATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG > + > BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBF FFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB > @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG > CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAAC CTCTACCTGCATTAGGTGACTGCTGAGAGCTCC > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFF FFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF > @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG > GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCAT GTTTGTGACGAGAGAACTGGCTACAGCTCCAGG > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFI IFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF > > From the help page ?readFastq the 'qualityType' argument is described as qualityType: Representation to be used for quality scores, must be one of 'Auto' (choose Phred-like if any character is ASCII-encoded as less than 59) 'FastqQuality' (Phred-like encoding), 'SFastqQuality' (Illumina encoding). 'Auto' is the default, none of the ASCII-encoded quality characters is less than 59, hence choose SFastqQuality. Invoke the command with the information about encoding if known, readFastq(fastqfile, qualityType="FastqQuality") See this previous post https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENTlink written 3.5 years ago by Martin Morgan ♦♦ 20k
thank you very much i think this method is not reliable if the data is high quality, no nt is low , like 59. they will be thought as SFastqQuality. i would u like to see if some score is higher than 80, then choose SFastqQuality. On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 05/14/2014 11:17 AM, Wang Peter wrote: > >> the coding can works well on many data. >> but when it works on 12 lines, i met such a problem >> >> how can the function tell the score if 33 or 64 system? >> >> library(ShortRead); >> reads <- readFastq(fastqfile); >> seqs <- sread(reads); >> score_sys = data.class(quality(reads)); >> cat("the quality score system >> (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") >> >> >> the output is: >> the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) >> is >> SFastqQuality >> but it is really the FastqQuality=Phred+33 >> >> @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG >> GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGA >> ACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG >> + >> BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFF >> FFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB >> @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG >> CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATC >> GAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC >> + >> BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF >> FFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF >> @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG >> GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGA >> AGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG >> + >> BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIII >> IIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF >> >> >> > From the help page > > ?readFastq > > the 'qualityType' argument is described as > > qualityType: Representation to be used for quality scores, > must be one of 'Auto' (choose Phred-like if any character > is ASCII-encoded as less than 59) 'FastqQuality' > (Phred-like encoding), 'SFastqQuality' (Illumina > encoding). > > 'Auto' is the default, none of the ASCII-encoded quality characters is > less than 59, hence choose SFastqQuality. > > Invoke the command with the information about encoding if known, > > readFastq(fastqfile, qualityType="FastqQuality") > > See this previous post > > https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute for Plant Research Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839@cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253 [[alternative HTML version deleted]]
ADD REPLYlink written 3.5 years ago by wang peter2.0k
On 05/14/2014 01:01 PM, Wang Peter wrote: > thank you very much > i think this method is not reliable > if the data is high quality, no nt is low , like 59. > they will be thought as SFastqQuality. > > i would u like to see if some score is higher than 80, then choose SFastqQuality. why 80? maybe better to force explicit choice. Is there a better standard than http://en.wikipedia.org/wiki/FASTQ_format > > > On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan at="" fhcrc.org=""> <mailto:mtmorgan at="" fhcrc.org="">> wrote: > > On 05/14/2014 11:17 AM, Wang Peter wrote: > > the coding can works well on many data. > but when it works on 12 lines, i met such a problem > > how can the function tell the score if 33 or 64 system? > > library(ShortRead); > reads <- readFastq(fastqfile); > seqs <- sread(reads); > score_sys = data.class(quality(reads)); > cat("the quality score system > (SFastqQuality=Phred+64,__FastqQuality=Phred+33) is",score_sys,"\n") > > > the output is: > the quality score system (SFastqQuality=Phred+64,__FastqQuality=Phred+33) is > SFastqQuality > but it is really the FastqQuality=Phred+33 > > @HISEQ04:126:C343UACXX:8:1103:__15851:74641 1:N:0:ACAGTG > GGCCTCTCAATGTCAAGGGATCGACGGCAG__ATATCATAGATGGCCTCATTGTCCAAGA GA__ACTGCGACATCTGTGTGCTCGAGCAAGGAA__TGAGTGGAAAG > + > BBBFFFFFFFFFFFFIIIIIIIIIIIIIII__FIIIIIIIIIIIIIIIIIIIIIIIIIIF FF__FFFFFFBFFFFBFFFFFFFFFFFFFFFFFF__FFFFBFFFBFB > @HISEQ04:126:C343UACXX:8:1103:__16187:74529 1:N:0:ACAGTG > CAATTCTAGCTACTGGAGCTGTCCATTTGC__CGCGCAGGCACTGAAGATAGAACATCGA TC__GAGTCAACCTCTACCTGCATTAGGTGACTG__CTGAGAGCTCC > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIIII__IIIIIIIIIIIIIIIIIIIIIIIIIIFF FF__FFFBFFFFFFFFFFFFFBFFFFFFBFFFFF__FFFFFFFFFFF > @HISEQ04:126:C343UACXX:8:1103:__16244:74553 1:N:0:ACAGTG > GCCGAAGCATTTTTGGCTTCTGTAAGGTTG__TACATATGAAGCAGATTGCTCCAGCTTG GA__AGAGTCATGTTTGTGACGAGAGAACTGGCT__ACAGCTCCAGG > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIFFI__FIIIIIIFIIIIIIIIIIIIIIIIIIII II__IIIIFFFIIFFFFFFFFFFFFFFFBFFFFF__FFFFFFFFFFF > > > > From the help page > > ?readFastq > > the 'qualityType' argument is described as > > qualityType: Representation to be used for quality scores, > must be one of 'Auto' (choose Phred-like if any character > is ASCII-encoded as less than 59) 'FastqQuality' > (Phred-like encoding), 'SFastqQuality' (Illumina > encoding). > > 'Auto' is the default, none of the ASCII-encoded quality characters is less > than 59, hence choose SFastqQuality. > > Invoke the command with the information about encoding if known, > > readFastq(fastqfile, qualityType="FastqQuality") > > See this previous post > > https://stat.ethz.ch/__pipermail/bioconductor/2012-__September/0 48172.html > <https: stat.ethz.ch="" pipermail="" bioconductor="" 2012-september="" 0481="" 72.html=""> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 <tel:%28206%29%20667-2793> > > > > > -- > shan gao > Room 231(Dr.Fei lab) > Boyce Thompson Institute for Plant Research > Cornell University > Tower Road, Ithaca, NY 14853-1801 > Office phone: 1-607-254-1267(day) > Official email:sg839 at cornell.edu <mailto:email%3asg839 at="" cornell.edu=""> > Facebook:http://www.facebook.com/profile.php?id=100001986532253 -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLYlink written 3.5 years ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 255 users visited in the last hour