a possible bug in the shortread packge
1
0
Entering edit mode
wang peter ★ 2.0k
@wang-peter-4647
Last seen 9.6 years ago
the coding can works well on many data. but when it works on 12 lines, i met such a problem how can the function tell the score if 33 or 64 system? library(ShortRead); reads <- readFastq(fastqfile); seqs <- sread(reads); score_sys = data.class(quality(reads)); cat("the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") the output is: the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality but it is really the FastqQuality=Phred+33 @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACAT CTGTGTGCTCGAGCAAGGAATGAGTGGAAAG + BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFF FBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCT CTACCTGCATTAGGTGACTGCTGAGAGCTCC + BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFF FFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGT TTGTGACGAGAGAACTGGCTACAGCTCCAGG + BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIF FFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute for Plant Research Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839@cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253 [[alternative HTML version deleted]]
• 1.0k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 2 days ago
United States
On 05/14/2014 11:17 AM, Wang Peter wrote: > the coding can works well on many data. > but when it works on 12 lines, i met such a problem > > how can the function tell the score if 33 or 64 system? > > library(ShortRead); > reads <- readFastq(fastqfile); > seqs <- sread(reads); > score_sys = data.class(quality(reads)); > cat("the quality score system > (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") > > > the output is: > the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is > SFastqQuality > but it is really the FastqQuality=Phred+33 > > @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG > GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGAC ATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG > + > BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBF FFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB > @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG > CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAAC CTCTACCTGCATTAGGTGACTGCTGAGAGCTCC > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFF FFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF > @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG > GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCAT GTTTGTGACGAGAGAACTGGCTACAGCTCCAGG > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFI IFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF > > From the help page ?readFastq the 'qualityType' argument is described as qualityType: Representation to be used for quality scores, must be one of 'Auto' (choose Phred-like if any character is ASCII-encoded as less than 59) 'FastqQuality' (Phred-like encoding), 'SFastqQuality' (Illumina encoding). 'Auto' is the default, none of the ASCII-encoded quality characters is less than 59, hence choose SFastqQuality. Invoke the command with the information about encoding if known, readFastq(fastqfile, qualityType="FastqQuality") See this previous post https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
thank you very much i think this method is not reliable if the data is high quality, no nt is low , like 59. they will be thought as SFastqQuality. i would u like to see if some score is higher than 80, then choose SFastqQuality. On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 05/14/2014 11:17 AM, Wang Peter wrote: > >> the coding can works well on many data. >> but when it works on 12 lines, i met such a problem >> >> how can the function tell the score if 33 or 64 system? >> >> library(ShortRead); >> reads <- readFastq(fastqfile); >> seqs <- sread(reads); >> score_sys = data.class(quality(reads)); >> cat("the quality score system >> (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n") >> >> >> the output is: >> the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) >> is >> SFastqQuality >> but it is really the FastqQuality=Phred+33 >> >> @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG >> GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGA >> ACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG >> + >> BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFF >> FFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB >> @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG >> CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATC >> GAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC >> + >> BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF >> FFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF >> @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG >> GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGA >> AGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG >> + >> BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIII >> IIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF >> >> >> > From the help page > > ?readFastq > > the 'qualityType' argument is described as > > qualityType: Representation to be used for quality scores, > must be one of 'Auto' (choose Phred-like if any character > is ASCII-encoded as less than 59) 'FastqQuality' > (Phred-like encoding), 'SFastqQuality' (Illumina > encoding). > > 'Auto' is the default, none of the ASCII-encoded quality characters is > less than 59, hence choose SFastqQuality. > > Invoke the command with the information about encoding if known, > > readFastq(fastqfile, qualityType="FastqQuality") > > See this previous post > > https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute for Plant Research Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839@cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On 05/14/2014 01:01 PM, Wang Peter wrote: > thank you very much > i think this method is not reliable > if the data is high quality, no nt is low , like 59. > they will be thought as SFastqQuality. > > i would u like to see if some score is higher than 80, then choose SFastqQuality. why 80? maybe better to force explicit choice. Is there a better standard than http://en.wikipedia.org/wiki/FASTQ_format > > > On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan at="" fhcrc.org=""> <mailto:mtmorgan at="" fhcrc.org="">> wrote: > > On 05/14/2014 11:17 AM, Wang Peter wrote: > > the coding can works well on many data. > but when it works on 12 lines, i met such a problem > > how can the function tell the score if 33 or 64 system? > > library(ShortRead); > reads <- readFastq(fastqfile); > seqs <- sread(reads); > score_sys = data.class(quality(reads)); > cat("the quality score system > (SFastqQuality=Phred+64,__FastqQuality=Phred+33) is",score_sys,"\n") > > > the output is: > the quality score system (SFastqQuality=Phred+64,__FastqQuality=Phred+33) is > SFastqQuality > but it is really the FastqQuality=Phred+33 > > @HISEQ04:126:C343UACXX:8:1103:__15851:74641 1:N:0:ACAGTG > GGCCTCTCAATGTCAAGGGATCGACGGCAG__ATATCATAGATGGCCTCATTGTCCAAGA GA__ACTGCGACATCTGTGTGCTCGAGCAAGGAA__TGAGTGGAAAG > + > BBBFFFFFFFFFFFFIIIIIIIIIIIIIII__FIIIIIIIIIIIIIIIIIIIIIIIIIIF FF__FFFFFFBFFFFBFFFFFFFFFFFFFFFFFF__FFFFBFFFBFB > @HISEQ04:126:C343UACXX:8:1103:__16187:74529 1:N:0:ACAGTG > CAATTCTAGCTACTGGAGCTGTCCATTTGC__CGCGCAGGCACTGAAGATAGAACATCGA TC__GAGTCAACCTCTACCTGCATTAGGTGACTG__CTGAGAGCTCC > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIIII__IIIIIIIIIIIIIIIIIIIIIIIIIIFF FF__FFFBFFFFFFFFFFFFFBFFFFFFBFFFFF__FFFFFFFFFFF > @HISEQ04:126:C343UACXX:8:1103:__16244:74553 1:N:0:ACAGTG > GCCGAAGCATTTTTGGCTTCTGTAAGGTTG__TACATATGAAGCAGATTGCTCCAGCTTG GA__AGAGTCATGTTTGTGACGAGAGAACTGGCT__ACAGCTCCAGG > + > BBBFFFFFFFFFFIIIIIIIIIIIIIIFFI__FIIIIIIFIIIIIIIIIIIIIIIIIIII II__IIIIFFFIIFFFFFFFFFFFFFFFBFFFFF__FFFFFFFFFFF > > > > From the help page > > ?readFastq > > the 'qualityType' argument is described as > > qualityType: Representation to be used for quality scores, > must be one of 'Auto' (choose Phred-like if any character > is ASCII-encoded as less than 59) 'FastqQuality' > (Phred-like encoding), 'SFastqQuality' (Illumina > encoding). > > 'Auto' is the default, none of the ASCII-encoded quality characters is less > than 59, hence choose SFastqQuality. > > Invoke the command with the information about encoding if known, > > readFastq(fastqfile, qualityType="FastqQuality") > > See this previous post > > https://stat.ethz.ch/__pipermail/bioconductor/2012-__September/0 48172.html > <https: stat.ethz.ch="" pipermail="" bioconductor="" 2012-september="" 0481="" 72.html=""> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 <tel:%28206%29%20667-2793> > > > > > -- > shan gao > Room 231(Dr.Fei lab) > Boyce Thompson Institute for Plant Research > Cornell University > Tower Road, Ithaca, NY 14853-1801 > Office phone: 1-607-254-1267(day) > Official email:sg839 at cornell.edu <mailto:email%3asg839 at="" cornell.edu=""> > Facebook:http://www.facebook.com/profile.php?id=100001986532253 -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6