Question: Error Using DESeq with HTSeq-Count
0
5.6 years ago by
Xiaoyu Liang30
Xiaoyu Liang30 wrote:
Hi all, I was trying to use DESeq with the count table obtained from HTSeq- count. When I used the function "newCountDataSetFromHTSeqCount", it gave me an error complaining some lines of the count table do not have 24 columns. The error message looks like: "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 5 did not have 24 elements" I checked the HTSeq-count table results, not only a couple of line don't have 24 columns, most of them don't have. So I can't skip those lines. Is there anything wrong with HTSeq-count results? Would anybody give me any suggestions? Thank you in advance, Veronica [[alternative HTML version deleted]]
deseq • 1.4k views
modified 5.6 years ago by Michael Love24k • written 5.6 years ago by Xiaoyu Liang30
Answer: Error Using DESeq with HTSeq-Count
0
5.6 years ago by
Michael Love24k
United States
Michael Love24k wrote:
Hi Veronica, Could you paste the head of the htseq count table file? Maybe there is some clue as to what is going wrong. Also it's a good idea to include all your code (command line and R) to help package maintainers diagnose what might be going on. Mike On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> wrote: > Hi all, > > I was trying to use DESeq with the count table obtained from HTSeq- count. > When I used the function "newCountDataSetFromHTSeqCount", it gave me an > error complaining some lines of the count table do not have 24 columns. > > The error message looks like: > > "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, > na.strings, : > line 5 did not have 24 elements" > > I checked the HTSeq-count table results, not only a couple of line don't > have 24 columns, most of them don't have. So I can't skip those lines. > > Is there anything wrong with HTSeq-count results? Would anybody give me any > suggestions? > > Thank you in advance, > Veronica > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
Hi Mike, Thank you for the respond, sorry I didn't include those information. I pasted 3 lines from the HTSeq-count output sam file PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 50 51M = 23803885 622 GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA Tcccc^YacbLTTTa\TTbbbYYcL\c^cYcc_c^cc]b]Y\ccbY^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 50 12M279N28M875N11M = 56553900 1868 GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG cc\\dd^dT^cccaY]YYaTccTab\YL_aYK]LK\_]U]]UY AS:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 50 33M94N18M = 56552177 -1868 GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG a_\L]TaTTbZU[LZTaRYaTYbYIZSTZ_]UYQS]]\\T[TbT^b AS:i:-6 XM:i:1 XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 In the R package, I have a data frame looks like: > table samplename filename condition 1 NML-1 NML-1-htout NML 2 NML-2 NML-2-htout NML 3 LMP-1 LMP-1-htout LMP 4 LMP-2 LMP-2-htout LMP Then I call the following in R cds = newCountDataSetFromHTSeqCount(table, directory=".") Veronica On Mon, Jan 27, 2014 at 9:39 AM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Hi Veronica, > > Could you paste the head of the htseq count table file? Maybe there is > some clue as to what is going wrong. > > Also it's a good idea to include all your code (command line and R) to > help package maintainers diagnose what might be going on. > > Mike > On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> wrote: > >> Hi all, >> >> I was trying to use DESeq with the count table obtained from HTSeq- count. >> When I used the function "newCountDataSetFromHTSeqCount", it gave me an >> error complaining some lines of the count table do not have 24 columns. >> >> The error message looks like: >> >> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> na.strings, : >> line 5 did not have 24 elements" >> >> I checked the HTSeq-count table results, not only a couple of line don't >> have 24 columns, most of them don't have. So I can't skip those lines. >> >> Is there anything wrong with HTSeq-count results? Would anybody give me >> any >> suggestions? >> >> Thank you in advance, >> Veronica >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]]
Hi Veronica, The SAM file optionally output by htseq-count is mostly for debugging. You need, instead, to load the counts that are printed to the screen. If your original command was something of the form: samtools view alignments.bam | htseq-count -o alignment.htseq.sam - something.gff then simply do instead: samtools view alignments.bam | htseq-count - something.gff > alignment.counts The alignment.counts file would then be appropriate for loading into R. @Michael et alii, maybe you guys could update the htseq webpage and such to make this more explicit. I've seen a number of people (mostly on seqanswers) have this same misunderstanding. Regards, Devon ____________________________________________ Devon Ryan, Ph.D. Email: dpryan at dpryan.com Tel: +49 (0)178 298-6067 Molecular and Cellular Cognition Lab German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn, Germany On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: > Hi Mike, > > Thank you for the respond, sorry I didn't include those information. > > I pasted 3 lines from the HTSeq-count output sam file > > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 > 50 51M = 23803885 622 > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA > Tcccc^YacbLTTTa\TTbbbYYcL\c^cYcc_c^cc]b]Y\ccbY^ AS:i:0 XN:i:0 > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 > 50 12M279N28M875N11M = 56553900 1868 > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG > cc\\dd^dT^cccaY]YYaTccTab\YL_aYK]LK\_]U]]UY AS:i:0 XM:i:0 > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 > 50 33M94N18M = 56552177 -1868 > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG > a_\L]TaTTbZU[LZTaRYaTYbYIZSTZ_]UYQS]]\\T[TbT^b AS:i:-6 XM:i:1 > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 > > In the R package, > I have a data frame looks like: >> table > samplename filename condition > 1 NML-1 NML-1-htout NML > 2 NML-2 NML-2-htout NML > 3 LMP-1 LMP-1-htout LMP > 4 LMP-2 LMP-2-htout LMP > > Then I call the following in R > cds = newCountDataSetFromHTSeqCount(table, directory=".") > > > Veronica > > > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love > <michaelisaiahlove at="" gmail.com="">wrote: > >> Hi Veronica, >> >> Could you paste the head of the htseq count table file? Maybe there is >> some clue as to what is going wrong. >> >> Also it's a good idea to include all your code (command line and R) to >> help package maintainers diagnose what might be going on. >> >> Mike >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu at="" gmail.com=""> wrote: >> >>> Hi all, >>> >>> I was trying to use DESeq with the count table obtained from HTSeq-count. >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an >>> error complaining some lines of the count table do not have 24 columns. >>> >>> The error message looks like: >>> >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >>> na.strings, : >>> line 5 did not have 24 elements" >>> >>> I checked the HTSeq-count table results, not only a couple of line don't >>> have 24 columns, most of them don't have. So I can't skip those lines. >>> >>> Is there anything wrong with HTSeq-count results? Would anybody give me >>> any >>> suggestions? >>> >>> Thank you in advance, >>> Veronica >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Thanks. yes, good point Devon. hi Veronica, As you are just starting, I would also recommend you switch from DESeq to using DESeq2. You can continue to use the htseq count table as produced by Devon's example, or, alternatively, you can produce a count table entirely within R/Bioconductor using the steps outlined in the DESeq2 vignette (which point to the parathyroidSE vignette for an example of using the summarizeOverlaps function). The latter workflow produces a SummarizedExperiment object which contains the count matrix and the range information about the genes. Mike On Mon, Jan 27, 2014 at 10:28 AM, Devon Ryan <dpryan@dpryan.com> wrote: > Hi Veronica, > > The SAM file optionally output by htseq-count is mostly for debugging. You > need, instead, to load the counts that are printed to the screen. > > If your original command was something of the form: > samtools view alignments.bam | htseq-count -o alignment.htseq.sam - > something.gff > > then simply do instead: > samtools view alignments.bam | htseq-count - something.gff > > alignment.counts > > The alignment.counts file would then be appropriate for loading into R. > > @Michael et alii, maybe you guys could update the htseq webpage and such > to make this more explicit. I've seen a number of people (mostly on > seqanswers) have this same misunderstanding. > > Regards, > Devon > > ____________________________________________ > Devon Ryan, Ph.D. > Email: dpryan@dpryan.com > Tel: +49 (0)178 298-6067 > Molecular and Cellular Cognition Lab > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn, Germany > > On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: > > > Hi Mike, > > > > Thank you for the respond, sorry I didn't include those information. > > > > I pasted 3 lines from the HTSeq-count output sam file > > > > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 > > 50 51M = 23803885 622 > > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA > > Tcccc^YacbLTTTa\TTbbbYYcL\c^cYcc_c^cc]b]Y\ccbY^ AS:i:0 XN:i:0 > > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature > > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 > > 50 12M279N28M875N11M = 56553900 1868 > > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG > > cc\\dd^dT^cccaY]YYaTccTab\YL_aYK]LK\_]U]]UY AS:i:0 XM:i:0 > > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 > > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 > > 50 33M94N18M = 56552177 -1868 > > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG > > a_\L]TaTTbZU[LZTaRYaTYbYIZSTZ_]UYQS]]\\T[TbT^b AS:i:-6 XM:i:1 > > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 > > > > In the R package, > > I have a data frame looks like: > >> table > > samplename filename condition > > 1 NML-1 NML-1-htout NML > > 2 NML-2 NML-2-htout NML > > 3 LMP-1 LMP-1-htout LMP > > 4 LMP-2 LMP-2-htout LMP > > > > Then I call the following in R > > cds = newCountDataSetFromHTSeqCount(table, directory=".") > > > > > > Veronica > > > > > > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love > > <michaelisaiahlove@gmail.com>wrote: > > > >> Hi Veronica, > >> > >> Could you paste the head of the htseq count table file? Maybe there is > >> some clue as to what is going wrong. > >> > >> Also it's a good idea to include all your code (command line and R) to > >> help package maintainers diagnose what might be going on. > >> > >> Mike > >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> > wrote: > >> > >>> Hi all, > >>> > >>> I was trying to use DESeq with the count table obtained from > HTSeq-count. > >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an > >>> error complaining some lines of the count table do not have 24 columns. > >>> > >>> The error message looks like: > >>> > >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, > >>> na.strings, : > >>> line 5 did not have 24 elements" > >>> > >>> I checked the HTSeq-count table results, not only a couple of line > don't > >>> have 24 columns, most of them don't have. So I can't skip those lines. > >>> > >>> Is there anything wrong with HTSeq-count results? Would anybody give me > >>> any > >>> suggestions? > >>> > >>> Thank you in advance, > >>> Veronica > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
Thank you guys! I got the table loaded correctly. I will use DESeq2. Yeah, I think it would be helpful to update the HTSeq-count manual page. Thanks again, Veronica On Mon, Jan 27, 2014 at 10:34 AM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Thanks. yes, good point Devon. > > hi Veronica, > > As you are just starting, I would also recommend you switch from DESeq to > using DESeq2. You can continue to use the htseq count table as produced by > Devon's example, or, alternatively, you can produce a count table entirely > within R/Bioconductor using the steps outlined in the DESeq2 vignette > (which point to the parathyroidSE vignette for an example of using the > summarizeOverlaps function). The latter workflow produces a > SummarizedExperiment object which contains the count matrix and the range > information about the genes. > > Mike > > > On Mon, Jan 27, 2014 at 10:28 AM, Devon Ryan <dpryan@dpryan.com> wrote: > >> Hi Veronica, >> >> The SAM file optionally output by htseq-count is mostly for debugging. >> You need, instead, to load the counts that are printed to the screen. >> >> If your original command was something of the form: >> samtools view alignments.bam | htseq-count -o alignment.htseq.sam - >> something.gff >> >> then simply do instead: >> samtools view alignments.bam | htseq-count - something.gff > >> alignment.counts >> >> The alignment.counts file would then be appropriate for loading into R. >> >> @Michael et alii, maybe you guys could update the htseq webpage and such >> to make this more explicit. I've seen a number of people (mostly on >> seqanswers) have this same misunderstanding. >> >> Regards, >> Devon >> >> ____________________________________________ >> Devon Ryan, Ph.D. >> Email: dpryan@dpryan.com >> Tel: +49 (0)178 298-6067 >> Molecular and Cellular Cognition Lab >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn, Germany >> >> On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: >> >> > Hi Mike, >> > >> > Thank you for the respond, sorry I didn't include those information. >> > >> > I pasted 3 lines from the HTSeq-count output sam file >> > >> > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 >> > 50 51M = 23803885 622 >> > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA >> > Tcccc^YacbLTTTa\TTbbbYYcL\c^cYcc_c^cc]b]Y\ccbY^ AS:i:0 XN:i:0 >> > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature >> > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 >> > 50 12M279N28M875N11M = 56553900 1868 >> > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG >> > cc\\dd^dT^cccaY]YYaTccTab\YL_aYK]LK\_]U]]UY AS:i:0 XM:i:0 >> > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 >> > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 >> > 50 33M94N18M = 56552177 -1868 >> > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG >> > a_\L]TaTTbZU[LZTaRYaTYbYIZSTZ_]UYQS]]\\T[TbT^b AS:i:-6 XM:i:1 >> > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 >> > >> > In the R package, >> > I have a data frame looks like: >> >> table >> > samplename filename condition >> > 1 NML-1 NML-1-htout NML >> > 2 NML-2 NML-2-htout NML >> > 3 LMP-1 LMP-1-htout LMP >> > 4 LMP-2 LMP-2-htout LMP >> > >> > Then I call the following in R >> > cds = newCountDataSetFromHTSeqCount(table, directory=".") >> > >> > >> > Veronica >> > >> > >> > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love >> > <michaelisaiahlove@gmail.com>wrote: >> > >> >> Hi Veronica, >> >> >> >> Could you paste the head of the htseq count table file? Maybe there is >> >> some clue as to what is going wrong. >> >> >> >> Also it's a good idea to include all your code (command line and R) to >> >> help package maintainers diagnose what might be going on. >> >> >> >> Mike >> >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> >> wrote: >> >> >> >>> Hi all, >> >>> >> >>> I was trying to use DESeq with the count table obtained from >> HTSeq-count. >> >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me >> an >> >>> error complaining some lines of the count table do not have 24 >> columns. >> >>> >> >>> The error message looks like: >> >>> >> >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> >>> na.strings, : >> >>> line 5 did not have 24 elements" >> >>> >> >>> I checked the HTSeq-count table results, not only a couple of line >> don't >> >>> have 24 columns, most of them don't have. So I can't skip those lines. >> >>> >> >>> Is there anything wrong with HTSeq-count results? Would anybody give >> me >> >>> any >> >>> suggestions? >> >>> >> >>> Thank you in advance, >> >>> Veronica >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor@r-project.org >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>> >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > [[alternative HTML version deleted]]