Error Using DESeq with HTSeq-Count
1
0
Entering edit mode
Xiaoyu Liang ▴ 30
@xiaoyu-liang-6358
Last seen 9.6 years ago
Hi all, I was trying to use DESeq with the count table obtained from HTSeq- count. When I used the function "newCountDataSetFromHTSeqCount", it gave me an error complaining some lines of the count table do not have 24 columns. The error message looks like: "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 5 did not have 24 elements" I checked the HTSeq-count table results, not only a couple of line don't have 24 columns, most of them don't have. So I can't skip those lines. Is there anything wrong with HTSeq-count results? Would anybody give me any suggestions? Thank you in advance, Veronica [[alternative HTML version deleted]]
DESeq DESeq • 2.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 12 hours ago
United States
Hi Veronica, Could you paste the head of the htseq count table file? Maybe there is some clue as to what is going wrong. Also it's a good idea to include all your code (command line and R) to help package maintainers diagnose what might be going on. Mike On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> wrote: > Hi all, > > I was trying to use DESeq with the count table obtained from HTSeq- count. > When I used the function "newCountDataSetFromHTSeqCount", it gave me an > error complaining some lines of the count table do not have 24 columns. > > The error message looks like: > > "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, > na.strings, : > line 5 did not have 24 elements" > > I checked the HTSeq-count table results, not only a couple of line don't > have 24 columns, most of them don't have. So I can't skip those lines. > > Is there anything wrong with HTSeq-count results? Would anybody give me any > suggestions? > > Thank you in advance, > Veronica > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Mike, Thank you for the respond, sorry I didn't include those information. I pasted 3 lines from the HTSeq-count output sam file PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 50 51M = 23803885 622 GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 50 12M279N28M875N11M = 56553900 1868 GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY AS:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 50 33M94N18M = 56552177 -1868 GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b AS:i:-6 XM:i:1 XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 In the R package, I have a data frame looks like: > table samplename filename condition 1 NML-1 NML-1-htout NML 2 NML-2 NML-2-htout NML 3 LMP-1 LMP-1-htout LMP 4 LMP-2 LMP-2-htout LMP Then I call the following in R cds = newCountDataSetFromHTSeqCount(table, directory=".") Veronica On Mon, Jan 27, 2014 at 9:39 AM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Hi Veronica, > > Could you paste the head of the htseq count table file? Maybe there is > some clue as to what is going wrong. > > Also it's a good idea to include all your code (command line and R) to > help package maintainers diagnose what might be going on. > > Mike > On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> wrote: > >> Hi all, >> >> I was trying to use DESeq with the count table obtained from HTSeq- count. >> When I used the function "newCountDataSetFromHTSeqCount", it gave me an >> error complaining some lines of the count table do not have 24 columns. >> >> The error message looks like: >> >> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> na.strings, : >> line 5 did not have 24 elements" >> >> I checked the HTSeq-count table results, not only a couple of line don't >> have 24 columns, most of them don't have. So I can't skip those lines. >> >> Is there anything wrong with HTSeq-count results? Would anybody give me >> any >> suggestions? >> >> Thank you in advance, >> Veronica >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Veronica, The SAM file optionally output by htseq-count is mostly for debugging. You need, instead, to load the counts that are printed to the screen. If your original command was something of the form: samtools view alignments.bam | htseq-count -o alignment.htseq.sam - something.gff then simply do instead: samtools view alignments.bam | htseq-count - something.gff > alignment.counts The alignment.counts file would then be appropriate for loading into R. @Michael et alii, maybe you guys could update the htseq webpage and such to make this more explicit. I've seen a number of people (mostly on seqanswers) have this same misunderstanding. Regards, Devon ____________________________________________ Devon Ryan, Ph.D. Email: dpryan at dpryan.com Tel: +49 (0)178 298-6067 Molecular and Cellular Cognition Lab German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn, Germany On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: > Hi Mike, > > Thank you for the respond, sorry I didn't include those information. > > I pasted 3 lines from the HTSeq-count output sam file > > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 > 50 51M = 23803885 622 > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA > T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^ AS:i:0 XN:i:0 > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 > 50 12M279N28M875N11M = 56553900 1868 > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG > cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY AS:i:0 XM:i:0 > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 > 50 33M94N18M = 56552177 -1868 > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG > a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b AS:i:-6 XM:i:1 > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 > > In the R package, > I have a data frame looks like: >> table > samplename filename condition > 1 NML-1 NML-1-htout NML > 2 NML-2 NML-2-htout NML > 3 LMP-1 LMP-1-htout LMP > 4 LMP-2 LMP-2-htout LMP > > Then I call the following in R > cds = newCountDataSetFromHTSeqCount(table, directory=".") > > > Veronica > > > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love > <michaelisaiahlove at="" gmail.com="">wrote: > >> Hi Veronica, >> >> Could you paste the head of the htseq count table file? Maybe there is >> some clue as to what is going wrong. >> >> Also it's a good idea to include all your code (command line and R) to >> help package maintainers diagnose what might be going on. >> >> Mike >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu at="" gmail.com=""> wrote: >> >>> Hi all, >>> >>> I was trying to use DESeq with the count table obtained from HTSeq-count. >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an >>> error complaining some lines of the count table do not have 24 columns. >>> >>> The error message looks like: >>> >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >>> na.strings, : >>> line 5 did not have 24 elements" >>> >>> I checked the HTSeq-count table results, not only a couple of line don't >>> have 24 columns, most of them don't have. So I can't skip those lines. >>> >>> Is there anything wrong with HTSeq-count results? Would anybody give me >>> any >>> suggestions? >>> >>> Thank you in advance, >>> Veronica >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Thanks. yes, good point Devon. hi Veronica, As you are just starting, I would also recommend you switch from DESeq to using DESeq2. You can continue to use the htseq count table as produced by Devon's example, or, alternatively, you can produce a count table entirely within R/Bioconductor using the steps outlined in the DESeq2 vignette (which point to the parathyroidSE vignette for an example of using the summarizeOverlaps function). The latter workflow produces a SummarizedExperiment object which contains the count matrix and the range information about the genes. Mike On Mon, Jan 27, 2014 at 10:28 AM, Devon Ryan <dpryan@dpryan.com> wrote: > Hi Veronica, > > The SAM file optionally output by htseq-count is mostly for debugging. You > need, instead, to load the counts that are printed to the screen. > > If your original command was something of the form: > samtools view alignments.bam | htseq-count -o alignment.htseq.sam - > something.gff > > then simply do instead: > samtools view alignments.bam | htseq-count - something.gff > > alignment.counts > > The alignment.counts file would then be appropriate for loading into R. > > @Michael et alii, maybe you guys could update the htseq webpage and such > to make this more explicit. I've seen a number of people (mostly on > seqanswers) have this same misunderstanding. > > Regards, > Devon > > ____________________________________________ > Devon Ryan, Ph.D. > Email: dpryan@dpryan.com > Tel: +49 (0)178 298-6067 > Molecular and Cellular Cognition Lab > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn, Germany > > On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: > > > Hi Mike, > > > > Thank you for the respond, sorry I didn't include those information. > > > > I pasted 3 lines from the HTSeq-count output sam file > > > > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 > > 50 51M = 23803885 622 > > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA > > T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^ AS:i:0 XN:i:0 > > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature > > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 > > 50 12M279N28M875N11M = 56553900 1868 > > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG > > cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY AS:i:0 XM:i:0 > > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 > > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 > > 50 33M94N18M = 56552177 -1868 > > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG > > a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b AS:i:-6 XM:i:1 > > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 > > > > In the R package, > > I have a data frame looks like: > >> table > > samplename filename condition > > 1 NML-1 NML-1-htout NML > > 2 NML-2 NML-2-htout NML > > 3 LMP-1 LMP-1-htout LMP > > 4 LMP-2 LMP-2-htout LMP > > > > Then I call the following in R > > cds = newCountDataSetFromHTSeqCount(table, directory=".") > > > > > > Veronica > > > > > > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love > > <michaelisaiahlove@gmail.com>wrote: > > > >> Hi Veronica, > >> > >> Could you paste the head of the htseq count table file? Maybe there is > >> some clue as to what is going wrong. > >> > >> Also it's a good idea to include all your code (command line and R) to > >> help package maintainers diagnose what might be going on. > >> > >> Mike > >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> > wrote: > >> > >>> Hi all, > >>> > >>> I was trying to use DESeq with the count table obtained from > HTSeq-count. > >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an > >>> error complaining some lines of the count table do not have 24 columns. > >>> > >>> The error message looks like: > >>> > >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, > >>> na.strings, : > >>> line 5 did not have 24 elements" > >>> > >>> I checked the HTSeq-count table results, not only a couple of line > don't > >>> have 24 columns, most of them don't have. So I can't skip those lines. > >>> > >>> Is there anything wrong with HTSeq-count results? Would anybody give me > >>> any > >>> suggestions? > >>> > >>> Thank you in advance, > >>> Veronica > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thank you guys! I got the table loaded correctly. I will use DESeq2. Yeah, I think it would be helpful to update the HTSeq-count manual page. Thanks again, Veronica On Mon, Jan 27, 2014 at 10:34 AM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Thanks. yes, good point Devon. > > hi Veronica, > > As you are just starting, I would also recommend you switch from DESeq to > using DESeq2. You can continue to use the htseq count table as produced by > Devon's example, or, alternatively, you can produce a count table entirely > within R/Bioconductor using the steps outlined in the DESeq2 vignette > (which point to the parathyroidSE vignette for an example of using the > summarizeOverlaps function). The latter workflow produces a > SummarizedExperiment object which contains the count matrix and the range > information about the genes. > > Mike > > > On Mon, Jan 27, 2014 at 10:28 AM, Devon Ryan <dpryan@dpryan.com> wrote: > >> Hi Veronica, >> >> The SAM file optionally output by htseq-count is mostly for debugging. >> You need, instead, to load the counts that are printed to the screen. >> >> If your original command was something of the form: >> samtools view alignments.bam | htseq-count -o alignment.htseq.sam - >> something.gff >> >> then simply do instead: >> samtools view alignments.bam | htseq-count - something.gff > >> alignment.counts >> >> The alignment.counts file would then be appropriate for loading into R. >> >> @Michael et alii, maybe you guys could update the htseq webpage and such >> to make this more explicit. I've seen a number of people (mostly on >> seqanswers) have this same misunderstanding. >> >> Regards, >> Devon >> >> ____________________________________________ >> Devon Ryan, Ph.D. >> Email: dpryan@dpryan.com >> Tel: +49 (0)178 298-6067 >> Molecular and Cellular Cognition Lab >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn, Germany >> >> On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote: >> >> > Hi Mike, >> > >> > Thank you for the respond, sorry I didn't include those information. >> > >> > I pasted 3 lines from the HTSeq-count output sam file >> > >> > PLATYPUS_627RLAAXX:4:001:01065:10528 163 chrX 23803314 >> > 50 51M = 23803885 622 >> > GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA >> > T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^ AS:i:0 XN:i:0 >> > XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1 XF:Z:no_feature >> > PLATYPUS_627RLAAXX:4:001:01065:11031 97 chr12 56552177 >> > 50 12M279N28M875N11M = 56553900 1868 >> > GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG >> > cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY AS:i:0 XM:i:0 >> > XO:i:0 XG:i:0 MD:Z:51 NM:i:0 XS:A:+ NH:i:1 XF:Z:MYL6 >> > PLATYPUS_627RLAAXX:4:001:01065:11031 145 chr12 56553900 >> > 50 33M94N18M = 56552177 -1868 >> > GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG >> > a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b AS:i:-6 XM:i:1 >> > XO:i:0 XG:i:0 MD:Z:26A24 NM:i:1 XS:A:+ NH:i:1 XF:Z:MYL6 >> > >> > In the R package, >> > I have a data frame looks like: >> >> table >> > samplename filename condition >> > 1 NML-1 NML-1-htout NML >> > 2 NML-2 NML-2-htout NML >> > 3 LMP-1 LMP-1-htout LMP >> > 4 LMP-2 LMP-2-htout LMP >> > >> > Then I call the following in R >> > cds = newCountDataSetFromHTSeqCount(table, directory=".") >> > >> > >> > Veronica >> > >> > >> > On Mon, Jan 27, 2014 at 9:39 AM, Michael Love >> > <michaelisaiahlove@gmail.com>wrote: >> > >> >> Hi Veronica, >> >> >> >> Could you paste the head of the htseq count table file? Maybe there is >> >> some clue as to what is going wrong. >> >> >> >> Also it's a good idea to include all your code (command line and R) to >> >> help package maintainers diagnose what might be going on. >> >> >> >> Mike >> >> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu@gmail.com> >> wrote: >> >> >> >>> Hi all, >> >>> >> >>> I was trying to use DESeq with the count table obtained from >> HTSeq-count. >> >>> When I used the function "newCountDataSetFromHTSeqCount", it gave me >> an >> >>> error complaining some lines of the count table do not have 24 >> columns. >> >>> >> >>> The error message looks like: >> >>> >> >>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> >>> na.strings, : >> >>> line 5 did not have 24 elements" >> >>> >> >>> I checked the HTSeq-count table results, not only a couple of line >> don't >> >>> have 24 columns, most of them don't have. So I can't skip those lines. >> >>> >> >>> Is there anything wrong with HTSeq-count results? Would anybody give >> me >> >>> any >> >>> suggestions? >> >>> >> >>> Thank you in advance, >> >>> Veronica >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor@r-project.org >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>> >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6