HTSeq-Count

Guest User ★ 13k

@guest-user-4897

Last seen 11.3 years ago

Hi all,

I am new to the field of sequencing and performed a RIP-Seq experiment, using htseq-count to count reads. I now get the following (using union mode; it doesn't look any better with intersection-strict):

__no_feature 1503377
__ambiguous 490772
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 5277314

When I sum up the counts for all genes, I get 3227845.

The numbers for __no_feature, __ambiguous and __alignment_not_unique look very high. Does anybody have an idea why?

(Additional info: we did random priming, mapped with STAR, and masked rRNA loci.)

Best wishes
Julia
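To put these numbers in proportion, each category can be expressed as a share of the total. The following is only a minimal sketch, not part of htseq-count: the dictionary and variable names are made up for illustration, and it assumes that all tallies count the same kind of unit and can simply be added together.

```python
# Tallies reported in the post: the special counters from htseq-count,
# plus the sum over all per-gene counts.
tallies = {
    "assigned_to_genes": 3227845,
    "__no_feature": 1503377,
    "__ambiguous": 490772,
    "__too_low_aQual": 0,
    "__not_aligned": 0,
    "__alignment_not_unique": 5277314,
}

# Assumption: every tally counts the same kind of unit, so they can be summed.
total = sum(tallies.values())
shares = {name: count / total for name, count in tallies.items()}

# Print one row per category: name, raw tally, share of the total.
for name, share in shares.items():
    print(f"{name:24s} {tallies[name]:>9d} {share:6.1%}")
```

Under that assumption, roughly half of the total falls under __alignment_not_unique and just under a third is assigned to genes.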

updated 11.3 years ago by Yuan Hao ▴ 240 • written 11.3 years ago by Guest User ★ 13k

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 4 weeks ago

United States

Hi Julia,

On Fri, Sep 5, 2014 at 7:41 AM, Julia [guest] wrote:
> The number for __no_feature, __ambiguous, __alignment_not_unique look very high.
>
> Does somebody have an idea for that?

While I haven't worked with RIP-seq data myself, I do have some experience with HITS-CLIP and PAR-CLIP, which I believe are quite similar (at least in principle). These experiments must be incredibly difficult to pull off, however, because most of the datasets of this type that came my way were incredibly noisy.

This is just to say that the problem you are seeing may not be an informatics problem and could, quite possibly, be an experimental one.

-steve

--
Steve Lianoglou
Computational Biologist
Genentech

written 11.3 years ago by Steve Lianoglou ★ 13k

Hi Steve,

thank you very much for your help.

Which tool did you use for DE? I used edgeR; however, I've read that edgeR and DESeq2 might be overly stringent for RIP-Seq (e.g. the RIPSeeker package paper, supplement).

Best wishes,
Julia

-----Original Message-----
From: mailinglist.honeypot at gmail.com, on behalf of Steve Lianoglou
Sent: Friday, 5 September 2014 18:35
To: Julia [guest]
Cc: bioconductor at r-project.org list; Pickl, Julia
Subject: Re: [BioC] HTSeq-Count

written 11.3 years ago by Pickl, Julia ▴ 60

Hi Julia,

On Mon, Sep 8, 2014 at 10:10 AM, Pickl, Julia wrote:
> Which tool for DE did you use?
> I used edgeR, however I've read that edgeR and DESeq2 might be overstringent for RIP-Seq (e.g. RIPSeeker package paper, supplement).

At the time, we used DESeq. I was comparing differential binding of an RBP to its targets between conditions, though, whereas it seems you want to look at one condition and ask which transcripts are bound by an RBP.

In my opinion, the differential binding question is more interesting and biologically relevant, and more likely to give you "real" signal. Again, this is only my opinion, but doing one experiment and asking where your RBP binds in that experiment in isolation (even if you have done a control run in the same cell type/condition) likely won't mean all that much.

-steve

written 11.3 years ago by Steve Lianoglou ★ 13k

Yuan Hao ▴ 240

@yuan-hao-3658

Last seen 11.2 years ago

United States

Hi Julia,

You evidently allowed multiple mappings when running STAR; htseq-count, however, counts only uniquely mapped reads, so the multi-mapped reads end up in __alignment_not_unique. About 1.5M reads mapped to intergenic and/or intronic regions, and these make up __no_feature. The reads in __ambiguous mapped to regions annotated with more than one gene.

If you want to take multi-mapped reads into account, you can distribute them evenly across all the genes they map to (each one weighted by 1 / number of mapping locations), distribute them in proportion to the uniquely mapped reads, or distribute them in a more sophisticated way with an iterative EM algorithm (as RSEM does).

Cheers,
Yuan
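The three strategies can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation used by htseq-count or RSEM: the function names and the input layout (a dict of unique counts per gene, plus one tuple of candidate gene IDs per multi-mapped read) are hypothetical, and the EM loop ignores the transcript lengths and alignment qualities that a real tool would model.

```python
from collections import defaultdict


def distribute_equally(unique_counts, multi_reads):
    """Give each candidate gene of a multi-mapped read an equal share."""
    counts = defaultdict(float, unique_counts)
    for genes in multi_reads:
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return dict(counts)


def distribute_by_unique(unique_counts, multi_reads):
    """Split each multi-mapped read in proportion to the unique counts."""
    counts = defaultdict(float, unique_counts)
    for genes in multi_reads:
        weights = [unique_counts.get(gene, 0) for gene in genes]
        total = sum(weights)
        for gene, weight in zip(genes, weights):
            # Fall back to an equal split if no candidate has unique reads.
            counts[gene] += weight / total if total > 0 else 1.0 / len(genes)
    return dict(counts)


def distribute_by_em(unique_counts, multi_reads, n_iter=50):
    """Iteratively re-split multi-mapped reads by the current estimates."""
    estimate = distribute_equally(unique_counts, multi_reads)
    for _ in range(n_iter):
        updated = defaultdict(float, unique_counts)
        for genes in multi_reads:
            total = sum(estimate[gene] for gene in genes)
            for gene in genes:
                # Each read contributes one count in total, split by
                # the abundances estimated in the previous iteration.
                updated[gene] += estimate[gene] / total
        estimate = dict(updated)
    return estimate


# Toy data: gene A has 90 unique reads, gene B has 10,
# and 20 reads map equally well to both genes.
unique = {"A": 90, "B": 10}
multi = [("A", "B")] * 20

equal = distribute_equally(unique, multi)
proportional = distribute_by_unique(unique, multi)
em = distribute_by_em(unique, multi)
```

In this toy case the equal split gives A 100 and B 20 counts, while the proportional split and the EM iteration both end up at about 108 and 12. All three keep the overall total at 120 reads.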

written 11.3 years ago by Yuan Hao ▴ 240

Hi Yuan,

thank you very much for your reply.

Would you say that I should take multi-mapped reads into account, since there are more of them than uniquely mapped reads, or do you think the high count for __alignment_not_unique is not a problem per se?

I am a beginner in this field, so I would be happy if you could share your experience with me.

Best wishes
Julia

From: Yuan Hao [mailto:yuan.x.hao at gmail.com]
Sent: Friday, 5 September 2014 17:18
To: Julia [guest]
Cc: bioconductor at r-project.org; Pickl, Julia
Subject: Re: [BioC] HTSeq-Count

written 11.3 years ago by Pickl, Julia ▴ 60

Yuan Hao ▴ 240

@yuan-hao-3658

Last seen 11.2 years ago

United States

Hi Julia,

Depending on the question at hand, there are good reasons either to consider only uniquely mappable reads or to treat multi-mapped reads seriously (for example when studying TEs or working with repetitive genomes). For conventional RNA-Seq data, however, I have mostly found that multi-mapped reads make up a substantial fraction of the data, and I personally believe that including them should improve the accuracy of expression estimates.

Cheers,
Yuan

written 11.3 years ago by Yuan Hao ▴ 240

Hi Yuan,

thank you very much for sharing your experience with me!

Best wishes
Julia

From: Yuan Hao [mailto:yuan.x.hao at gmail.com]
Sent: Friday, 5 September 2014 19:45
To: Pickl, Julia
Cc: Yuan Hao; Julia [guest]; bioconductor at r-project.org
Subject: Re: [BioC] HTSeq-Count

written 11.3 years ago by Pickl, Julia ▴ 60
