
Question: HTSeq-count on several gff3/gtf files for use in DESeq2
Asked by Jon Bråte (Norway), 3.0 years ago:

I have several GTF files and one GFF3 file, each representing a different set of genes. I want to count expression with htseq-count and analyse all of the counts together in DESeq2, but I am not sure what the best approach is. I was thinking I could simply concatenate all the GTF and GFF3 files, but some of the GTF files share gene names (with different isoforms), and the GFF3 file will not be identical if I convert it to GTF. Alternatively, if I count against each file separately, can I concatenate the count files afterwards? And what happens to the last few "special" lines that htseq-count writes (__no_feature, __ambiguous, etc.) if I use DESeq2's HTSeq import function (DESeqDataSetFromHTSeqCount)?

I might also use Cuffdiff for comparison later, so I guess my question will also apply to Cuffdiff.

Answer from Michael Love (United States), 3.0 years ago:

I'm not sure exactly what the best approach is here. htseq-count counts reads against the union of all exons that share the same gene ID (the gene_id attribute by default), so having multiple isoforms with the same gene ID in more than one file is not a problem, even if the same exon is listed twice.
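To make this concrete, here is a toy Python sketch (an illustration only, not how htseq-count is implemented internally) that collects exon intervals per gene_id and merges overlapping or duplicated ones. The helper names parse_gtf_attributes and exon_union_by_gene are hypothetical, and the parser assumes well-formed, tab-separated GTF lines.

```python
from collections import defaultdict


def parse_gtf_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "g1"; transcript_id "t1";'."""
    attrs = {}
    for part in field.split(";"):
        key, _, value = part.strip().partition(" ")
        if key:
            attrs[key] = value.strip().strip('"')
    return attrs


def exon_union_by_gene(gtf_lines, id_attr="gene_id"):
    """Collect exon intervals per gene and merge overlapping/duplicate ones.

    Toy illustration only: returns {gene: [(chrom, strand, start, end), ...]}
    in GTF's 1-based inclusive coordinates; this is not htseq-count's code.
    """
    intervals = defaultdict(set)  # a set drops exons listed more than once
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        gene = parse_gtf_attributes(cols[8])[id_attr]
        intervals[gene].add((cols[0], cols[6], int(cols[3]), int(cols[4])))

    merged = {}
    for gene, ivs in intervals.items():
        out = []
        for chrom, strand, start, end in sorted(ivs):
            last = out[-1] if out else None
            if last and last[:2] == (chrom, strand) and start <= last[3]:
                # Overlaps the previous interval on the same chromosome/strand.
                out[-1] = (chrom, strand, last[2], max(last[3], end))
            else:
                out.append((chrom, strand, start, end))
        merged[gene] = out
    return merged
```

Two isoforms of the same gene, even coming from different files and repeating an exon, collapse into a single set of non-overlapping intervals for that gene ID.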

So it seems the best approach would be to produce a combined GTF file, although I don't have advice on how to do this. You might try biostars.org for advice.
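One possible way to build such a combined file is sketched below in Python. The main obstacle is that GTF attributes look like gene_id "X"; while GFF3 uses ID=/Parent= pairs, so the GFF3 exons are rewritten as GTF lines before concatenation. This assumes the GFF3 follows the common gene, transcript, exon hierarchy (each exon's Parent is a transcript whose Parent is a gene); all function names are hypothetical, and dedicated converters such as gffread are probably more robust for real annotations.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Parse a GFF3 attribute column such as 'ID=tx1;Parent=gene1'."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_exons_to_gtf(gff3_lines):
    """Convert GFF3 exon lines into GTF exon lines with gene_id/transcript_id.

    Assumes (hypothetically) a gene -> transcript -> exon hierarchy in which
    each exon's Parent is a transcript whose Parent is a gene.
    """
    records = []
    parent_of = {}  # feature ID -> its Parent ID
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_gff3_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"]
        records.append((cols, attrs))

    gtf_lines = []
    for cols, attrs in records:
        if cols[2] != "exon" or not attrs.get("Parent"):
            continue
        # An exon may be shared by several comma-separated parent transcripts.
        for transcript in attrs["Parent"].split(","):
            gene = parent_of.get(transcript, transcript)
            attr = f'gene_id "{gene}"; transcript_id "{transcript}";'
            gtf_lines.append("\t".join(cols[:8] + [attr]))
    return gtf_lines


def combine_annotations(gtf_lines, gff3_lines):
    """Concatenate existing GTF lines with the converted GFF3 exons."""
    kept = [line.rstrip("\n") for line in gtf_lines
            if line.strip() and not line.startswith("#")]
    return kept + gff3_exons_to_gtf(gff3_lines)
```

The combined lines could then be written to one GTF file and passed to htseq-count as a single annotation.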

DESeq2 ignores the special summary lines (those beginning with "__", such as __no_feature and __ambiguous) when reading in htseq-count output.
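For readers working outside R, a minimal Python sketch of reading one htseq-count output file while dropping those summary lines might look like this. It assumes the usual two-column, tab-separated output and filters on the "__" prefix used by current htseq-count versions; read_htseq_counts is a hypothetical helper, not part of HTSeq or DESeq2.

```python
def read_htseq_counts(lines):
    """Parse htseq-count output ('feature<TAB>count') into {feature: count}.

    Summary rows such as __no_feature and __ambiguous start with '__' and
    are skipped, so only per-gene counts remain.
    """
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, count = line.split("\t")
        if feature.startswith("__"):
            continue  # htseq-count summary row, not a gene
        counts[feature] = int(count)
    return counts
```

With a file on disk, this would be called inside a with open(path) as fh: block as read_htseq_counts(fh).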

I'd recommend a separate post for each piece of software. Developers are pinged by email when you post and tag their software's name, so tagging unrelated packages sends extra emails to busy developers.

Answer from Jon Bråte (Norway), 3.0 years ago:

Thanks! I found that combining GTF files can be a bit tricky, especially when there is a mix of GFF and GTF files. For the moment, I count against each annotation file separately with htseq-count, import each set of counts into DESeq2 as its own DESeqDataSet, and then merge the count matrices. I then estimate size factors and dispersions on the combined counts. There is probably a smoother way, though.
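The merge-then-normalise workflow just described can be sketched in Python as follows, assuming every annotation set was counted against the same BAM files so that sample names line up. merge_count_tables and size_factors are hypothetical helpers; size_factors follows the median-of-ratios idea behind DESeq2's default estimateSizeFactors, but it is an illustration, not DESeq2's code, and dispersion estimation is not shown.

```python
import math
from statistics import median


def merge_count_tables(tables):
    """Row-bind per-annotation count tables that share the same samples.

    `tables` maps an annotation-set name to {sample: {gene: count}}. Gene IDs
    are prefixed with the set name so identical names from different
    annotation files stay distinct rows.
    """
    merged = {}
    samples = None
    for set_name, per_sample in tables.items():
        if samples is None:
            samples = sorted(per_sample)
        elif sorted(per_sample) != samples:
            raise ValueError(f"{set_name}: sample names do not match")
        for sample, counts in per_sample.items():
            for gene, n in counts.items():
                merged.setdefault(f"{set_name}:{gene}", {})[sample] = n
    return merged, samples


def size_factors(merged, samples):
    """Median-of-ratios size factors; genes with any zero count are skipped."""
    log_ratios = {s: [] for s in samples}
    for counts in merged.values():
        values = [counts.get(s, 0) for s in samples]
        if min(values) <= 0:
            continue  # log geometric mean would be -inf
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s, v in zip(samples, values):
            log_ratios[s].append(math.log(v) - log_geo_mean)
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}
```

One design caveat: with separate counting, a read overlapping genes from two different annotation sets is counted in both tables, whereas with a single combined annotation htseq-count's default union mode would report it as __ambiguous instead.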


If you want to combine your annotation files, an alternative would be to read your GTF/GFF files into R as data frames, combine the data frames, and then run featureCounts from the Rsubread package to get counts. featureCounts needs only five columns of annotation: gene ID, chromosome, start, end and strand, so your data frames can be merged easily. Type ?featureCounts for more details.

Reply written 3.0 years ago by Wei Shi.
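The same five-column idea can be sketched in Python: turn GTF exon lines from several files into rows of featureCounts' simplified annotation format (SAF: GeneID, Chr, Start, End, Strand) and concatenate them. The helpers gtf_to_saf_rows and write_saf are hypothetical, GFF3 input would first need its gene IDs resolved, and the resulting file could then, we believe, be passed to featureCounts via Rsubread's annot.ext argument with isGTFAnnotationFile = FALSE (or with -F SAF on the command line).

```python
import csv
import io


def gtf_to_saf_rows(gtf_lines, id_attr="gene_id"):
    """Turn GTF exon lines into SAF rows: (GeneID, Chr, Start, End, Strand)."""
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = {}
        for part in cols[8].split(";"):
            key, _, value = part.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        rows.append((attrs[id_attr], cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return rows


def write_saf(rows):
    """Return tab-delimited SAF text, including the header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", "Chr", "Start", "End", "Strand"])
    writer.writerows(rows)
    return buf.getvalue()
```

Because every annotation file reduces to the same five columns, merging is just list concatenation, for example write_saf(gtf_to_saf_rows(lines_a) + gtf_to_saf_rows(lines_b)).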