Hello,
I want to use the DEXSeq pipeline to get exon counts, and I have followed the vignette. Everything works well, except that I am interested in counts from only the major exons (i.e. annotated exons).
Is there a way to quickly eliminate the non-major exon bins that the associated DEXSeq Python scripts generate? I understand there is likely a very good reason for including so many bins, but for my purposes I would like only major exon counts. I have been struggling to find documentation on how to do this.
sincerely,
np
Hello,
I would suggest subsetting the original gtf file to create a new gtf file containing only the transcript isoforms that you are interested in, before running dexseq_prepare_annotation.py. This way, you would eliminate lots of isoforms and avoid unnecessary exon binning.
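A minimal sketch of that subsetting step, assuming a standard GTF in which every exon line carries a transcript_id attribute in the ninth column. The helper name subset_gtf_lines and the transcript IDs in the demo are hypothetical placeholders, not part of DEXSeq. Note that lines without a transcript_id (for example gene-level records) are dropped, which this sketch assumes is acceptable for the annotation step.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')

def subset_gtf_lines(lines, keep_transcripts):
    """Yield only the GTF lines that belong to the chosen transcripts.

    Comment lines (starting with '#') are passed through unchanged.
    Lines without a transcript_id attribute are dropped.
    """
    keep = set(keep_transcripts)
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match and match.group(1) in keep:
            yield line

# Tiny in-memory demo with made-up records (hypothetical IDs T1 and T2).
demo_gtf = [
    "#description: toy annotation\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    'chr1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
kept = list(subset_gtf_lines(demo_gtf, {"T1"}))
```

For a real file, the same generator can be fed an open file handle and its output written to a new gtf, which is then passed to dexseq_prepare_annotation.py in place of the original.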
Alejandro
Hello Alejandro,
This is helpful -- I will read about how to subset the .gtf. One question I had -- is there any way within DEXSeq to get the coordinates of the exon bins? For our research question, we will eventually have a limited number of gene candidates to focus on, and we just want to graphically display the counts across exons in individual samples without performing the full DEU analyses.
It would be nice to have a full DEXSeq count dataset for all genes:exons, and then analyze individual gene candidates for quick interrogations as they come up.
As an example, GAPDH is annotated to have only 9 exons, but using the full ENCODE gtf we get 35 exon bins. Is there a way within DEXSeq to determine which of the 35 bins represent the 9 major exons? If we can figure this out, we'll just extract the appropriate exons for each candidate as they come up -- if that makes sense.
np
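One possible way to look up bin coordinates outside of R is to read the flattened GFF written by dexseq_prepare_annotation.py directly. The sketch below assumes the usual layout of that file, in which each exonic_part line has start and end in columns 4 and 5 and carries transcripts, exonic_part_number and gene_id attributes, with multiple transcripts joined by "+". The function names and the toy records (gene G1, transcripts T1 and T2) are hypothetical. Selecting the bins that list a chosen transcript gives the bins that make up that transcript's exons; a single annotated exon may still be split across several bins.

```python
import re

# Captures key "value" pairs from the 9th GFF column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def parse_exonic_parts(lines):
    """Return one dict per exonic_part line of a flattened DEXSeq GFF."""
    bins = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue  # skips aggregate_gene lines, among others
        attrs = dict(ATTR_RE.findall(fields[8]))
        bins.append({
            "gene_id": attrs["gene_id"],
            "part": attrs["exonic_part_number"],
            "chrom": fields[0],
            "start": int(fields[3]),
            "end": int(fields[4]),
            "strand": fields[6],
            "transcripts": set(attrs["transcripts"].split("+")),
        })
    return bins

def bins_for_transcript(bins, transcript_id):
    """Keep only the bins that are part of the given transcript."""
    return [b for b in bins if transcript_id in b["transcripts"]]

# Toy flattened annotation (made-up coordinates and IDs).
demo_gff = [
    'chr1\tsrc\taggregate_gene\t100\t500\t.\t+\t.\tgene_id "G1"\n',
    'chr1\tsrc\texonic_part\t100\t200\t.\t+\t.\ttranscripts "T1+T2"; exonic_part_number "001"; gene_id "G1"\n',
    'chr1\tsrc\texonic_part\t201\t250\t.\t+\t.\ttranscripts "T2"; exonic_part_number "002"; gene_id "G1"\n',
    'chr1\tsrc\texonic_part\t400\t500\t.\t+\t.\ttranscripts "T1"; exonic_part_number "003"; gene_id "G1"\n',
]
all_bins = parse_exonic_parts(demo_gff)
t1_bins = bins_for_transcript(all_bins, "T1")
for b in t1_bins:
    print(b["gene_id"], b["part"], b["chrom"], b["start"], b["end"])
```

With the full count table kept for all genes, the gene_id and part fields from this lookup can then be used to pull out just the rows for a candidate gene's major exons.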