DEXSeq ignore non-coding regions
igor (United States)

Is there a way to ignore non-coding regions in a DEXSeq analysis? I am getting a lot of UTRs, and it would be nice to eliminate those. Can this be done in DEXSeq itself, do I need to modify the GTF file, or is there some other solution?

Alejandro Reyes (Novartis Institutes for BioMedical Rese…)

Hi igor,

This would need to be done by modifying the GTF file.

As a side note, it's interesting that you mention this. In most of my analyses I have observed the same thing: UTR regions show differential usage quite often compared to coding exons. At first I thought this was a peculiarity of differences in exon usage across tissues (e.g. doi: 10.1073/pnas.1307202110), but I have also seen it in, for example, knockdown comparisons.

Alejandro


How would you modify the GTF to remove the non-coding parts? Since the exon features in the GTF file don't distinguish coding from non-coding sequence, is the best option to remove the exon and UTR features and rename the CDS features to exons? That seems like a somewhat questionable approach, so is there a better solution?
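The GTF rewrite described in the question (drop exon and UTR rows, relabel CDS rows as exon) could be sketched roughly as below. This is a minimal sketch, not part of DEXSeq: the helper names are hypothetical, and it assumes a standard 9-column, tab-separated GTF where column 3 holds the feature type.

```python
# Hypothetical helpers (not part of DEXSeq): keep only CDS rows of a GTF and
# relabel them as "exon" so dexseq_prepare_annotation.py would pick them up.

def cds_only_gtf(lines):
    """Yield GTF lines, keeping comments and CDS rows (renamed to 'exon')."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # drop exon, UTR, gene, transcript, codon rows, etc.
        fields[2] = "exon"  # relabel so the annotation script treats it as an exon
        yield "\t".join(fields) + "\n"


def write_cds_only_gtf(in_path, out_path):
    """Stream the filtered annotation from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(cds_only_gtf(src))
```

For example, write_cds_only_gtf("annotation.gtf", "annotation.cds_only.gtf") (file names are placeholders) would produce a GTF to pass to the annotation script. Note that genes without any CDS feature (non-coding genes) disappear entirely from the rewritten file.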


Probably an easier option would be to change this part of the dexseq_prepare_annotation.py script:

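```python
import HTSeq

# gtf_file is the annotation path passed to the script on the command line
exons = HTSeq.GenomicArrayOfSets("auto", stranded=True)
for f in HTSeq.GFF_Reader(gtf_file):
    # only "exon" features are used to build the counting bins
    if f.type != "exon":
        continue
    # ':' is replaced because DEXSeq uses it as a separator in exon bin IDs
    f.attr['gene_id'] = f.attr['gene_id'].replace(":", "_")
    exons[f.iv] += (f.attr['gene_id'], f.attr['transcript_id'])
```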

Replace the exon filter with a CDS filter:

```python
    if f.type != "CDS":  # keep only coding sequence features
        continue
```

This should work, but I have not tested it.

Alejandro
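Because the CDS change is untested, one quick sanity check before running the modified script is to confirm that the GTF actually contains CDS features: if it has none, the loop above would skip every row and no counting bins would be built. A minimal plain-Python sketch, assuming a standard tab-separated GTF (the helper names are hypothetical, not part of DEXSeq or HTSeq):

```python
# Hypothetical sanity check (not part of DEXSeq/HTSeq): count the feature
# types in column 3 of a GTF before switching the script to CDS features.
from collections import Counter


def feature_type_counts(lines):
    """Return a Counter of GTF feature types, skipping comment lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines carry no feature type
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


def count_gtf_features(path):
    """Open a GTF file and count its feature types."""
    with open(path) as handle:
        return feature_type_counts(handle)
```

For example, print(count_gtf_features("annotation.gtf").most_common()) (the file name is a placeholder) shows whether "CDS" rows are present and how they compare in number to "exon" rows.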
