Question

DEXSeq on introns but exon overlap and error message

0

Entering edit mode

mat.lesche ▴ 110

@matlesche-6835

Last seen 13 months ago

Germany

Hello,

I want to use DEXSeq on introns and from what I read from some threads, it is possible to do and recommended. However I'm having a few problems and questions.

DEXSeq is dividing exons into exon-bins with the script dexseq_prepare_annotation.py. If two exons, from different transcripts but the same gene, overlap each other, then it creates unique bins. Let's apply this to introns. So I create my set of introns out of a Ensembl gtf file with a script. Many of these introns (e.g. intron 1 from isoform A of gene Z) will overlap with exons (e.g. exon 1 from isoform B of Gene Z) and after creating the introns-bins and counting, bins which overlap with exon will have more reads compared to introns which are pure introns (no overlap with any exon).

1. I'd like to now if this will have influence on the calculation for the usage?

So I was thinking about a more stringent approach and only using intronic parts which don't overlap with exons.

2. Is this a good approach?

I know you will loose the information about intron-isoform relation but at the moment I'm just interested to see if there is a difference in the intron usage between my condition.

So I have created a file which only has unique introns and ran into a problem while using dexseq_prepare_annotation.py. Ensembl75 has a gene http://feb2014.archive.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000092329;r=8:119910841-124345722;t=ENSMUST00000127664 which has 16 ( = 15 introns) and the distance between the 1st and 2nd exon is over 4 Mb. Now if I run the script on the gene and it's 15 introns, then I get 362 intron-bins and 738 intron-bins with the aggregate option set to True. From what I could see the problem seems to be in Step 3 of the script. Somehow the large first intron is split into the intron parts of other genes within the large intron. Right now, I would just remove the gene from the gff and do a recount on the data.

Now I ran DEXSeq and this is my sampleTable:

sSampleTable <- data.frame(row.names=c("S1", "S2", "S3", "S4", "S5", "S6"), Type=c("Het", "Het", "Het", "Ko", "Ko", "Ko"))

and this is my design:

~ sample + exon + Type:exon

I get the following error:

sdxd <- estimateExonFoldChanges(sdxd, BPPARAM=multicore)
Error in estimateExonFoldChanges(sdxd, BPPARAM = multicore) :
  The value of the parameter fitExpToVar,'condition', is not a column name of colData

Is the fitExpToVar parameter set to condition by default? After changing renaming Type to condition, it worked. But wouldn't it be better to not have the default?

Thx

Mathias

dexseq intron retention prepare annotation • 2.7k views

ADD COMMENT • link 10.0 years ago mat.lesche ▴ 110

0

Entering edit mode

HI Mathias,

You could also specify fitExpToVar="Type" when calling estimateExonFoldChanges instead of renaming Type to condition.

Alejandro

ADD REPLY • link 10.0 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Hey Alejandro,

ok thx, I'll do that next time. One question if one has more than one factor like Type and Sequencing, as it is described in the vignette. Which one should one use for fitExpToVar?

ADD REPLY • link 10.0 years ago mat.lesche ▴ 110

score 0 · Answer 1 · 2015-07-08

I found this thread on seqanswers (http://seqanswers.com/forums/showthread.php?t=42420) which confirms my approach about the selection of introns. The proposed script works on the transcript gff while which is created by the python script, included in the DEXSeq package, and extracts introns which don't overlap any exons. But it leaves the exons and introns in the gff file. Now I was wondering if it is a good approach? Does DEXSeq do independent Filtering like DESeq? Because then, most of the introns should be filtered out.

I ran DEXSeq without the exon counts, only intron counts and DEXSeq ran through and I got some introns with a padj < 0.1.

In this thread (http://grokbase.com/t/r/bioconductor/13256zbvyk/bioc-detecting-differential-usage-of-introns-from-rna-seq-data) it is recommended to use estimateDispersionsTRT and testForDEUTRT if introns and exons are included. So I guess this doesn't apply if I only have the introns in my data set?

And my last question would be about the samples. I have excluded samples for my DESeq comparison because they are outliers. They just show a very high expression of genes which are lowly expressed in its own group. Or genes are expressed in the outliers but not in the other samples. Is it recommended to remove the sample when working with DEXSeq?

Thx

Mathias