Different number of exons in reference genome affects DEXSeq analysis
Alessia • 0
Spain

I am reproducing a differential exon usage analysis with DEXSeq. The original analysis was run against an earlier version of the human reference genome (I am not sure which one); I am now using Ensembl release 111.

The previous version reports 36 exons for my gene of interest (MYO6), while the one I am using reports 73 exons. The exons I am now looking for in silico were previously validated in the lab.

I am running a standard DEXSeq analysis with 3 conditions, where 1 is the control and the other 2 are treatments. I ran the complete DEXSeq pipeline separately for the two comparisons:

  1. treatment 1 vs control
  2. treatment 2 vs control

The output therefore consists of two tables. However, the results are no longer significant for the exons of interest, while they were in the previous analysis (with the reference genome containing fewer exons).

Then I tried running the DEXSeq analysis with a single model that includes all 3 conditions, specifying denominator = control in the fold change computation. With this approach the exons of interest are significant again. However, it reports 30k significant events versus the 500 reported by the previous approach.

I am wondering whether the different number of exons in the reference affects the DEU analysis in DEXSeq, and whether there are any possible solutions.

DEXSeq • 144 views
Alejandro Reyes ★ 1.9k
Novartis Institutes for BioMedical Rese…

Likely the new annotation that you are using has more annotated isoforms, and thus the exons are split into more disjoint exonic bins. I'd expect some differences in the output, but the results should not be completely discordant. One thing I'd check is whether you are using the same full and reduced models that were used in the old analysis.
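
To illustrate the point about bins, here is a minimal Python sketch of how adding isoforms to an annotation increases the number of disjoint exonic bins for a gene. It is a simplified stand-in for the flattening step, not DEXSeq's own code: the function name `disjoint_bins`, the transcript IDs, and the coordinates are all made up for the example, and strand handling and overlapping genes are ignored.

```python
def disjoint_bins(transcripts):
    """Split the exons of one gene into disjoint bins.

    transcripts: dict mapping a transcript id to a list of (start, end)
    exon coordinates (1-based, inclusive). Hypothetical input format.
    Returns a list of (start, end, [transcript ids covering the bin]).
    """
    # Every exon start, and every position just past an exon end, is a
    # point where the set of overlapping exons can change.
    boundaries = set()
    for exons in transcripts.values():
        for start, end in exons:
            boundaries.add(start)
            boundaries.add(end + 1)
    cuts = sorted(boundaries)

    bins = []
    for left, right in zip(cuts, cuts[1:]):
        # Transcripts with an exon that fully covers [left, right - 1]
        covering = sorted(
            tx for tx, exons in transcripts.items()
            if any(s <= left and right - 1 <= e for s, e in exons)
        )
        if covering:  # skip purely intronic intervals
            bins.append((left, right - 1, covering))
    return bins


# Toy "old" annotation: a single isoform with two exons
old = {"txA": [(100, 200), (300, 400)]}

# Toy "new" annotation: the same isoform plus two extra isoforms
new = {
    "txA": [(100, 200), (300, 400)],
    "txB": [(150, 200), (300, 350)],
    "txC": [(100, 180)],
}

print(len(disjoint_bins(old)))  # 2
print(len(disjoint_bins(new)))  # 5
for start, end, txs in disjoint_bins(new):
    print(start, end, ",".join(txs))
```

In this toy case the same two exons become five bins once two extra isoforms are annotated, so the reads are spread over more, shorter bins and more tests are performed per gene.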


Hi Alejandro, thank you for your reply. I have been using the same full and reduced models. After some research I found that the Ensembl reference annotation is not the most suitable choice and that the RefSeq annotation is preferable. In practical terms, I think the biggest difference between RefSeq and Ensembl/GENCODE is the sensitivity/specificity trade-off: Ensembl leans towards the inclusive end and includes a far larger number of transcript variants, many of which are only weakly supported.


Hi Alessia. Makes sense. Two things I often use are the transcript support levels from Ensembl and the annotation of principal isoforms. Filtering on these substantially reduces the number of low-confidence transcript isoforms.
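
As a rough sketch of that kind of filtering, the Python snippet below keeps only well-supported transcripts from GTF lines before the annotation is flattened. It assumes the GTF carries a `transcript_support_level` attribute and `tag` attributes such as `Ensembl_canonical`; please check that your release actually has them. The helper names, the threshold, and the sample records are invented for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Parse the 9th GTF column into {key: [values]} (keys such as 'tag' can repeat)."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, []).append(value)
    return attrs


def keep_line(line, max_tsl=1, require_tag=None):
    """Return True if a GTF line should be kept.

    max_tsl and require_tag are hypothetical knobs for this sketch:
    keep records whose support level is <= max_tsl and, optionally,
    that carry the given tag.
    """
    if line.startswith("#"):
        return True  # header / comment lines
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9 or cols[2] == "gene":
        return True  # gene records have no transcript-level attributes
    attrs = parse_attributes(cols[8])
    tsl_values = attrs.get("transcript_support_level", [])
    if not tsl_values:
        return False  # no support level recorded
    # Values can look like "1 (assigned to previous version 5)" or "NA"
    match = re.match(r"\d+", tsl_values[0])
    if match is None or int(match.group()) > max_tsl:
        return False
    if require_tag is not None and require_tag not in attrs.get("tag", []):
        return False
    return True


def gtf_line(feature, attributes):
    """Build a toy GTF record; coordinates are made up."""
    return "\t".join(["6", "toy", feature, "100", "200", ".", "+", ".", attributes])


sample = [
    gtf_line("gene", 'gene_id "G1";'),
    gtf_line("exon", 'gene_id "G1"; transcript_id "T1"; tag "basic"; '
                     'tag "Ensembl_canonical"; '
                     'transcript_support_level "1 (assigned to previous version 5)";'),
    gtf_line("exon", 'gene_id "G1"; transcript_id "T2"; transcript_support_level "5";'),
    gtf_line("exon", 'gene_id "G1"; transcript_id "T3"; transcript_support_level "NA";'),
]

kept = [line for line in sample if keep_line(line, max_tsl=1)]
print(len(kept))  # 2: the gene record and the exon of T1
```

The filtered GTF would then go through the usual annotation preparation step, which should give fewer and larger counting bins than the unfiltered annotation.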
