DEXSeq time course design
1
0
Entering edit mode
A ▴ 40
@a-14337
Last seen 2 days ago
United Kingdom

Hi all,

Was wondering if somebody could help me with an issue I am having with DEXSeq. The issue is that I am getting basically 100,s of thousands of exons being deferentially used in a time course experiment I am carrying out and was wondering if somebody that has experience with this could help me/look over my code. I have about 100 samples but have cut down to organs in which I have at least 2 replicates (so far) per time point. The time points are p3, p15, p30 and p60 with an additional 4 time points coming from sequencing results in the near future.

The aim of this question I have is as follows: Is there a difference in exon usage in each organ going from p3 up to p60. After this, can I identify exons that are deferentially used across all organs or a subset of organs?

I have no trouble doing this analysis for differential gene expresion across time but I think I have become a bit confused, especially with the sample data with DEXSeq: The code is as follows:

Prepare annotations

hse <- makeTxDbFromBiomart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
exonsByGene <- exonsBy(hse, by="gene")

exonicParts = disjointExons( hse, aggregateGenes=FALSE )

bamDir<- file.path(file= "Amir Raw RNA-sq data/")
fls <-list.files( bamDir, pattern="bam$", full=TRUE ) bamlst = BamFileList( fls, index=character(), yieldSize=100000, obeyQname=TRUE ) SE = summarizeOverlaps( exonicParts, bamlst, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, inter.feature=FALSE, fragments=TRUE ) I am then pulling the sample data from columns in my metadata sheet name sampledataalltimepoints: colData(SE)$condition=Sampledataalltimepoints$Time_point colData(SE)$Organ=Sampledataalltimepoints$Organ colData(SE) SE$condition<-as.factor(SE$condition) SE$Organ<-as.factor(SE$Organ) colData(SE)$replicate=Sampledataalltimepoints$Replicate SE$replicate<-NULL

diffexon<-DEXSeqDataSetFromSE( SE ,design= ~condition+Organ+exon )

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
ENSMUSG00000000001:E001  459 1992  552 2052 1028  654  907 1576
ENSMUSG00000000001:E002   51  394   90  297  173  110  154  391
ENSMUSG00000000001:E003  126  393   99  333  203  118  144  374
ENSMUSG00000000001:E004  139  365  110  305  179  110  139  340
ENSMUSG00000000001:E005  212  439  121  403  206  152  199  418.... etc

diffexon = estimateSizeFactors( diffexon  )
dxd<-estimateDispersions(diffexon ).... Very time consuming!
plotDispEsts(dxd

diffexonusage = testForDEU( dxd, fullModel =  ~condition+Organ+exon, reducedModel = ~Organ, BPPARAM = BPPARAM)
DEXSeqResults(diffexonusage )

dxd = estimateExonFoldChanges(diffexonusage , fitExpToVar ="condition", BPPARAM = BPPARAM, independentFiltering = F)

dxr1 = DEXSeqResults( dxd )

DataFrame with 452908 rows and 17 columns
groupID   featureID
<character> <character>
ENSMUSG00000000001:E001 ENSMUSG00000000001        E001
ENSMUSG00000000001:E002 ENSMUSG00000000001        E002
ENSMUSG00000000001:E003 ENSMUSG00000000001        E003
ENSMUSG00000000001:E004 ENSMUSG00000000001        E004
ENSMUSG00000000001:E005 ENSMUSG00000000001        E005
...                                    ...         ...
ENSMUSG00000116527:E011 ENSMUSG00000116527        E011
ENSMUSG00000116527:E012 ENSMUSG00000116527        E012
ENSMUSG00000116527:E013 ENSMUSG00000116527        E013
ENSMUSG00000116527:E014 ENSMUSG00000116527        E014
ENSMUSG00000116528:E001 ENSMUSG00000116528        E001
exonBaseMean dispersion       stat
<numeric>  <numeric>  <numeric>
ENSMUSG00000000001:E001     657.8843 0.10376174   14.42773
ENSMUSG00000000001:E002     106.5895 0.13580335  972.78567
ENSMUSG00000000001:E003     112.4083 0.09943998 1267.37422
ENSMUSG00000000001:E004     105.6330 0.09874353 1331.44409
ENSMUSG00000000001:E005     134.5156 0.09485669 1153.30200... etc

By making the reduced model organ, am I not lookinf for the effects of time and exon usage?

This seems very vert wrong to me given the massive amount of differentially expressed exons.. and MA plot essentially is a splash of red for significant genes across the whole plot. Is the design I am using wrong, again, to test for differential exon usage across time in each organ? any help would be hugely helpful as this is an important analysis for me. Many thanks in advance!

DEXSeq differential exon usage • 596 views
1
Entering edit mode
Alejandro Reyes ★ 1.8k
@alejandro-reyes-5124
Last seen 15 days ago
Novartis Institutes for BioMedical Rese…

Hi,

I think your reasoning of limiting only to samples with replicates is a great one.

The formulae that you are specifying might not be the correct ones for the hypothesis that you want to test.

For the question "Is there a difference in exon usage in each organ going from p3 up to p60", I think you want to subset your DEXSeqDataSet object containing p3 and p60 for each organ and use the following formulae:

formulaFullModel = ~ sample + exon + condition:exon formulaReducedModel = ~ sample + exon

For the question "can I identify exons that are deferentially used across all organs?" I think you want:

formulaFullModel = ~ sample + exon + Organ:exon + condition:exon formulaReducedModel = ~ sample + exon + Organ:exon

For the question "can I identify exons that are deferentially used on a subset of organs?", I think we can rephrase this question to set up an interaction effect "can I identify exons that are deferentially used dependent on the organ?". And to answer this question, you could use the following formulae:

formulaFullModel = ~ sample + exon + Organ:exon + condition:exon + Organ:condition:exon formulaReducedModel = ~ sample + exon + Organ:exon + condition:exon

​Alejandro

0
Entering edit mode

Hi Alejandro,

Thank you so much for the quick reply! Really appreciate this. Will test these out. A few quick follow up questions:

When you write 'sample' in your design formulae, may I ask what you are referring to? The sampledata I have are age and organ (7 organs in total) 4 ages. I would like to test the effects of time on exon usage in the different organs...

Many thanks again!

With regards to subsetting p3 and p60 organs, I would also like to see if there is a progressive change in exon usage inlcuding at p15 and p30, co could I use the same design to see how exons change across all time points without subsetting?

0
Entering edit mode

No problem! The "sample" column is normally a column in colData that is added by default when creating a DEXSeqDataSet object, if you don't have it you should add it. This is a term that accounts for sample-specific contributions, it also absorbs other effects such as changes in gene expression. More details can be found in the paper and the software vignette.

Yes, if you would like to see progressive changes in through the time points, the same formulae should work.

0
Entering edit mode

Awesome! Running those 3 formulae right now and will report. Thank you so much once again! and yes, the title 'sample' was indeed included after te DEXSeq object had been created, I just wasn't sure that I had to include it!

Fingers crossed :) have a great day

0
Entering edit mode

Hi Alejandro,

Thank you so much for your help above! These models have helped tremendously! They all work perfectly. I was wondering if I could ask a few follow up questions:

Is there a way in estimateexonfoldchanges function to to specify both "condition" and "organ" in order to be able to plot differential exon usage for the individual organs over time. The problem at the moment is that with the first model, The plotDEXSeq function produced a plot of DE usage from p3 up to p60, however, does not specify which organ this belongs to. Are the values the average of all organs for a given exon?

It would be nice to see how they separate individually and then take all the expression data to form patterns of exons that change commonly across all organs as this isn't explicitly clear...

Many thanks!