Was wondering if somebody could help me with an issue I am having with DEXSeq. The issue is that I am getting basically 100,s of thousands of exons being deferentially used in a time course experiment I am carrying out and was wondering if somebody that has experience with this could help me/look over my code. I have about 100 samples but have cut down to organs in which I have at least 2 replicates (so far) per time point. The time points are p3, p15, p30 and p60 with an additional 4 time points coming from sequencing results in the near future.
The aim of this question I have is as follows: Is there a difference in exon usage in each organ going from p3 up to p60. After this, can I identify exons that are deferentially used across all organs or a subset of organs?
I have no trouble doing this analysis for differential gene expresion across time but I think I have become a bit confused, especially with the sample data with DEXSeq: The code is as follows:
hse <- makeTxDbFromBiomart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
exonsByGene <- exonsBy(hse, by="gene")
exonicParts = disjointExons( hse, aggregateGenes=FALSE )
bamDir<- file.path(file= "Amir Raw RNA-sq data/")
fls <-list.files( bamDir, pattern="bam$", full=TRUE )
bamlst = BamFileList( fls, index=character(), yieldSize=100000,
SE = summarizeOverlaps( exonicParts, bamlst, mode="Union",
singleEnd=FALSE, ignore.strand=TRUE, inter.feature=FALSE,
I am then pulling the sample data from columns in my metadata sheet name sampledataalltimepoints:
diffexon<-DEXSeqDataSetFromSE( SE ,design= ~condition+Organ+exon )
head( counts(diffexon), 5 )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
ENSMUSG00000000001:E001 459 1992 552 2052 1028 654 907 1576
ENSMUSG00000000001:E002 51 394 90 297 173 110 154 391
ENSMUSG00000000001:E003 126 393 99 333 203 118 144 374
ENSMUSG00000000001:E004 139 365 110 305 179 110 139 340
ENSMUSG00000000001:E005 212 439 121 403 206 152 199 418.... etc
diffexon = estimateSizeFactors( diffexon )
dxd<-estimateDispersions(diffexon ).... Very time consuming!
diffexonusage = testForDEU( dxd, fullModel = ~condition+Organ+exon, reducedModel = ~Organ, BPPARAM = BPPARAM)
dxd = estimateExonFoldChanges(diffexonusage , fitExpToVar ="condition", BPPARAM = BPPARAM, independentFiltering = F)
dxr1 = DEXSeqResults( dxd )
DataFrame with 452908 rows and 17 columns
ENSMUSG00000000001:E001 ENSMUSG00000000001 E001
ENSMUSG00000000001:E002 ENSMUSG00000000001 E002
ENSMUSG00000000001:E003 ENSMUSG00000000001 E003
ENSMUSG00000000001:E004 ENSMUSG00000000001 E004
ENSMUSG00000000001:E005 ENSMUSG00000000001 E005
... ... ...
ENSMUSG00000116527:E011 ENSMUSG00000116527 E011
ENSMUSG00000116527:E012 ENSMUSG00000116527 E012
ENSMUSG00000116527:E013 ENSMUSG00000116527 E013
ENSMUSG00000116527:E014 ENSMUSG00000116527 E014
ENSMUSG00000116528:E001 ENSMUSG00000116528 E001
exonBaseMean dispersion stat
<numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 657.8843 0.10376174 14.42773
ENSMUSG00000000001:E002 106.5895 0.13580335 972.78567
ENSMUSG00000000001:E003 112.4083 0.09943998 1267.37422
ENSMUSG00000000001:E004 105.6330 0.09874353 1331.44409
ENSMUSG00000000001:E005 134.5156 0.09485669 1153.30200... etc
By making the reduced model organ, am I not lookinf for the effects of time and exon usage?
This seems very vert wrong to me given the massive amount of differentially expressed exons.. and MA plot essentially is a splash of red for significant genes across the whole plot. Is the design I am using wrong, again, to test for differential exon usage across time in each organ? any help would be hugely helpful as this is an important analysis for me. Many thanks in advance!