Hi all
I am using DEXSeq to analyse genome wide exon usage in RNA-Seq sample from maize. I have two factors data. One is condition (weedy/weed-free) and thiamethoxam treatment (treated/untreated). Each has 3 replicates. I wanted to see the differential exon usage for weedy versus weed-free, treated versus untreated and for interaction between condition and treatment. Here is my sample table :
condition treatment replicate
WC_RepI weedy untreated 1
WC_RepII weedy untreated 2
WC_RepIII weedy untreated 3
WCT_RepI weedy treated 1
WCT_RepII weedy treated 2
WCT_RepIII weedy treated 3
WFC_RepI weed-free untreated 1
WFC_RepII weed-free untreated 2
WFC_RepIII weed-free untreated 3
WFT_RepI weed-free treated 1
WFT_RepII weed-free treated 2
WFT_RepIII weed-free treated 3
The DEXSeq dataset is
dxd1= DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=SampleTable,
design=~sample+exon+treatment:exon+condition:exon,
flattenedfile=annotfile)
The formula that I used for all 3 analysis;
For weedy vs weed-free,
formulaFullModel=~sample+exon+treatment:exon+condition:exon
formulaReducedModel=~sample+exon+treatment:exon
dxd_e1=estimateDispersions(dxd1,formula=formulaFullModel)
dxd_t1=testForDEU(dxd_e1,reducedModel=formulaReducedModel,fullModel=formulaFullModel)
dxd_fc1=estimateExonFoldChanges(dxd_t1,fitExpToVar = "condition")
dxr3=DEXSeqResults(dxd_fc1)
For treated vs untreated,
formulaFullModel_t=~sample+exon+condition:exon+treatment:exon
formulaReducedModel_t=~sample+exon+condition:exon
dxd_e2=estimateDispersions(dxd1,formula=formulaFullModel_t)
dxd_t2=testForDEU(dxd_e2,reducedModel=formulaReducedModel_t,fullModel=formulaFullModel_t)
dxd_fc2=estimateExonFoldChanges(dxd_t2,fitExpToVar = "treatment")
dxr2=DEXSeqResults(dxd_fc2)
For interaction,
colData(dxd1)$mix<-paste0(colData(dxd1)$condition,colData(dxd1)$treatment)
formulaFullModel_i=~sample+exon+condition:exon+treatment:exon+condition:treatment:exon
formulaReducedModel_i=~sample+exon+condition:exon+treatment:exon
dxd_e3=estimateDispersions(dxd1,formula=formulaFullModel_i)
dxd_t3=testForDEU(dxd_e3,reducedModel=formulaReducedModel_i,fullModel=formulaFullModel_i)
dxd_fc3=estimateExonFoldChanges(dxd_t3,fitExpToVar = "mix")
dxr3=DEXSeqResults(dxd_fc3)
For first two analysis I got same result with no significant hits for padj<0.1
While, for interaction I got 1 significant hit at padj<0.1.
I am just wondering if such results can be expected. Did I do wrong somewhere in the steps?
I will really appreciate your suggestions.
My session info :
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] genefilter_1.54.2 DEXSeq_1.18.4
[3] RColorBrewer_1.1-2 AnnotationDbi_1.34.4
[5] DESeq2_1.12.3 SummarizedExperiment_1.2.3
[7] GenomicRanges_1.24.2 BiocInstaller_1.22.3
[9] GenomeInfoDb_1.8.3 IRanges_2.6.1
[11] S4Vectors_0.10.2 Biobase_2.32.0
[13] BiocGenerics_0.18.0 BiocParallel_1.6.3
Thanks!
Hi Alejandro,
Thank you for your reply. Yes, all the samples are paired. I will try to do as you suggest. I had introduced as "-p yes" while counting reads. I haven't done PCA or clustering yet. However, this data was used to identify differentially expressed gene using EdgeR, and showed hundreds of DE genes between the conditions/ treatment.
Thanks!
Hi again,
I tried to introduce libtype "paired" in design, I got an error
Error in DESeqDataSet(rse, design, ignoreRank = TRUE) :
design contains one or more variables with all samples having the same value,
remove these variables from the design
Pabita
Hi Pabita,
Could you include your model formulas?
Alejandro
Hi Alejandro,
I am sorry, I did not understand what you exactly mean by " include your model formulas". I tried the following code then I got an error.
SampleTable = data.frame(
row.names = c("WC_RepI","WC_RepII","WC_RepIII","WCT_RepI","WCT_RepII","WCT_RepIII","WFC_RepI","WFC_RepII","WFC_RepIII","WFT_RepI","WFT_RepII","WFT_RepIII"),
condition=c("weedy","weedy","weedy","weedy","weedy","weedy","weed-free","weed-free","weed-free","weed-free","weed-free","weed-free"),
treatment=c("untreated","untreated","untreated","treated","treated","treated","untreated","untreated","untreated","treated","treated","treated"),
libtype=c(rep("paired",12)),
replicate =c("1","2","3","1","2","3","1","2","3","1","2","3")
)
suppressPackageStartupMessages(library("DEXSeq"))
dxd1= DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=SampleTable,
design=~sample+exon+libtype:exon+treatment:exon+condition:exon,
flattenedfile=annotfile)
One more question. I am confused how the exon usage coefficients are calculated. I read the post A: DEXSeq: question about usage coefficient and splicing plot , but still did not get what are the inputs for calculating it? And how the gene expression is deducted from overall expression leaving only exon usage effect?
Thanks!
Hi Pabita,
Sorry I was not clear. Since your replicates are paired, it is possible that you have replicate-to-replicate variation that you can account for if you add this information to the GLMs. I mean something like this:
The details of the exon usage coefficients can be found in detail in the DEXSeq paper: http://genome.cshlp.org/content/22/10/2008.long.
Alejandro
Hi Alejandro,
Thank you for your suggestion.
Pabita
Hi Alejandro,
I tried with the models as you suggested, but there is no change in the result.
Pabita