Interaction effect sizes from DEXseq
1
0
Entering edit mode
stuart ▴ 10
@stuart-9561
Last seen 3.4 years ago
Australia

Dear all,

I have a multifactorial experimental design (3 mutants and 2 treatments) am trying to obtain the difference of differences for exon usage. I can obtain p-values but not the fold-change of expression that is attributable to the interaction between mutant and treatment. I hope this makes sense. Here is an equivalent problem with artificial count matrix c_mat (500 genes, 2 exons per gene, 2 replicates per mutant/treatment combination, no significant differences).

library(DEXSeq)
set.seed(1)
c_mat <- matrix(rpois(n = 12000, lambda = rep(rgamma(100,1)*100, 12)), ncol = 12)
sampleData=data.frame(row.names = colnames(c_mat), 
                      mut=rep(c('mutA','mutB','WT'), each=4),
                      treat=rep(c('trt','ctrl'),each=2))
colnames(c_mat)=paste(sampleData$mut, sampleData$treat, rep(1:2,6), sep = '_')
groupID=rep(paste0('gene_',1:500), each=2)
featureID=rep(c('exon1','exon2'),500)
rownames(c_mat) <- paste0(groupID,featureID)

design <- formula( ~ sample + exon + mut:exon + treat:exon + mut:treat:exon ) 
redMod <- formula( ~ sample + exon + mut:exon + treat:exon )

dxd <-  DEXSeqDataSet( c_mat, sampleData, design,
                         featureID, groupID )
dxd = estimateSizeFactors( dxd )
dxd = estimateDispersions( dxd )
dxdTEST1 = testForDEU( dxd, fullModel = design, reducedModel = redMod)
dxdTEST1 <- estimateExonFoldChanges( dxdTEST1, fitExpToVar = 'mut' )
dxrTEST1 = DEXSeqResults( dxdTEST1 )
dxrTEST1_df<-as.data.frame( dxrTEST1 )

So that gives a count matrix with the following column names of 3 mutants and 2 treatments, 2 replicates each:

> colnames(c_mat)
 [1] "mutA_trt_1"  "mutA_trt_2"  "mutA_ctrl_1" "mutA_ctrl_2" "mutB_trt_1"  "mutB_trt_2"  
 [7] "mutB_ctrl_1" "mutB_ctrl_2" "WT_trt_1"    "WT_trt_2"    "WT_ctrl_1"   "WT_ctrl_2"

Above code attempts to test a full model containing mut:treat:exon against a reduced model lacking this term. But I guess I would like to also run estimateExonFoldChanges with fitExpToVar = 'mut:treat:exon', however it obviously only accepts colnames of sampleData as values for fitExpToVar so it only extracted log2fold changes of the primary effect of 'mut' in this case:

> head(dxrTEST1_df)
             groupID featureID exonBaseMean  dispersion      stat    pvalue      padj
gene_1:exon1  gene_1     exon1     14.83689 0.004420707 2.3227484 0.3130557 0.9864358
gene_1:exon2  gene_1     exon2    184.90061 0.003685373 2.3671791 0.3061777 0.9864358
gene_2:exon1  gene_2     exon1    185.42570 0.015539932 0.2054466 0.9023766 0.9864358
gene_2:exon2  gene_2     exon2     79.05609 0.015624023 0.2048190 0.9026598 0.9864358
gene_3:exon1  gene_3     exon1    126.16461 0.001680639 2.3476782 0.3091777 0.9864358
gene_3:exon2  gene_3     exon2    118.44545 0.001682963 2.3471523 0.3092590 0.9864358
                  mutA      mutB        WT log2fold_mutB_mutA log2fold_WT_mutA genomicData
gene_1:exon1  6.897828  7.442706  7.656828         0.22690889      0.311865556            
gene_1:exon2 22.540960 22.372108 22.309067        -0.02860662     -0.039317188            
gene_2:exon1 22.494594 22.285767 22.511180        -0.03543656      0.002806783            
gene_2:exon2 15.806041 16.032684 15.788190         0.04786188     -0.003790942            
gene_3:exon1 19.402213 19.269096 19.196305        -0.02463073     -0.038144879            
gene_3:exon2 18.685657 18.823240 18.899005         0.02595844      0.040201064            
             countData.1 countData.2 countData.3 countData.4 countData.5 countData.6
gene_1:exon1          13          13          10          16          19          20
gene_1:exon2         183         176         194         196         180         192
gene_2:exon1         186         196         164         201         163         220
gene_2:exon2          81          73          82          77          90          84
gene_3:exon1         140         132         121         126         128         118
gene_3:exon2         124         111         114         124         117         111
             countData.7 countData.8 countData.9 countData.10 countData.11 countData.12
gene_1:exon1           9          14          13           18           17           17
gene_1:exon2         194         182         196          162          168          208
gene_2:exon1         166         181         176          210          193          181
gene_2:exon2          81          69          95           64           80           78
gene_3:exon1         121         132         109          127          137          131
gene_3:exon2         118         125         126          123          104          132

Is there some way to extract the estimate of the interaction effect size? FYI running R version 3.4.1, DEXSeq_1.22.0, DESeq2_1.16.1

Thanks a lot in advance,

Stuart

(edited to clarify that I set estimateExonFoldChanges to report primary effect of mut here as a placeholder)

dexseq multiple factor design • 1.2k views
ADD COMMENT
0
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 5 months ago
Novartis Institutes for BioMedical Rese…

Hi Stuart,
Thanks for your detailed post. Getting the "exon usage" fold change of an interaction term is currently not supported. However, you can visualize the effects of the interaction test by concatenating your two columns (mut and treat) in a single column and using the new column for estimateExonFoldChanges. Something like:

​​colData( dxdTEST1 )$newCol <-
  factor(paste( colData( dxdTEST1 )$mut, colData( dxdTEST1 )$treat, sep="-"))
dxdTEST1 <- estimateExonFoldChanges( dxdTEST1, fitExpToVar = 'newCol' )

Alejandro

ADD COMMENT
0
Entering edit mode

Hi Alejandro,

Thanks a lot, that's really helpful.

Just a related question on the results object, as I want to make sure I genuinely understand how the log2fold-change is calculated: if I set gene_1:exon1 to be upregulated ~10-fold specifically in mut=='A' and treat='trt' by inserting some higher counts in these cells: c_mat[1,1:2]=c(138, 145) and follow your advice above to get a results object, it gives a log2fold change (mutA, trt vs ctrl) of 3.058975:

> dxrTEST1_df[1:2,1:14]

             groupID featureID exonBaseMean  dispersion     stat       pvalue         padj
gene_1:exon1  gene_1     exon1     36.17235 0.002840577 61.77129 3.859537e-14 1.929768e-11
gene_1:exon2  gene_1     exon2    184.92823 0.002647820 62.21218 3.095964e-14 1.929768e-11
             mutA.ctrl mutA.trt mutB.ctrl  mutB.trt   WT.ctrl    WT.trt
gene_1:exon1  6.976913 18.33195  6.731423  8.410945  7.915111  7.775273
gene_1:exon2 23.097893 20.18264 23.203428 22.547120 22.726225 22.778745
             log2fold_mutA.trt_mutA.ctrl
gene_1:exon1                    3.058975
gene_1:exon2                   -0.504468

Do the values given under 'mutA.ctrl' column represent exon usage (normalized to gene expression change) or exon expression? Have they been transformed somehow (VST)? Most importantly I could not understand how the 4 expression values in gene1 mutA (exon1 & 2; trt & ctrl) are combined to arrive at 3.058975, could you shed some light on it? Sorry if this question is somewhat vague, I have looked in the DEXseq docs and paper but I am not sure I am quite grasping how it is done.

Thanks very much

ADD REPLY
0
Entering edit mode

Hi Stuart. The four values would correspond to exon usage values after applying a variance stabilizing transformation. The exon fold changes are estimated from these values but before being variance-stabilized transformed, this is the reason why they can't be estimated from the four exon usage values. 

ADD REPLY
0
Entering edit mode

Ah, I see now, thanks for the clarification. Actually I just realised that mcols(dxrTEST1)$description gives some of this information, but it's useful to know where and when VST is applied too.

ADD REPLY

Login before adding your answer.

Traffic: 655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6