I am comparing differential gene expression (using glmQLFTest) and differential splicing (using diffSpliceDGE). While I have been able to test my contrasts of interest for gene expression, I am having a hard time doing the same for splicing.
I have designed contrast matrices to test my comparisons of interest: the effects of history (Ext. v. Cor) and chemistry (Ast. v. Ran. v. Ros) within four different tissues (Fat, Lab, Gut, Mal). I have created a large design matrix:
> design.all <- model.matrix(~0 + group, data = VC_metadata) # interaction effects
> colnames(design.all) <- levels(VC_metadata$group)
> colnames(design.all)
# [1] "Fat.Cor.Ast" "Fat.Cor.Ros" "Fat.Ext.Ast" "Fat.Ext.Ran"
# [5] "Gut.Cor.Ast" "Gut.Cor.Ros" "Gut.Ext.Ast" "Gut.Ext.Ran"
# [9] "Lab.Cor.Ast" "Lab.Cor.Ros" "Lab.Ext.Ast" "Lab.Ext.Ran"
# [13] "Mal.Cor.Ast" "Mal.Cor.Ros" "Mal.Ext.Ast" "Mal.Ext.Ran"
Focusing on the "Fat" tissue, I created three contrast matrices targeting my contrasts of interest: Main effect of history
> FB_Main.F <- makeContrasts((Fat.Cor.Ast + Fat.Cor.Ros)/2 - (Fat.Ext.Ast + Fat.Ext.Ran)/2, levels = design.all)
> colnames(FB_Main.F) <- "FB_Main.F"
> rownames(FB_Main.F) <- levels(VC_metadata$group)
> colSums(FB_Main.F)
FB_Main.F
0
> sum(FB_Main.F[which(FB_Main.F > 0),])
[1] 1
> sum(FB_Main.F[which(FB_Main.F < 0),])
[1] -1
Main effect of chemistry
> HPcontrasts <- rbind(
+ c(0.5,1)/2,
+ c(0.5, -1),
+ c(0.5,1)/2,
+ c(-1,0))
> nullcontrasts <- rbind(c(0,0), c(0,0), c(0,0), c(0,0))
> FB_Main.HP <- rbind(
+ HPcontrasts,
+ nullcontrasts,
+ nullcontrasts,
+ nullcontrasts)
> rownames(FB_Main.HP) <- levels(VC_metadata$group)
> colnames(FB_Main.HP) <- paste("C",1:2,sep="_")
> crossprod(FB_Main.HP)
C_1 C_2
C_1 1.375 -0.25
C_2 -0.250 1.50
> colSums(FB_Main.HP)
C_1 C_2
0 0
> for (ct in 1:2){
+ print(sum(FB_Main.HP[which(FB_Main.HP[,ct] > 0),ct]))
+ print(sum(FB_Main.HP[which(FB_Main.HP[,ct] < 0),ct]))
+ }
[1] 1
[1] -1
[1] 1
[1] -1
History x chemistry interaction
> FxHPcontrasts <- cbind(
+ c(rep(1/3,3),-1),
+ c(rep(1/2,2),-1,0),
+ c(rep(1/1,1),-1,0,0))
> nullcontrast2 <- rbind(c(rep(0,3)), c(rep(0,3)), c(rep(0,3)), c(rep(0,3)))
> FB_Int.FxHP <- rbind(FxHPcontrasts,
+ nullcontrast2,
+ nullcontrast2,
+ nullcontrast2)
> rownames(FB_Int.FxHP) <- levels(VC_metadata$group)
> colnames(FB_Int.FxHP) <- paste("C",1:3,sep="_")
> crossprod(FB_Int.FxHP)
C_1 C_2 C_3
C_1 1.333333 0.0 0
C_2 0.000000 1.5 0
C_3 0.000000 0.0 2
> colSums(FB_Int.FxHP)
C_1 C_2 C_3
-5.551115e-17 0.000000e+00 0.000000e+00
> for (ct in 1:3){
+ print(sum(FB_Int.FxHP[which(FB_Int.FxHP[,ct] > 0),ct]))
+ print(sum(FB_Int.FxHP[which(FB_Int.FxHP[,ct] < 0),ct]))
+ }
[1] 1
[1] -1
[1] 1
[1] -1
[1] 1
[1] -1
I was able to effectively use these contrasts with the glmQLFTest command:
> fit.genes.filt.all <- glmQLFit(VC.genes.filt.all,design.all, robust=TRUE)
> modelQLF <- list()
> modelQLF$FB_Main.F <- glmQLFTest(fit.genes.filt.all, contrast = FB_Main.F)
> modelQLF$FB_Main.HP <- glmQLFTest(fit.genes.filt.all, contrast = FB_Main.HP)
> modelQLF$FB_Int.FxHP <- glmQLFTest(fit.genes.filt.all, contrast = FB_Int.FxHP)
> head(topTags(modelQLF$FB_Main.HP, 10))
# (chr, start, end, strand, length excluded for space)
Coefficient: LR test on 2 degrees of freedom
GeneID logFC.C_1 logFC.C_2 logCPM F PValue FDR
VCARD_00002726-RA -5.8067313 -2.165252532 1.80375345 65.45306 1.049847e-15 1.108743e-11
VCARD_00007039-RA -4.8165992 0.009733779 -0.10870504 49.90271 2.971616e-13 1.569162e-09
VCARD_00003307-RA -5.2875728 1.523010670 -0.02549543 47.68861 1.167581e-12 3.186382e-09
VCARD_00009318-RA -3.2893327 -0.835207267 2.58535467 45.27999 1.206849e-12 3.186382e-09
VCARD_00012005-RA 0.6319163 -1.897763282 7.53346103 41.71435 5.101614e-12 1.077563e-08
VCARD_00012006-RA 0.5993499 -1.898216585 8.13562869 36.39604 5.038427e-11 8.868471e-08
Can I use the same contrast matrices within the diffSpliceDGE command? I know that I can designate coefficients withing diffSpliceDGE, but when I use the same structure as above, I get the following error for both the chemistry and interaction comparison: e.g.
> fit.exons.filt.all <- glmQLFit(VC.exons.filt.all,design.all, robust=TRUE)
...
> modeldiffSplice$FB_Main.HP <- diffSpliceDGE(fit.exons.filt.all, contrast = FB_Main.HP, geneid="GeneID", exonid="ExonID")
Total number of exons: 64211
Total number of genes: 10269
Number of genes with 1 exon: 1713
Mean number of exons in a gene: 6
Max number of exons in a gene: 134
Error in diffSpliceDGE(fit.exons.filt.all, contrast = FB_Main.HP, geneid = "GeneID", :
dims [product 62498] do not match the length of object [124996]
I appreciate any suggestions on how I can best move forward with my analysis. Thank you, Rachel
For clarification, does this mean that diffSpliceDGE can only handle pairwise contrasts coded as a single vector, and that I should reorganize my design so that I can instead specify coefficients?
Sorry, I don't understand what is concerning you. diffSpliceDGE can accept any contrast at all, there is no restriction. There is no reason why you would need to reorganize the design matrix. diffSpliceDGE explores all differential exon usage possibilities for a single comparison, and it is exactly the same whether you specify the comparison via coefficients or contrasts. diffSpliceDGE allows you to specify exactly one coefficient or exactly one contrast.
I don't understand why you are trying to give more than one contrast the diffSpliceDGE. To me it wouldn't make any sense to do so.
I would like to know whether I can use diffspliceDGE to identify genes that are deferentially spliced among three or more levels of my variable of interest, for example, levels A, B and C. I think what you are saying is yes, but only if I test AvB, BvC, and CvA. Is that correct? Can you suggest any resources in addition to the manual and the help page that describe how diffspliceDGE functions as a t-test? The manual (pages 27, 86-88) and the help page do not mention t-tests, and instead only refer to the F-tests.
Yes, if you have more than two groups to compare and you want to explore alternative splicing between any of them, then you would need to run diffSpliceDGE more than once with different contrasts. The contrasts don't necessarily have to be pairwise comparisons however.
I agree that my remark about t-tests was misleading, so I've edited my previous comment to remove it. What I meant was that diffSpliceDGE conducts tests with have a single numerator degree of freedom, like a t-test. If you run diffSpliceDGE on a glmQL fit object, then diffSpliceDGE will conduct exon-level F-tests where the numerator df is 1, and these F-statistics can be interpreted as squared t-statistics. The analogous function diffSplice in the limma package really does conduct t-tests.
I have no plans to allow you to input multiple contrasts to diffSpliceDGE because I think the results would be too difficult to interpret. Looking for any possible different splicing event between any pair of groups would be too confusing. When you got a signficant result, it could be difficult to track down what the specific alternative splicing pattern was.
Thanks so much for the clarification.
The weird thing is, I have been able to use diffspliceDGE with multiple coefficients. For example, I tried to compare splicing differences among families using the following design matrix.
From your explanation above, diffSpliceDGE should not be able to handle the full comparison. The help page suggests it should default to the last column. But, the resulting diffSpliceDGE object has six columns of coefficients, with only a single vector of p-values at both the exon and gene level.
The p-values are not the same as if diffSpliceDGE had just reverted to test the pairwise comparison of coefficient 7 with the intercept:
I also tested whether releveling families had any effect on the output, and found that while the coefficients change relative to the base-level family (here family 37), the exon-level p-values do not change, nor do the genes identified as differentially spliced by topSpliceDGE.
I am sorry to flood you with so much information, and I trust you when you say that diffSpliceDGE has not been designed to accommodate comparing more than two levels, but I am still confused about how I should interpret this output.
diffSpliceDGE() shouldn't be accepting more than one coefficient. We will add a warning to the function to prevent this misunderstanding in the future.
I have checked with Yunshun Chen, who is the author of the function. He says you should ignore the results you have now and rerun with
coef
set to a single value.