Dear Bioconductor Community,
i have recently preprocessed and analyzed an illumina microarray dataset in R both with lumi and limma package, but my main question stands for the appropriate implementation of statistical inference. In detail, my experiment tests the use of different PI3K inhibitors on metastatic T47D breast cancer cells. My pheno data object is the below:
targets2
Sample.Group replicate
1 T47D_DMSO 1
2 T47D_DMSO 2
3 T47D_OHT 1
4 T47D_OHT 2
5 T47D_GDC 1
6 T47D_GDC 2
7 T47D_C10 1
8 T47D_C10 2
9 T47D_GDC_OHT 1
10 T47D_GDC_OHT 2
11 T47D_C10_OHT 1
12 T47D_C10_OHT 2
where the basic control group is the DMSO, the other biological substances are various PI3K inhibitors, while the two last groups implement the possible synergistic action of the merging of dual inhibitors. Also, the coefficient "replicate" shows the different biological replicates(batch, the repeat of the experiment) but all the samples were processed together for the microarray experiment. Then i continued with the following code:
samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))
batch <- factor(targets2$replicate)
design <- model.matrix(~samples + batch)
design
(Intercept) samplesT47D_OHT samplesT47D_GDC samplesT47D_C10 samplesT47D_GDC_OHT
1 1 0 0 0 0
2 1 0 0 0 0
3 1 1 0 0 0
4 1 1 0 0 0
5 1 0 1 0 0
6 1 0 1 0 0
7 1 0 0 1 0
8 1 0 0 1 0
9 1 0 0 0 1
10 1 0 0 0 1
11 1 0 0 0 0
12 1 0 0 0 0
samplesT47D_C10_OHT batch2
1 0 0
2 0 1
3 0 0
4 0 1
5 0 0
6 0 1
7 0 0
8 0 1
9 0 0
10 0 1
11 1 0
12 1 1
attr(,"assign")
[1] 0 1 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$samples
[1] "contr.treatment"
attr(,"contrasts")$batch
[1] "contr.treatment"
So, my first naive but important question is that after the construction of my design matrix, my model will test each biological substance or combination vs the control group (after "correcting" for the batch coefficient possible effect), or it is essentially an anova experiment ? thus, it test each combination versus the others ?
And if is the second case, should i proceed after the design matrix with functions makeContrasts & contrasts.fit in order to specify differences of higher importance and interest, such as the comparison of the last biological group (tamoxifen + Compound10) versus i.e. each substance alone to find different and more specific gene alterations,as the main goal of the specific experiment is to study any possible combinatorial effects of tamoxifen & PI3Kα inhibitors ?
Any suggestion, idea or correction would be valuable !! Thank you in advance for your time !!!
Dear Aaron,
your detailed answers are grateful and very useful. To summarize, in case i have misunderstood anything because the construction of the design matrix is very important:
1. Regarding my first question, you mention that (and correct me if it is something different) that based on the design matrix i have posted and due to the fact that my "baseline" level is the control group(DMSO), all the comparisons would be made between each biological group vs DMSO right ?
2. Secondly, regarding your very useful suggestions about the constrasts, which i will follow, we are mostly interested in analyzing the molecular basis of synergy between tamoxifen and GDC-0941 as also tamoxifen with a new potential PI3Ka inhibitor(compound 10). Thus, im wondering if it is more useful to go straight to the second implementation you proposed with contrasts fit instead of all the possible comparisons ? Or should i firstly use the decideTests() and then a Venn diagram with the simple first approach instead of the topTable (without the contrasts fit) to see possible intersections ? Thank you for your consideration on this matter !!
sample*
coefficient at a time.Dear Aaron,
please excuse me but what exactly do you mean by "drop one sample coefficient at a time" ? that is, the specific implementation of my design matrix which includes the intercept as the first reference group which is the control group, and the other biological groups ?
Moreover, because i would like to ask you if i dont include the intercept term, and proceed as below:
design <- model.matrix(~0 +samples + batch) :
the methodology and the results would be the same as far as i specify with makeContrasts my comparisons of interest ?
Or i need from the start to define without an intercept my design matrix, and then define my various comparisons ??
Thank you again for your patience and guidance !!
For your design with an intercept, you can drop one coefficient at a time by specifying
coef=
intopTable
. There's nothing wrong with your implementation of the design matrix, but you still need to specify what comparisons you want to make, and one way to do that is with thecoef
argument.If you use a design matrix without the intercept, you'll have to specify your comparisons with
makeContrasts
, rather than just dropping each coefficient. This is because each coefficient will represent the average expression for each of your treatment groups (DMSO, C10, OHT, etc.) so you'll need to explicitly specify a comparison between groups.Dear Aaron,
thank you again for your explanation about the topTable and the coef argument which specifies the specific comparison i want to extract each time !! Thus, i believe i could proceed firstly with my initial post of design matrix :
design <- model.matrix(~samples + batch)
fit <- lmFit(norm.data, design)
fit2 <- eBayes(fit, trend=TRUE)
to have a first look across all possible comparisons vs the control group ----
Then i could construct based on your idea a design matrix without an intercept :
design2 <- model.matrix(~0 +samples + batch)
fit <- lmFit(norm.data, design2)
and specify explicitly the targeted contrasts of interest about the possible synergistic effects:
i.e.
fit2 <- contrasts.fit(fit, contrasts=con)
ebayes <- eBayes(fit2, trend=TRUE)
(* i have also found from the limma vignette the classifyTestsF(), but i havent used it and im not sure that it is useful in my case)
You don't need to use a separate design matrix to interrogate synergistic effects. Just stick with
design
- otherwise, you might have problems down the track if you mix up your matrices.I got it. Use the initial design matrix. So, one last but most important question:
as i use the design matrix with the intercept, the specified contrast (i.e.)
interogate the difference between the dual treatment (C10 + OHT) versus one substance(C10) for identify DE between the two "kinds" of therapy, but in respect to the baseline level, that is the control group right ?
Yes, your contrast will identify DE genes between the two therapies, while your "baseline" (i.e., intercept) in your
design
is the DMSO control group.Dear Aaron,
i followed your instructions above and im posting the code because in the makeContrasts argument and contrasts.fit i get some weird warnings, which i dont know if there are important and i have made some critical mistakes. My code is as below:
samples <- factor(selected_data$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))
pData(selected_data)$replicate <- factor(rep(c(1:2), each=1, times=6))
# indicate the different biological replicates which belond to two *different batches*
batch <- factor(selected_data$replicate)
design <- model.matrix(~samples + batch)
My design output is like the above i have posted
fit <- lmFit(selected_data, design)
and then:
con <- makeContrasts(samplesT47D_C10_OHT-samplesT47D_C10, samplesT47D_C10_OHT-samplesT47D_OHT, levels=design)
and i get the first warning:
Warning message:
In makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, samplesT47D_C10_OHT - :
Renaming (Intercept) to Intercept
afterwards,
fit2 <- contrasts.fit(fit, contrasts=con)
Warning message:
In contrasts.fit(fit, contrasts = con) :
row names of contrasts don't match col names of coefficients
ebayes <- eBayes(fit2, trend=TRUE)
(* just also to mention that i have already with the same design matrix without the use of contrasts fit used the code below to test for DE for each biological group against DMSO, as we have already discussed above:
fit <- lmFit(selected_data, design)
fit2 <- eBayes(fit, trend=TRUE) )
Both warnings stem from the fact that makeContrasts will try to replace the coefficient name
"(Intercept)"
with"Intercept"
. To get rid of the warnings, just manually rename the intercept column of the design matrix to"Intercept"
. Assuming the first column is the intercept (which it should be, for your parametrization of the design):Use this renamed matrix for every step in the analysis pipeline, including
lmFit
.Dear Aaron,
thank you for your responce !! i changed my design matrix with your above function regarding the renaming of the Intercept column(first in my design matrix and then none of the above warnings appeared again ! One other thought i had before you answered was to use a design matrix without an intercept and write all the contrasts you have posted:
design2 <- model.matrix(~0 +samples+batch)
fit <- lmFit(filtered, design2) and then something like this(after i have changed the name of the columns of the design matrix)
cont_mat <- makeContrasts(OHT-DMSO, GDC-DMSO, C10-DMSO, GDC_OHT-DMSO, C10_OHT-DMSO, (C10_OHT-DMSO) - (C10-DMSO) - (OHT-DMSO)# i.e for testing for epistatic/synergistic effects.......-it is a bit laborious but it tests all possible coefficients of interest at one time
Dear Aaron,
i returned with some feedback and some more general question regarding the possible options of functional enrichment of some specific results. More specifically, as i have firstly followed your recommendations, for instance to search for addictive effects for the double therapy C10_OHT:
samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))
batch <- factor(targets2$replicate)
design <- model.matrix(~samples + batch)
and then :
con <- makeContrasts("samplesT47D_C10_OHT-samplesT47D_C10", "samplesT47D_C10_OHT-samplesT47D_OHT", levels=design)
fit2 <- contrasts.fit(fit, contrasts=con)
ebayes <- eBayes(fit2, trend=TRUE)
and then based on each of the two coefficients, i created two DEG lists, which i intersected to find common genes which are found DE in the double therapy vs either of the monotherapies as you have suggested. But, as for this interprentation of this specific dual treatment my final intersection list consists of 44 common genes, because this is a small number to use for input for any of the common enrichment methods, is there some web-tool or package for such small number of genes ? i have heard http://genemania.org/ but i have never used in the past
If you want to test for upregulation of pathways or such, try using
kegga
(for KEGG pathways) orgoana
(for GO terms) inlimma
. Alternatively, you could increase the FDR threshold within each DE list to get more genes. Remember, the intersection operation is quite conservative, so the true error rate is probably a lot lower than the nominal threshold. As such, you won't get damaged by too many false positives if you lift the threshold.Dear Aaron,
please excuse me again for posting one more question, but after your explanations and suggestions both on this post and also the other post (C: Limma - discrepancy in number of significant genes ) i noticed one very crusial step which i might performed wrong or the opposite, and im not sure after the latest updates and my misunderstanding of makeContrasts. More basically, as you have mentioned above, if i want to search for instance for non-synergistic effects in one double therapy i used
con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design)---
But should i have used
con <- makeContrasts((samplesT47D_C10_OHT -DMSO) -(samplesT47D_C10-DMSO) - (samplesT47D_OHT-DMSO), levels=design) which results in a different contrast from above ?
Or as far as i have computed the design matrix
samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))
batch <- factor(targets2$replicate)
design <- model.matrix(~samples + batch)
The first code is correct(as you have suggested) ?
Please excuse me for insisting on this, but it is very crusial in order not to implement any wrong part in my analysis and not understand it completely !!
Best Regards
Your first contrast looks for non-additive effects, which can be interpreted as synergy. For this purpose, it is correct, assuming that the design matrix is the same as that specified in your original post.
The second contrast makes no sense. Think about it - what is
samplesT47D_C10_OHT - DMSO
? ThesamplesT47D_C10_OHT
term represents the log-fold change of the C10 + OHT double treatment over the DMSO control. The DMSO term represents the average expression of the control. Why would you want to subtract the average expression from the log-fold change? The same argument applies to the other subtractions in that contrast.Yes i understand ! My design matrix is the same as in my post in the first line above--But i needed to be completely sure, because if i had not create design matrix as above, but as :
design <- model.matrix(~0 +samples+batch), then the makeContrasts argument must change right ? That was the specific part that i was confused also in the other post
That's right. If you construct a second design matrix with:
then your second contrast (using
levels=design2
) would be correct. Each subtraction represents the log-fold change over the control, so you could use them to test for non-additive changes for your double treatment.Finally a clear view.So it all depends on the design matrix. Thus, i havent made a mistake as i used the original design matrix and sticked to your code regarding looking for synergy (con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design)). Thank you again for your patience and explanations !!!!
Dear Aaron,
please excuse me again for posting here, but as im writting a report about my results some naive questions resulted regarding the interprentation of the term "additive effects" and the resulted common genes(common ProbeIDs) from resulting the two DEG lists from your first proposal from above, that is
So my main question is the following:
Regarding that the 4 following lines represent a small sample of the output of the intersect of the two separate DEG lists based on the common DEG probeIDs(each list with adjusted p val < 0.05)
The term additive to pharmacology generally means "that the effect of two chemicals is equal to the sum of the effect of the two chemicals taken separately". But, here it has the same meaning ? In other words, for instance for the gene OLFM1 in my initial separate DEG lists vs DMSO (from topTable without makeContrasts) it is not found DE in C10 compound, only in OHT and the double treatment C10_OHT.
Or alternatively, additive here means that these genes found common after the intersect, which are both DE in the double vs either each of the two monotherapies, represent this type of interaction ?
Please excuse me again for my naive questions, but i got confussed in the explanation, because here it may represents a different meaning from a just biology perspective
Best,
Efstathios
My interpretation of an additive effect in gene expression is this: if an additive effect exists for the two treatments, the log-fold change of the double treatment over the DMSO control should be equal to the sum of the log-fold changes for each of the single treatments over the control. This is analogous to the pharmacology definition that you've described.
Now, the first contrast I described in my original answer (and the one that you're using above) does not specifically test for additive effects. Non-additive effects will also result in rejection of the null, e.g., if the double treatment results in an increase/decrease in expression beyond the sum of each treatment, so long that it's still DE between the double and single treatments. Additive effects also result in rejection, but this rejection is not specific to such effects.
The reason I was talking about an additive effect in my original answer was to distinguish this first approach (which will reject upon an additive effect) from the second approach (which will not). In fact, if you want to identify genes that exhibit additive behaviour across the two treatments, I would suggest using the second contrast and checking whether the log-fold change is close to zero - either by looking at the confidence intervals, or proceeding more formally with a TOST.
So this needs more understanding from my view, so i will again ask you some important things about your above answer to be correct about my report and also to understand fully the subject. So, firstly regarding the above contrast, it is more correct that the common intersected genes from the two contrasts-for instanse for the C10_OHT- to present them as "common DEG between the double therapy and either any of the monotherapies-" and not for possible additive effects ? as in your explanation, it is not sure that these genes will present an additive effect in gene expression(i.e the gene OLFM1 which is not found DE in C10 treatment), but maybe are possible candidates ?.
Moreover, what do you mean by " first approach (which will reject upon an additive effect) " ?
Also, with the intersection model about the second contrast you proposed, that is
i dont explicitly check/test for non-additive effects ? as here i assume that the logFC of the double treatment is equal to the sum of the both individual treatments ? So, these genes that are DE after this implementation, do not represent non-additive effect(possible synergy or epistasis) ?
Finally, in the last paragraph you suggest the second contrast-by which you mean the above i posted -? (
Please correct me for these many questions and please correct me if i have understand something wrong but is very important not only for my project as also to make accurate assumptions and understand in a formal way my results.
makeContrasts
describes an additive effect of treatment, i.e., that the log-fold change of the double treatment over DMSO is equal to the sum of the log-fold changes for both single treatments. Significant genes are those that reject this null hypothesis, so any DE genes that you find with this contrast will exhibit non-additive behaviour.