Question: Correct construction of design matrix in limma for multiple contrasts for gene expression profiling
1
4.5 years ago by
svlachavas740
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Bioconductor Community,

i have recently preprocessed and analyzed an illumina microarray dataset in R both with lumi and limma package, but my main question stands for the appropriate implementation of statistical inference. In detail, my experiment tests the use of different PI3K inhibitors on metastatic T47D breast cancer cells. My pheno data object is the below:

targets2

Sample.Group                   replicate
1             T47D_DMSO                        1
2             T47D_DMSO                        2
3              T47D_OHT                          1
4              T47D_OHT                          2
5              T47D_GDC                          1
6              T47D_GDC                          2
7              T47D_C10                           1
8              T47D_C10                           2
9              T47D_GDC_OHT                 1
10            T47D_GDC_OHT                 2
11            T47D_C10_OHT                  1
12            T47D_C10_OHT                  2

where the basic control group is the DMSO, the other biological substances are various PI3K inhibitors, while the two last groups implement the possible synergistic action of the merging of dual inhibitors. Also, the coefficient "replicate" shows the different biological replicates(batch, the repeat of the experiment) but all the samples were processed together for the microarray experiment. Then i continued with the following code:

samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT")) batch <- factor(targets2$replicate)

design <- model.matrix(~samples + batch)

design
(Intercept)        samplesT47D_OHT      samplesT47D_GDC       samplesT47D_C10   samplesT47D_GDC_OHT
1            1               0                                              0                               0                        0
2            1               0                                              0                               0                        0
3            1               1                                              0                               0                        0
4            1               1                                              0                               0                        0
5            1               0                                              1                               0                        0
6            1               0                                              1                               0                        0
7            1               0                                              0                               1                        0
8            1               0                                              0                               1                        0
9            1               0                                              0                               0                        1
10           1               0                                             0                               0                        1
11           1               0                                             0                               0                        0
12           1               0                                             0                               0                        0
samplesT47D_C10_OHT     batch2
1                    0                        0
2                    0                        1
3                    0                        0
4                    0                        1
5                    0                        0
6                    0                        1
7                    0                        0
8                    0                        1
9                    0                        0
10                   0                       1
11                   1                       0
12                   1                       1
attr(,"assign")
[1] 0 1 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$samples [1] "contr.treatment" attr(,"contrasts")$batch
[1] "contr.treatment"

So, my first naive but important question is that after the construction of my design matrix, my model will test each biological substance or combination vs the control group (after "correcting" for the batch coefficient possible effect), or it is essentially an anova experiment ? thus, it test each combination versus the others ?

And if is the second case, should i proceed after the design matrix with functions makeContrasts & contrasts.fit in order to specify differences of higher importance and interest, such as the comparison of the last biological group (tamoxifen + Compound10) versus i.e. each substance alone to find different and more specific gene alterations,as the main goal of the specific experiment is to study any possible combinatorial effects of tamoxifen & PI3Kα inhibitors ?

Any suggestion, idea or correction would be valuable !! Thank you in advance for your time !!!

modified 4.4 years ago • written 4.5 years ago by svlachavas740
Answer: Correct construction of design matrix in limma for multiple contrasts for gene e
3
4.5 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

For your first question; your contrast will depend on what coefficients you drop, or what contrast vector/matrix you specify. In this case, each of the coefficients named samples* will represent the log-fold change of your various treatments over the DMSO control. So, dropping each of them separately will give you a comparison between the corresponding treatment and control. Dropping all of them at once will test for differences between any of the treatments and the control.

For your second question; determining whether the dual treatment has any effect is not obvious, as it depends on what you want to find. For example, if treatment with C10 gives a log-FC of 1, and treatment with OHT gives a log-FC of 1, and treatment with both gives a log-FC of 2, is this additive effect interesting? Or do you want to find genes where the effect of double treatment is non-additive, e.g., to get synergistic or epistatic effects?

If it's the former, then you can just do comparisons between C10 + OHT and each of OHT and C10. For example, for C10:

con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, levels=design)

You can then pass this through contrasts.fit to do the DE analysis. You can repeat this with OHT, and intersect the resulting DE lists to identify genes that are DE in the dual treatment compared to either of the single treatments.

If you want the latter, then you can use a contrast that mimics an interaction model:

con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design)

This asks whether the log-FC from the dual treatment is equal to the sum of the log-FCs from the individual treatments. If not, there might be some interesting non-additive effects happening.

Dear Aaron,

your detailed answers are grateful and very useful. To summarize, in case i have misunderstood anything because the construction of the design matrix is very important:

1. Regarding my first question, you mention that (and correct me if it is something different) that based on the design matrix i have posted and due to the fact that my "baseline" level is the control group(DMSO), all the comparisons would be made between each biological group vs DMSO right ?

2. Secondly, regarding your very useful suggestions about the constrasts, which i will follow, we are mostly interested in analyzing the molecular basis of synergy between tamoxifen and GDC-0941 as also tamoxifen with a new potential PI3Ka inhibitor(compound 10). Thus, im wondering if it is more useful to go straight to the second implementation you proposed with contrasts fit instead of all the possible comparisons ? Or should i firstly use the decideTests() and then a Venn diagram with the simple first approach instead of the topTable (without the contrasts fit) to see possible intersections ? Thank you for your consideration on this matter !!

1. Yes, assuming that you drop one sample* coefficient at a time.
2. The second approach in my answer is more appropriate for looking at synergy, under the definition that synergistic effects exceed the sum of the effects of each drug.

Dear Aaron,

please excuse me but what exactly do you mean by "drop one sample coefficient at a time"  ? that is, the specific implementation of my design matrix which includes the intercept as the first reference group which is the control group, and the other biological groups ?

Moreover, because i would like to ask you if i dont include the intercept term, and proceed as below:

design <- model.matrix(~0 +samples + batch) :

the methodology and the results would be the same as far as i specify with makeContrasts my comparisons of interest ?

Or i need from the start to define without an intercept my design matrix, and then define my various comparisons ??

Thank you again for your patience and guidance !!

1

For your design with an intercept, you can drop one coefficient at a time by specifying coef= in topTable. There's nothing wrong with your implementation of the design matrix, but you still need to specify what comparisons you want to make, and one way to do that is with the coef argument.

If you use a design matrix without the intercept, you'll have to specify your comparisons with makeContrasts, rather than just dropping each coefficient. This is because each coefficient will represent the average expression for each of your treatment groups (DMSO, C10, OHT, etc.) so you'll need to explicitly specify a comparison between groups.

Dear Aaron,

thank you again for your explanation about the topTable and the coef argument which specifies the specific comparison i want to extract each time !! Thus, i believe i could proceed firstly with my initial post of design matrix :

design <- model.matrix(~samples + batch)

fit <- lmFit(norm.data, design)
fit2 <- eBayes(fit, trend=TRUE)

to have a first look across all possible comparisons vs thcontrol group ----

Then i could construct based on your idea a design matrix without an intercept :

design2 <- model.matrix(~0 +samples + batch)

fit <- lmFit(norm.data, design2)

and specify explicitly the targeted contrasts of  interest about the possible synergistic effects:

i.e.

con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, levels=design)

fit2 <- contrasts.fit(fit, contrasts=con)

ebayes <- eBayes(fit2, trend=TRUE)

(* i have also found from the limma vignette the classifyTestsF(), but i havent used it and im not sure that it is useful in my case)

You don't need to use a separate design matrix to interrogate synergistic effects. Just stick with design - otherwise, you might have problems down the track if you mix up your matrices.

I got it. Use the initial design matrix. So, one last but most important question:

as i use the design matrix with the intercept, the specified contrast (i.e.)

con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, levels=design)

interogate the difference between the dual treatment (C10 + OHT) versus one substance(C10) for identify DE between the two "kinds" of therapy, but in respect to the baseline level, that is the control group right ?

1

Yes, your contrast will identify DE genes between the two therapies, while your "baseline" (i.e., intercept) in your design is the DMSO control group.

Dear Aaron,

i followed your instructions above and im posting the code because in the makeContrasts argument and contrasts.fit i get some weird warnings, which i dont know if there are important and i have made some critical mistakes. My code is as below:

samples <- factor(selected_data$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT")) pData(selected_data)$replicate <- factor(rep(c(1:2), each=1, times=6))

#  indicate the different biological replicates which belond to two *different batches*

batch <- factor(selected_data$replicate) design <- model.matrix(~samples + batch) My design output is like the above i have posted fit <- lmFit(selected_data, design) and then: con <- makeContrasts(samplesT47D_C10_OHT-samplesT47D_C10, samplesT47D_C10_OHT-samplesT47D_OHT, levels=design) and i get the first warning: Warning message: In makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, samplesT47D_C10_OHT - : Renaming (Intercept) to Intercept afterwards, fit2 <- contrasts.fit(fit, contrasts=con) Warning message: In contrasts.fit(fit, contrasts = con) : row names of contrasts don't match col names of coefficients ebayes <- eBayes(fit2, trend=TRUE) (* just also to mention that i have already with the same design matrix without the use of contrasts fit used the code below to test for DE for each biological group against DMSO, as we have already discussed above: fit <- lmFit(selected_data, design) fit2 <- eBayes(fit, trend=TRUE) ) ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by svlachavas740 1 Both warnings stem from the fact that makeContrasts will try to replace the coefficient name "(Intercept)" with "Intercept". To get rid of the warnings, just manually rename the intercept column of the design matrix to "Intercept". Assuming the first column is the intercept (which it should be, for your parametrization of the design): colnames(design)[1] <- "Intercept" Use this renamed matrix for every step in the analysis pipeline, including lmFit. ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Aaron Lun25k Dear Aaron, thank you for your responce !! i changed my design matrix with your above function regarding the renaming of the Intercept column(first in my design matrix and then none of the above warnings appeared again ! One other thought i had before you answered was to use a design matrix without an intercept and write all the contrasts you have posted: design2 <- model.matrix(~0 +samples+batch) fit <- lmFit(filtered, design2) and then something like this(after i have changed the name of the columns of the design matrix) cont_mat <- makeContrasts(OHT-DMSO, GDC-DMSO, C10-DMSO, GDC_OHT-DMSO, C10_OHT-DMSO, (C10_OHT-DMSO) - (C10-DMSO) - (OHT-DMSO)# i.e for testing for epistatic/synergistic effects.......-it is a bit laborious but it tests all possible coefficients of interest at one time ADD REPLYlink written 4.5 years ago by svlachavas740 Dear Aaron, i returned with some feedback and some more general question regarding the possible options of functional enrichment of some specific results. More specifically, as i have firstly followed your recommendations, for instance to search for addictive effects for the double therapy C10_OHT: samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))

batch <- factor(targets2$replicate) design <- model.matrix(~samples + batch) colnames(design)[1] <- "Intercept" and then : con <- makeContrasts("samplesT47D_C10_OHT-samplesT47D_C10", "samplesT47D_C10_OHT-samplesT47D_OHT", levels=design) fit2 <- contrasts.fit(fit, contrasts=con) ebayes <- eBayes(fit2, trend=TRUE) and then based on each of the two coefficients, i created two DEG lists, which i intersected to find common genes which are found DE in the double therapy vs either of the monotherapies as you have suggested. But, as for this interprentation of this specific dual treatment my final intersection list consists of 44 common genes, because this is a small number to use for input for any of the common enrichment methods, is there some web-tool or package for such small number of genes ? i have heard http://genemania.org/ but i have never used in the past ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by svlachavas740 If you want to test for upregulation of pathways or such, try using kegga (for KEGG pathways) or goana (for GO terms) in limma. Alternatively, you could increase the FDR threshold within each DE list to get more genes. Remember, the intersection operation is quite conservative, so the true error rate is probably a lot lower than the nominal threshold. As such, you won't get damaged by too many false positives if you lift the threshold. ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Aaron Lun25k Dear Aaron, please excuse me again for posting one more question, but after your explanations and suggestions both on this post and also the other post (C: Limma - discrepancy in number of significant genes ) i noticed one very crusial step which i might performed wrong or the opposite, and im not sure after the latest updates and my misunderstanding of makeContrasts. More basically, as you have mentioned above, if i want to search for instance for non-synergistic effects in one double therapy i used con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design)--- But should i have used con <- makeContrasts((samplesT47D_C10_OHT -DMSO) -(samplesT47D_C10-DMSO) - (samplesT47D_OHT-DMSO), levels=design) which results in a different contrast from above ? Or as far as i have computed the design matrix samples <- factor(targets2$Sample.Group, levels=c("T47D_DMSO", "T47D_OHT", "T47D_GDC", "T47D_C10", "T47D_GDC_OHT", "T47D_C10_OHT"))

batch <- factor(targets2$replicate) design <- model.matrix(~samples + batch) The first code is correct(as you have suggested) ? Please excuse me for insisting on this, but it is very crusial in order not to implement any wrong part in my analysis and not understand it completely !! Best Regards ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by svlachavas740 1 Your first contrast looks for non-additive effects, which can be interpreted as synergy. For this purpose, it is correct, assuming that the design matrix is the same as that specified in your original post. The second contrast makes no sense. Think about it - what is samplesT47D_C10_OHT - DMSO? The samplesT47D_C10_OHT term represents the log-fold change of the C10 + OHT double treatment over the DMSO control. The DMSO term represents the average expression of the control. Why would you want to subtract the average expression from the log-fold change? The same argument applies to the other subtractions in that contrast. ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun25k Yes i understand ! My design matrix is the same as in my post in the first line above--But i needed to be completely sure, because if i had not create design matrix as above, but as : design <- model.matrix(~0 +samples+batch), then the makeContrasts argument must change right ? That was the specific part that i was confused also in the other post ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by svlachavas740 1 That's right. If you construct a second design matrix with: design2 <- model.matrix(~0 +samples+batch) then your second contrast (using levels=design2) would be correct. Each subtraction represents the log-fold change over the control, so you could use them to test for non-additive changes for your double treatment. ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun25k Finally a clear view.So it all depends on the design matrix. Thus, i havent made a mistake as i used the original design matrix and sticked to your code regarding looking for synergy (con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design)). Thank you again for your patience and explanations !!!! ADD REPLYlink written 4.4 years ago by svlachavas740 Dear Aaron, please excuse me again for posting here, but as im writting a report about my results some naive questions resulted regarding the interprentation of the term "additive effects" and the resulted common genes(common ProbeIDs) from resulting the two DEG lists from your first proposal from above, that is con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10, sampleT47D_C10_OHT - samplesT47D_OHT, levels=design)... So my main question is the following: Regarding that the 4 following lines represent a small sample of the output of the intersect of the two separate DEG lists based on the common DEG probeIDs(each list with adjusted p val < 0.05)  PROBEID GENE_SYMBOL C10_OHTvsC10_logFC C10_OHTvsOHT_logFC GENE_NAME ILMN_1742025 OLFM1 -2.53 0.69 olfactomedin 1 ILMN_1680618 MYC -1.79 -0.63 v-myc avian myelocytomatosis viral oncogene homolog ILMN_1685703 ACOX2 -1.37 -0.55 acyl-CoA oxidase 2, branched chain The term additive to pharmacology generally means "that the effect of two chemicals is equal to the sum of the effect of the two chemicals taken separately". But, here it has the same meaning ? In other words, for instance for the gene OLFM1 in my initial separate DEG lists vs DMSO (from topTable without makeContrasts) it is not found DE in C10 compound, only in OHT and the double treatment C10_OHT. Or alternatively, additive here means that these genes found common after the intersect, which are both DE in the double vs either each of the two monotherapies, represent this type of interaction ? Please excuse me again for my naive questions, but i got confussed in the explanation, because here it may represents a different meaning from a just biology perspective Best, Efstathios ADD REPLYlink written 4.4 years ago by svlachavas740 1 My interpretation of an additive effect in gene expression is this: if an additive effect exists for the two treatments, the log-fold change of the double treatment over the DMSO control should be equal to the sum of the log-fold changes for each of the single treatments over the control. This is analogous to the pharmacology definition that you've described. Now, the first contrast I described in my original answer (and the one that you're using above) does not specifically test for additive effects. Non-additive effects will also result in rejection of the null, e.g., if the double treatment results in an increase/decrease in expression beyond the sum of each treatment, so long that it's still DE between the double and single treatments. Additive effects also result in rejection, but this rejection is not specific to such effects. The reason I was talking about an additive effect in my original answer was to distinguish this first approach (which will reject upon an additive effect) from the second approach (which will not). In fact, if you want to identify genes that exhibit additive behaviour across the two treatments, I would suggest using the second contrast and checking whether the log-fold change is close to zero - either by looking at the confidence intervals, or proceeding more formally with a TOST. ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun25k So this needs more understanding from my view, so i will again ask you some important things about your above answer to be correct about my report and also to understand fully the subject. So, firstly regarding the above contrast, it is more correct that the common intersected genes from the two contrasts-for instanse for the C10_OHT- to present them as "common DEG between the double therapy and either any of the monotherapies-" and not for possible additive effects ? as in your explanation, it is not sure that these genes will present an additive effect in gene expression(i.e the gene OLFM1 which is not found DE in C10 treatment), but maybe are possible candidates ?. Moreover, what do you mean by " first approach (which will reject upon an additive effect) " ? Also, with the intersection model about the second contrast you proposed, that is con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design) i dont explicitly check/test for non-additive effects ? as here i assume that the logFC of the double treatment is equal to the sum of the both individual treatments ? So, these genes that are DE after this implementation, do not represent non-additive effect(possible synergy or epistasis) ? Finally, in the last paragraph you suggest the second contrast-by which you mean the above i posted -? ( con <- makeContrasts(samplesT47D_C10_OHT - samplesT47D_C10 - samplesT47D_OHT, levels=design) ? Please correct me for these many questions and please correct me if i have understand something wrong but is very important not only for my project as also to make accurate assumptions and understand in a formal way my results. ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by svlachavas740 1 • Your interpretation is correct; the genes in the intersection represent those that are DE between the double treatment and both of the single treatments. Such genes may or may not exhibit an additive effect for the double treatment, but you could consider them as possible candidates for additive behaviour. • The first approach will (generally) reject the null hypothesis if there is a non-trivial additive effect. For example, assume that you have a gene with log-fold change of 1 in OHT vs DMSO; the same log-fold change of 1 in C10 vs DMSO; and a log-fold change of 2 in OHT + C10 vs DMSO (thus an additive effect, as this is the sum of the log-fold changes of the two individual treatments). The null hypothesis in the first approach describes equality in the log-fold changes between double and either of the single treatments. This is not true in our example (2 != 1), so the null should be rejected for this gene, pending power considerations. Of course, the converse is not true, i.e., rejection doesn't indicate that there is an additive effect. • For the second contrast, the null hypothesis that we've set up in makeContrasts describes an additive effect of treatment, i.e., that the log-fold change of the double treatment over DMSO is equal to the sum of the log-fold changes for both single treatments. Significant genes are those that reject this null hypothesis, so any DE genes that you find with this contrast will exhibit non-additive behaviour. • Yes. ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Aaron Lun25k Answer: Correct construction of design matrix in limma for multiple contrasts for gene e 0 4.4 years ago by svlachavas740 Greece/Athens/National Hellenic Research Foundation svlachavas740 wrote: Dear Aaron, thank you again for your excellent explanations !!! To sum things up and to consider my final questions regarding some more implementations i should consider including in my analysis: 1. Firstly, regarding the example of the "posssible candidates for additive behaviour": if i check in the intersected list of common genes(probeIDs) and i find for instanse one probeID that is both DE in each separate condition, and also the fold change of the double treatment is equal to the sum of the two monotherapies-vs the DMSO group from the initial comparison-(i.e. C10_OHT logFC=2, C10logFC=0.5, OHTlogFC=1.5), could i safely considered it as a possible gene for addictive effect ? Alternatively, could i check the logFCs for each difference as the above 3 genes i posted, and see if these differences are equal to the double treatment alone 2. Or on the other hand, if this is not formal and has possible bias, how could i implement the "TOST" you mention above ? it is a package or function(because i didnt find anything)? And i should use it on the second contrast with the interaction model ? If it is possible to give some direction on how to implement it it would be grateful !! Thank you for your patience & your consideration on this matter !!!! Best, Efstathios ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by svlachavas740 2 If you want to find genes with additive behaviour, the simplest approach is to perform a DE comparison with the second contrast. Run topTable with confint=0.95, and identify all non-significant genes where the upper and lower bounds of the confidence interval are both near-zero, e.g., CI.L > -1 and CI.R < 1. This will identify genes where the interaction effect of the double treatment is close to zero, i.e., additive behaviour upon double treatment. I prefer this method to implementing TOST, which is more formal but is technically annoying to set up and is likely to be impracticably conservative. I would then intersect the above set of genes with the set of genes that are DE in both of the individual treatments versus the DMSO control (as you've suggested in point 1, above). This gives us genes that have non-zero log-fold changes upon individual treatment, where those log-fold changes are also additive upon double treatment. If we didn't do this intersection, we'd end up with a lot of genes that are trivially additive because the log-fold changes are zero for one or both of the individual treatments. ADD REPLYlink written 4.4 years ago by Aaron Lun25k Answer: Correct construction of design matrix in limma for multiple contrasts for gene e 0 4.4 years ago by svlachavas740 Greece/Athens/National Hellenic Research Foundation svlachavas740 wrote: Dear Aaron, firstly -about implementing your procedure for identifying possible gene candidated which exchibit additive effect- im posting some part of the code just to get a extra confirmation that its ok and there is no crusial mistake that can hamper the downstream analysis: So, for the second contrast, i start with the first lmFit from the initial design matrix, as in my original post question: design <- model.matrix(~samples + batch) colnames(design)[1] <- "Intercept" fit <- lmFit(filtered.2, design) con <- makeContrasts("samplesT47D_C10_OHT-samplesT47D_C10-samplesT47D_OHT",levels=design) # for the double C10_OHT treatment fit2 <- contrasts.fit(fit, contrasts=con) ebayes <- eBayes(fit2, trend=TRUE) top2 <- topTable(ebayes, coef=1, number=nrow(ebayes), adjust.method="fdr", sort.by="none",confit=0.95) # as you proposed Then, after i removed any unannotated probesets and duplicate probeIDs(im not posting the code to not make the answer very large), my final topTable looks like this: head(top3) GENE_SYMBOL logFC CI.L CI.R adj.P.Val MAD ILMN_1689111 CXCL12 0.05264171 -0.6149950 0.7202784 0.9733067 2.741335 ILMN_1717393 PTCHD1 1.69945096 0.4320406 2.9668614 0.3704552 2.579279 ILMN_2219767 MYCN 1.63698831 0.1567033 3.1172733 0.4754438 2.547747 ILMN_3226613 CLEC2D 0.81932260 -1.0647680 2.7034132 0.8167712 2.056058 ILMN_1710544 PCDH7 0.79172199 -0.6914990 2.2749430 0.7679140 2.027434 ILMN_1691647 CGB5 0.88980980 -0.3506129 2.1302325 0.6775603 1.996046 dim(top3) [1] 9681 6 and then i used : selected <- top3[which(top3$CI.L >-1 & top3$CI.R<1),] # as you mentioned to satisfy both upper and lower bounds dim(selected) [1] 7418 6 So, if everything(i hope) it is correct, regarding the second part of your answer about the intersection procedure: now i have to intersect the above probeIDs from selected with the two separate DEG lists from the initial topTable, that is in my original question, which DEG lists contain the DEG genes for each individual treament vs DMSO, that is in my case OHT vs DMSO & C10 vs DMSO right ? And the possible common genes from both the 3 lists (simultaneously) will represent these candidates AND not generally common DEG genes from my previous answer that could or could not represent genes which exhibit additive behaviour ?? Finally, one silly question but important regarding the general knowledge of statistics: As selected data.frame contains these probeIDs with the wanted confidence intervals, it is correct that sum(selected$adj.P.Val < 0.05)
[1] 0
as these are non-significant ?

Or im mistaken and this has a different meaning ?

Thank you again for your valuable help !!

1

Yes, I would intersect selected with the set of genes that are DE in both of the comparisons of the individual treatment versus control. The resulting set will contain those genes that exhibit additive differential expression upon double treatment. And yes, the set of genes in selected should be non-significant, as we're explicitly looking for those genes where the interaction term (i.e., the non-additive effect) is close to zero. In fact, you could enforce this by re-defining your selected set as:

selected <- top3[which(top3$CI.L >-1 & top3$CI.R<1 & top3$adj.P.Val > 0.05),] ADD REPLYlink written 4.4 years ago by Aaron Lun25k Dear Aaron, i changed the selected with the addition of adj.P.Val as above, then i noticed that selected <- top3[which(top3$CI.L >-1 & top3$CI.R<1 & top3$adj.P.Val > 0.05),]
dim(selected)
[1] 7879    6

After when i created a Venn diagram using the probeIDS from the 3 lists, that is the two initial DEG[OHTvs DMSO & C10vsDMSO] and the selected probeIDs, but unfortunately i noticed, that when i intersected first the two individual DEG lists of the two individual therapies, they have in common 35 common probeIDs. But afterwards, when i intersected the selected list with the 35 common, i didnt find any common probeIDs !!

Or alternatively can i try intersecting the probeIDs of the selected data frame with the "possible candidates for additive effect" from the 1rst contrast ? or it is irrelevant ?

**Just an update: i used the same procedure as above but now with the GDC_OHT double therapy, and at the end i found 16 probeIDs common from the intersection of the 3 lists

1
Well, that just means that you don't have a lot of genes that are DE in both treatments versus control (35 is a small number relative to ~8000 for the selected set). You could try relaxing the FDR threshold with which you define DE against control. This would give you more genes for intersection with the selected set. I don't think that intersecting with the results from the first contrast would be helpful. Genes that exhibit an additive effect and are DE between the double and both single treatments should also have DE between both single treatments and the control. There shouldn't be any systematic difference between the two approaches.

Possibly as you mention except the fact  "you don't have a lot of genes that are DE in both treatments versus control", maybe in the case of the one double treatment C10_OHT there arent any significant additive effects. So, maybe it is more appropriate to just present the common DEGs from the double therapy vs either of the double treatments.

On the other hand, regarding the second double therapy GDC_OHT, these common 16 probesets from the intersection, should i check their individual logFCs and check also if there are DE in the double treatment GDC_OHT vs the DMSO group ?

1
Whether that's appropriate or not depends on what you're trying to find. Just be aware that the genes fron the first contrast are not guaranteed to exhibit additive behaviour. As for the GDC example; additiveness may not always result in DE in the double treatment relative to control. For example, if one treatment gives a log-fold change of 1 relative to control, and the other treatment gives a log-fold change of -1, then the double treatment would have a log-fold change of 0 if the effects were additive.

I understand-as there are not many, i could check them to see their individual behaviour in the single treatments and the double treatment. Thus far, for instanse for the first i found that: OHTlogFC=-1.231, GDClogFC=-1.7289,

and GDC_OHT=-3,073. So, their (absolut)difference is close to zero. Thus, to summarize in your opinion should i check them one by one or these 16 i present them as genes exhibit additive effects ?

Please excuse me again for any disturbance !!

1
Might as well check a few, just to make sure that it looks reasonably additive.

just a final update-i checked each one of the 16 probesets that have resulted from the intersection above, regarding their respective logFCs in each individual therapy and in the double therapy. While the total difference from the doublelogFC of both individual logFCs resulted in lFCs near zero[such as -0.04, 0.000298, 0.01,] and some ~0.1/ also some resulted in totalLFcs like -0.28, -0.4565, 0.33 & -0.56581. So, my point is that can i still consider these last 4 probesets with these values that exhibit additive effect, or i can just mention these values next to the total interprentation ? I know im getting disturbing, but i just want to confirm any results from characterized wrongly.

Best,

Efstathios

1

Keep in mind that the log-FC from the second contrast in my original answer directly represents the disparity between the double treatment log-FC and the sum of the single treatment log-FCs. You have already defined selected based on the fact that the disparity has a 95% confidence interval within (-1, 1). Thus, even if the disparity isn't exactly zero, you can be reasonably assured that it doesn't lie outside of the confidence interval, and that an additive interpretation is valid for the relevant genes. A more statistically rigorous approach would use TOSTs, which are fairly conservative and will probably result in an empty intersection.

So from the beggining with the second contrast(double treatment vs both treatments) and along with the confidence intervals, we have a reasonable methodology to assume that these 16 genes reliably represent these candidates which exhibit additive behaviour ?

Also, the mentioned disparity i noticed in some of the probeIDs, could be attributed in various reasons ? i.e. measurement error, fluorecent hybridization intensity problems ?

To sum up, as you propose i should stick with the already implemented prosedure you proposed and dont also use TOST, regarding the possible disadvantages it might present with no possible results ?