DE Analysis Coefficient
d.huntley • 0
@dhuntley-24188

We have a problem with our DE analysis where we are intersecting results and manually removing genes. This is an overview of the samples and the analysis plan.

We have a total of 8 conditions (cell lines), built from 4 transfection states:
• control
• samples transfected with Gene 1
• samples transfected with Gene 2
• samples co-transfected with both Gene 1 and Gene 2

There are two versions of the above 4 conditions at different temperatures, one set at 32C and one at 37C. When the 2 genes are expressed in parallel at 32C they activate downstream signalling, which is not activated when each gene is expressed individually or when the 2 genes are expressed in parallel at 37C. It is the gene expression signature of this downstream signalling that we want to identify.

The original analysis plan to identify the gene expression signature promoted by the Gene 1 and Gene 2 combination has three steps. First, compare every transfected sample to the control at its respective temperature. Second, at each temperature, keep only the co-transfectant genes that are not found in the individual Gene 1 or Gene 2 results. Finally, from the co-transfectant signature at 32C, keep only the genes that are not also found in the signature of the 37C equivalent.

The issue with the above method is that intersecting results and manually removing genes that are found in other lists is not statistically sound as it erases the false discovery rate control performed by DESeq2, and false positives/negatives will have an effect on what genes remain or are filtered out in the final gene list.

The main question is whether anyone can recommend a more statistically correct way of doing this analysis given the research aim, through the inclusion of multiple factors or interaction terms in the DESeq2 design formula. Right now the design matrix consists of a single column with the 8 separate conditions explained previously, and only this factor is included in the design formula.
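To illustrate the current setup, the single 8-level condition column could be decoded into separate factors before building a multi-factor design. This is only a minimal base R sketch: the condition labels, the 3 replicates per condition, and the column names gene1, gene2 and temp are assumptions for illustration, not the actual sample sheet.

```r
# Hypothetical sample table: 8 conditions x 3 replicates (replicate count is assumed).
conditions <- c("ctrl_32", "g1_32", "g2_32", "g1g2_32",
                "ctrl_37", "g1_37", "g2_37", "g1g2_37")
coldata <- data.frame(
  condition = factor(rep(conditions, each = 3), levels = conditions)
)

# Split each label into its transfection part and its temperature part.
trt <- sub("_.*$", "", as.character(coldata$condition))
tmp <- sub("^.*_", "", as.character(coldata$condition))

# Decode into three two-level factors; "no" and "37" are chosen as reference levels.
coldata$gene1 <- factor(ifelse(trt %in% c("g1", "g1g2"), "yes", "no"),
                        levels = c("no", "yes"))
coldata$gene2 <- factor(ifelse(trt %in% c("g2", "g1g2"), "yes", "no"),
                        levels = c("no", "yes"))
coldata$temp  <- factor(tmp, levels = c("37", "32"))

# Each original condition should map to exactly one factor combination.
print(unique(coldata))
```

With 37C as the reference level, any temperature term reads as "32C relative to 37C", which matches the aim of isolating what is specific to 32C.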

@mikelove

Testing an interaction term, depending on how you set up the design, would get at e.g. whether the effect of transfection of gene 1 and gene 2 is non-additive on the log counts. By adding an interaction with temperature, you could test whether the interaction of gene 1 and gene 2 is different across temperatures. It sounds like this would achieve your final set, at least to some degree.
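As a small worked illustration of what "non-additive on the log counts" means, the arithmetic below uses made-up log2 means for one gene at a single temperature. The numbers and variable names are hypothetical and only show what the gene1:gene2 interaction coefficient measures.

```r
# Hypothetical log2 mean expression of one gene at a single temperature.
log2_mean <- c(ctrl = 5.0, g1 = 5.5, g2 = 6.0, g1g2 = 9.0)

effect_g1   <- log2_mean[["g1"]]   - log2_mean[["ctrl"]]  # gene 1 alone vs control
effect_g2   <- log2_mean[["g2"]]   - log2_mean[["ctrl"]]  # gene 2 alone vs control
effect_both <- log2_mean[["g1g2"]] - log2_mean[["ctrl"]]  # co-transfection vs control

# The interaction is what co-transfection adds beyond the sum of the single effects.
interaction_lfc <- effect_both - (effect_g1 + effect_g2)
print(interaction_lfc)  # 2.5
```

An interaction of zero would mean the co-transfection effect is just the sum of the two single-gene effects; the temperature interaction then asks whether this quantity differs between 32C and 37C.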

However, if you additionally require that features are not DE with gene 1 or gene 2 alone, you cannot obtain this with an interaction design; you have to intersect different FDR sets. You can use altHypothesis="lessAbs" to test for genes being not DE, after specifying lfcThreshold as an LFC of no practical significance.
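A rough sketch of the lessAbs test could look like the following. It runs on simulated counts from makeExampleDESeqDataSet, not real data; the 4 replicates per group, the factor names gene1 and gene2, and the threshold of 1 are all placeholder assumptions.

```r
library(DESeq2)

# Simulated counts stand in for real data: 4 groups x 4 replicates (assumed).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$gene1 <- factor(rep(c("no", "yes", "no", "yes"), each = 4),
                    levels = c("no", "yes"))
dds$gene2 <- factor(rep(c("no", "no", "yes", "yes"), each = 4),
                    levels = c("no", "yes"))
design(dds) <- ~ gene1 + gene2 + gene1:gene2
dds <- DESeq(dds, quiet = TRUE)

# With the interaction in the design, this coefficient is gene 1 alone vs control.
# Null hypothesis: |LFC| >= 1, so a small adjusted p-value supports "not DE".
# The threshold of 1 is a placeholder for an LFC of no practical significance.
res_g1_null <- results(dds, name = "gene1_yes_vs_no",
                       lfcThreshold = 1, altHypothesis = "lessAbs")

# Number of genes called "not DE for gene 1 alone" at 10% FDR.
print(sum(res_g1_null$padj < 0.1, na.rm = TRUE))
```

The same call on the gene 2 coefficient would give a second FDR set, and the intersection of the two with the interaction results is the kind of combination described above.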

Thank you very much for your response. To clarify, how could the inclusion of multiple interaction terms be implemented in DESeq2 in this case? For example, would it be possible, assuming a design matrix consisting of 2 factors (gene1, gene2) with yes/no values for each sample depending on transfection status, to then use a design formula of ~ gene1 + gene2 + gene1:gene2? Or is there a simpler way of doing this, more similar to the example in the DESeq2 vignette?

Regarding the temperature interaction, would it actually help in providing the desired result? The aim is to detect changes caused by the co-transfection of genes 1 and 2 relative to the control while accounting for changes caused by the individual transfections. This should happen for the sample sets at both temperatures and then, finally, the goal is to isolate the co-transfection changes at 32C while accounting for the co-transfection changes at 37C. Would an interaction term for temperature allow for this level of calibration in the results, or would it be better to use an interaction term just for the gene transfections, apply it to both temperature sets and then manually intersect the 2 FDR lists of the co-transfection DE genes from the different temperatures?

Thanks.

Two interaction terms could look like ~gene1 + gene2 + temp + gene1:gene2 + temp:gene1 + temp:gene2 + temp:gene1:gene2. In this model, temperature can modify the gene1 and gene2 effects, as well as their interaction. Or if you think that temperature only modifies the interaction of gene1 and gene2 you could leave out the temp:gene1 + temp:gene2 terms. It all depends on what you want to specify as the full model and what you want to specify as the null. If you specify the full model and then test on the final interaction term temp:gene1:gene2, this will provide the genes where temperature is modifying the interaction of the two genes (that is, across temperature, the non-additive effect of gene1 and gene2 is different). There are many scenarios in which this could happen though, so you may need to combine sets of FDR bounded lists in order to produce a single set of genes that match your criteria.
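One way to test the final temp:gene1:gene2 term is a likelihood ratio test of the full model against a reduced model without that term. The sketch below uses simulated counts, and the choice of the LRT, the 3 replicates per condition and the factor names are assumptions for illustration; a Wald test on the corresponding coefficient would be another option.

```r
library(DESeq2)

# Simulated counts stand in for real data: 8 conditions x 3 replicates (assumed).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)
dds$gene1 <- factor(rep(rep(c("no", "yes", "no", "yes"), each = 3), times = 2),
                    levels = c("no", "yes"))
dds$gene2 <- factor(rep(rep(c("no", "no", "yes", "yes"), each = 3), times = 2),
                    levels = c("no", "yes"))
dds$temp  <- factor(rep(c("37", "32"), each = 12), levels = c("37", "32"))

# Full model from the post, and a reduced model that drops only the three-way term.
full    <- ~ gene1 + gene2 + temp + gene1:gene2 + temp:gene1 + temp:gene2 +
             temp:gene1:gene2
reduced <- ~ gene1 + gene2 + temp + gene1:gene2 + temp:gene1 + temp:gene2

design(dds) <- full
dds <- DESeq(dds, test = "LRT", reduced = reduced, quiet = TRUE)

# LRT p-values: does temperature change the gene1:gene2 interaction?
res <- results(dds)
print(resultsNames(dds))
print(head(res[order(res$pvalue), ]))
```

Printing resultsNames(dds) shows the exact coefficient names DESeq2 assigned, which is the safest way to look them up before extracting any individual term.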

I have to admit, I don't have much time to consult on statistical analyses for groups using DESeq2, so I limit myself on the support site to helping with software usage questions. DESeq2 relies on the same design formula as base R, so you could consult with anyone familiar with linear models and design formulas in R to help work out exactly how to extract the results of interest for your experiment.

Thanks for the feedback, very helpful.
