Hello, guys. I am working with scRNA-Seq and I would appreciate your opinion and expertise on my analysis rationale and design.
I've processed two independent single-cell GSEs, I've merged both datasets using the mutual-nearest neighbors method (MNNcorrect-Scran) and identified the interest cell-type present in both datasets, adding a cell-type label for such cells. Now, I want to use this label to recover my interest cell-type in the raw data and perform DE between two feeding conditions using Deseq2. Since the raw data comes from two independent GSEs, I cannot use the cells from the two datasets at the same time to compare directly the effect of the feeding conditions in this cell-type.
I think that the best way to go would be two merge the GSEs and subset for my cell-type of interest, creating a single Deseq object. Then, create an "interaction" factor merging the origin Study (GSE1, GSE2) X cell Condition (Fed, FD) variables. I would use the design=~interaction.
This way I think I could get the values for FedvsFD inside both studies. (GSE1.Fed_vsGSE1.FD,GSE2.FedvsGSE2.FD). Do you think I could simply get the average of the results of the previous comparisons? or is there a better way I could tell Deseq2 to give me the average of such comparisons in a single output? or do you think it would be better two perform two independent analysis, one for each GSE?
I am trying to reproduce the methodology used in the DE of MNN integrated single cell datasets, but using Deseq2 with the considerations for dropseq data. In the paper, they have essentially merged the raw data for two different GSE, created a variable considering the study (GSE) and cell type identified after the MNN correction, created multiple contrasts comparing only cells inside the same GSEs each time and then got the average fold change.
Important: According to the previous paper of MNN correction, the cells would be comparable between the different datasets since the clustering step that identifies the cell-types is performed for all batches at once and that ensures that the cells in one batch have a counterpart in the other one.
I would like to do something similar, but using Deseq2. Do you think it is feasible? what do you guys think of this type of approach?
Their approach in Limma that I am trying to adapt for Deseq2
meta.marker$interact.group <- as.factor(apply(meta.marker, 1, FUN=function(X) paste(X["Study"], X["new.labels"], sep=".")))
correct.design <- model.matrix(~ 0 + interact.group, data=meta.marker)
correct.fit <- limma::lmFit(correct.exprs[, rownames(meta.marker)], correct.design)
contrast.mat <- limma::makeContrasts(GSE81076.Delta-GSE81076.Gamma, GSE85241.Delta-GSE85241.Gamma, EMTAB5061.Delta-EMTAB5061.Gamma, GSE86473.Delta-GSE86473.Gamma, levels=correct.design)
Get the average log fold changes across batch by cell type interactions correct.fit2 <- limma::contrasts.fit(correct.fit, contrasts=rowMeans(contrast.mat))
Thanks guys! All the best, Gabriel