I am currently analyzing an RNA-Seq experiment in which two cell lines were each transfected with 10 different inserts. Each insert should decrease the level of one of 10 different targets, all of which are known to be expressed in those cells. My goal is to detect the effect of knocking down a given target at the transcriptomic level. The experiment has the following sample layout:
    name         cell  vector
    CellA_1      A     vector1
    CellB_1      B     vector1
    ...          ...   ...
    CellA_10     A     vector10
    CellB_10     B     vector10
    CellA_empty  A     empty
    CellB_empty  B     empty
An empty vector is used as control.
I ran limma-voom and DESeq2 to detect the differentially expressed genes. The following is a summary of the analysis:
    # limma-voom design
    vector <- factor(colData$vector)
    cell <- factor(colData$cell)
    design_Subtypes <- model.matrix(~ 0 + vector + cell)

    # DESeq2 design
    design <- ~ cell + vector

    # contrasts against the empty vector
    contrast.limma <- makeContrasts(vector1-empty, vector2-empty,...)
    contrast.DESeq <- c("vector", "vector1", "empty")
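To make the limma design concrete, here is a minimal base-R sketch of what `model.matrix(~ 0 + vector + cell)` produces. It uses a toy sample table with only two vectors plus the empty control (the `samples` data frame is hypothetical, not my real `colData`). It also shows that the raw column names carry a `vector` prefix, so I am assuming they have to be renamed before names such as `vector1` and `empty` can be used in `makeContrasts`.

```r
# Toy sample table (hypothetical subset: 2 vectors + empty, 2 cell lines)
samples <- data.frame(
  name   = c("CellA_1", "CellA_2", "CellA_empty",
             "CellB_1", "CellB_2", "CellB_empty"),
  cell   = rep(c("A", "B"), each = 3),
  vector = rep(c("vector1", "vector2", "empty"), times = 2)
)

# Put the empty vector first so it is the reference level
vector <- factor(samples$vector, levels = c("empty", "vector1", "vector2"))
cell   <- factor(samples$cell)

# One column per vector level, plus one column for the cell-line effect
design <- model.matrix(~ 0 + vector + cell)
raw_names <- colnames(design)
print(raw_names)

# Strip the leading "vector" prefix so the columns are named
# "empty", "vector1", "vector2" (the "cellB" column is left as is)
colnames(design) <- sub("^vector", "", colnames(design))
print(colnames(design))

# Residual degrees of freedom: samples minus estimated coefficients
resid_df <- nrow(design) - ncol(design)
print(resid_df)
```

In this layout each vector is observed once per cell line, so the cell-line term acts as a blocking factor and the two cell lines are the only replication for each vector.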
I also generated PCA/MDS plots (here is the PCA plot using the rlog-transformed (rld) counts from DESeq2 with intgroup="vector"). The empty vector is represented by the first color in the legend. PC1 explains 97% of the variance and PC2 explains 1%. The samples on the right of the plot are from cell line B and those on the left are from cell line A.
In the end, I obtained differentially expressed genes for the contrasts Vector1-Empty, .... Vector10-Empty.
When pooled across all contrasts, I obtained 36 differentially expressed genes with limma and 172 with DESeq2. Of these, 8 are shared between the two tools.
I had a look at the DESeq2 normalized counts of those genes and found cases like these:
So GeneA and GeneB are called differentially expressed even though one of the two cell lines does not express the gene at all. This biological situation is expected, but I am wondering how such a difference affects the statistics of limma/DESeq2. To be precise, ~30% of the genes found by limma show this pattern, versus 0.5% of those found by DESeq2.
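For clarity, this is roughly the kind of check I mean when I say a gene is expressed in only one cell line. It is a minimal base-R sketch on a toy matrix with made-up values (not my real counts); the names `norm_counts` and `min_count` and the threshold of 10 are arbitrary assumptions for illustration.

```r
# Toy normalized-count matrix (made-up values, NOT the real data)
norm_counts <- matrix(
  c(250, 240,  90,   0,   0,   0,   # GeneA: detected in cell line A only
      0,   0,   0, 410, 395, 120,   # GeneB: detected in cell line B only
    130, 125,  60, 300, 310, 150),  # GeneC: detected in both cell lines
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("GeneA", "GeneB", "GeneC"),
    c("CellA_empty", "CellA_2", "CellA_1",
      "CellB_empty", "CellB_2", "CellB_1")
  )
)
cell <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical threshold below which a gene is treated as "not expressed"
min_count <- 10

# Highest normalized count of each gene within each cell line
max_by_cell <- sapply(levels(cell), function(cl) {
  apply(norm_counts[, cell == cl, drop = FALSE], 1, max)
})

# Flag genes that stay below the threshold in exactly one cell line
one_line_only <- rowSums(max_by_cell < min_count) == 1
flagged <- names(which(one_line_only))
print(flagged)
```

Applied to the list of differentially expressed genes from each tool, a flag like this is how the proportions quoted above could be recomputed.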
Question 1: Should I remove those genes and consider them false positives, or can the statistics behind limma/DESeq2 deal with this biological situation? If they can, could you explain how, or point me to the part of the DESeq2/limma vignettes that I should re-read?
Question 2: So far I have been looking at the DESeq2 normalized counts, but is there a limma-voom equivalent that I could use to detect odd "normalized" counts like the ones here?
Thanks in advance for your answers!