Hi,
I have carried out differential expression analyses comparing conditions using DESeq2. I have considered genes to be expressed if they have a count of at least 10 in at least some libraries (sensu Chen et al: https://f1000research.com/articles/5-1438). Hence, I carried out a filtering step before the DE analysis using the filterByExpr function of edgeR. In my results, in addition to the p-values, LFCs, etc., I have columns with per-condition baseMeans:
Gene   sampleA  sampleB  baseMeanA_cond1_vs_cond2  baseMeanB_cond1_vs_cond2
Gene1  cond1    cond2    0                         70.0618858219621
Gene2  cond1    cond2    0                         13.8155035471724
(apologies if the tab-delimited table shows up poorly)
To get these, I did (e.g.):
baseMeanA_cond1_vs_cond2 <- rowMeans(counts(dds, normalized=TRUE)[,colData(dds)$Tissue == "cond1"])
baseMeanB_cond1_vs_cond2 <- rowMeans(counts(dds, normalized=TRUE)[,colData(dds)$Tissue == "cond2"])
Now, I am looking to further refine my results to find any genes that are expressed in one condition and not expressed at all in the other. In this case, I do not want to know that Gene1 is upregulated in Condition2 relative to Condition1 but is still expressed in Condition1. I would just like to know that Gene1 is expressed in Condition2 and is not expressed in Condition1.
What would be the best way to do this?
From reading this site and the DESeq2 vignette, I know that the baseMean is "the mean of normalized counts of all samples, normalizing for sequencing depth." However, I'm a bit confused about 1) how my criterion on counts having to be >=10 to be expressed has been factored into the final baseMean results, and 2) how to subset my DE results to get expressed vs not expressed.
Is it as simple as getting all genes where the baseMean for condition1 = 0, and the baseMean for condition2 > 0? Or would it be genes where the baseMean for condition1 < 10, and the baseMean for condition2 >= 10?
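To make the two options concrete, here is a minimal base-R sketch of what I have in mind. It uses a toy data frame rather than my real results: the Gene1 and Gene2 values are copied from the table above, while Gene3, the shortened column names, and the `min_expr` threshold are made up purely for illustration.

```r
# Toy per-condition means of normalized counts.
# Gene1 and Gene2 come from the table above; Gene3 is hypothetical.
res <- data.frame(
  Gene      = c("Gene1", "Gene2", "Gene3"),
  baseMeanA = c(0, 0, 4.2),                                # mean in cond1
  baseMeanB = c(70.0618858219621, 13.8155035471724, 55),  # mean in cond2
  stringsAsFactors = FALSE
)

min_expr <- 10  # hypothetical "expressed" threshold, mirroring my count >= 10 rule

# Option 1: strictly zero in cond1, anything above zero in cond2
strict <- res[res$baseMeanA == 0 & res$baseMeanB > 0, ]

# Option 2: below the threshold in cond1, at or above it in cond2
thresholded <- res[res$baseMeanA < min_expr & res$baseMeanB >= min_expr, ]

strict$Gene
thresholded$Gene
```

With these toy values, option 1 keeps Gene1 and Gene2, whereas option 2 also keeps Gene3, which has a low but non-zero mean in cond1. That difference is exactly what I am unsure about.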
Also, if it's easier to do this separately to the DESeq2 results, I'm happy to do so, e.g. by subsetting a matrix of count values or TPM values or TMM values.
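For example, this is roughly how I imagine doing it directly on a matrix of normalized counts, checking individual samples instead of condition means so that a single high replicate cannot pull a mean over the threshold. Everything here is an assumption for illustration: the matrix values, the sample names, and the `min_count` and `min_samples` cut-offs are all made up.

```r
# Toy normalized-count matrix: rows are genes, columns are samples (values made up)
norm_counts <- matrix(
  c( 0,  0,  0, 65, 80, 66,
     3,  0,  1, 12, 15, 14,
    40, 38, 45, 90, 85, 99),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Gene1", "Gene2", "Gene3"),
                  c("c1_r1", "c1_r2", "c1_r3", "c2_r1", "c2_r2", "c2_r3"))
)
condition <- c("cond1", "cond1", "cond1", "cond2", "cond2", "cond2")

min_count   <- 10  # hypothetical per-sample "expressed" threshold
min_samples <- 2   # hypothetical number of cond2 samples that must pass it

in_c1 <- norm_counts[, condition == "cond1", drop = FALSE]
in_c2 <- norm_counts[, condition == "cond2", drop = FALSE]

# "Not expressed at all": zero in every cond1 sample
absent_c1 <- rowSums(in_c1 > 0) == 0

# "Expressed": at least min_samples cond2 samples reach min_count
present_c2 <- rowSums(in_c2 >= min_count) >= min_samples

on_off_genes <- rownames(norm_counts)[absent_c1 & present_c2]
on_off_genes
```

Here only Gene1 is returned, because Gene2 has a few counts in two of the cond1 samples and Gene3 is expressed in both conditions.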
Thanks!
Charles