I have 35 tumor and 4 normal samples, and I'm using DESeq2 for differential expression analysis. The tumor vs normal comparison gave only two upregulated genes, which could be due to low statistical power. So I'm interested in selecting a random subset of the tumor samples, running the differential analysis on that subset, and repeating the process `n` times.
I have a matrix with genes as rows and samples as columns. Columns 1-35 are tumor and 36-39 are normal samples.

    nb.replicates <- 10
    samples.Normal <- sample(36:39, replace=FALSE)
    set.seed(123)
    ## Random sampling of the Tumor
    samples.Tumor <- sample(c(1:35), size=nb.replicates, replace=FALSE)
    samples.Tumor
    selected.samples <- c(samples.Normal, samples.Tumor)

So, with the above code I repeated the differential analysis `n` times. Each analysis gives a different number of differentially expressed genes.
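For readers following along, the repeated draw described above could be organized roughly as follows. This is only a sketch: the helper `draw.subsample` and the value `n.iterations <- 5` are assumptions for illustration and are not taken from the post, and the DESeq2 step itself is left as a comment.

```r
# Hypothetical helper (not from the post): column indices for one comparison,
# i.e. all normal columns plus a random draw of tumor columns.
draw.subsample <- function(tumor.cols, normal.cols, nb.replicates) {
  c(normal.cols, sample(tumor.cols, size = nb.replicates, replace = FALSE))
}

set.seed(123)        # set once, before any sampling, so all draws are reproducible
n.iterations <- 5    # assumed value for `n`

subsamples <- lapply(seq_len(n.iterations), function(i) {
  draw.subsample(tumor.cols = 1:35, normal.cols = 36:39, nb.replicates = 10)
})

# Each element holds 4 normal + 10 tumor column indices; the count matrix
# would be subset with these columns before each DESeq2 run.
lengths(subsamples)
```

Setting the seed once at the top, rather than before each draw, keeps the whole sequence reproducible while still giving a different tumor subset in every iteration.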
Now, should I merge all the analyses and consider only the genes common to all of them as differentially expressed for the whole cohort?
Each analysis compares the 4 normal samples with 10 random tumor samples. For example, one analysis gives:

               baseMean log2FoldChange   lfcSE    stat pvalue    padj
    AL357060.1  8.50582         6.1871 1.67335 3.54023 0.0003 0.03245

In another analysis, with the 4 normal samples vs another 10 random tumor samples, I see the same gene differentially expressed but with different values in the results:

                  baseMean log2FoldChange     lfcSE    stat  pvalue    padj
    AL357060.1 10.58937424    6.552371044 1.6296950 3.85921 0.00011 0.02642

Many genes have different results in the different analyses, so which result should I consider?
Thanks for the reply. Basically, with the full analysis (35 tumor vs 4 normal samples) I got only 4 upregulated genes using the `results` function [`results(dds, lfcThreshold = log2(1.2), alpha = 0.05)`]. I felt random subsampling could be applied to get more upregulated genes from the different analyses, which I would then merge. You said this random subsampling procedure is not a good idea. One more reason to apply subsampling is that the tumor samples group into different clusters. MDS plot: https://imgur.com/a/YbB3wPV

Questions:
1) May I know when this subsampling can be applied?
2) You told me to increase the FDR cutoff for the full dataset analysis. So, what should the FDR cutoff be now? 0.01 or 0.5 or 0.1?
I do not recommend subsampling.
You should pick an FDR cutoff that makes sense. That is up to you as the analyst.
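To see how the choice of cutoff changes the number of genes called, one option is to tabulate the calls at several thresholds. The sketch below uses simulated p-values (not data from this thread) and base R's `p.adjust` with the Benjamini-Hochberg method, which is also the default adjustment in DESeq2's `results()`; the mixture of 950 null and 50 non-null genes is purely an assumption for illustration.

```r
set.seed(1)

# Simulated p-values, NOT from the thread: 950 genes with no effect
# (uniform p-values) and 50 genes with signal (p-values skewed toward 0).
pvalues <- c(runif(950), rbeta(50, 0.1, 10))

# Benjamini-Hochberg adjusted p-values, analogous to the padj column
padj <- p.adjust(pvalues, method = "BH")

# Count how many genes pass each candidate FDR cutoff
cutoffs <- c(0.01, 0.05, 0.1)
n.called <- sapply(cutoffs, function(a) sum(padj < a))
names(n.called) <- cutoffs
n.called
```

With real DESeq2 output, the same tabulation could be applied to the `padj` column. Note that the `alpha` argument of `results()` is documented as the cutoff used to optimize independent filtering, so it is usually set to the FDR threshold one intends to use.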
Thanks. And could you please tell me when subsampling can be applied for differential analysis?
I have limited time to reply to users’ questions on the support site and I have to divide it among all the threads that are active. I believe I’ve already answered your question so I won’t be replying further.