Hello Michael Love,
I used DESeq2 to find DE genes between 2 groups in our dataset. Now, I'm trying to see if the number of DE genes is significant. So I repeated the following steps many times:
- permute the sample labels (i.e. column names)
- run DESeq2 on the permuted dataset
- get the number of DE genes (DEG)
This generated a null distribution of DEG counts. However, the observed number of DEG is not significant when compared to this distribution. Did I do the permutation test correctly, or is there another way to determine whether the number of DEG is significant? Is the permutation test even necessary?
For context, the dataset contains 6 groups (3 conditions in 2 tissues). I'm trying to compare 2 conditions in a single tissue.
Thank you for the help!
Hi Dr. Love, thank you for your response. I thought I needed to do the permutation test as a negative control. I repeated the permutation test 1000 times. Is it still underpowered? Can you please explain why there are still DEG when I permute the sample labels? Sorry, I'm not too familiar with statistics, and I really appreciate the help.
How many samples do you have? Do you have any batches to control for?
I have 70 samples in total. There are 2 different tissues, and 3 conditions per tissue. There are 10-13 samples in each condition of each tissue.
We did not control for any batches. We just ran DESeq2 on all the samples to get the DEG between 2 conditions in 1 tissue. However, we think that the different tissues might create a batch effect. Do you recommend controlling for the tissue type?
I would make a PCA plot to assess whether you have batch effects. And yes, you would need to control for batch in any analysis.
The permutation test assumes sample exchangeability, which is violated if there is any nuisance variation or known variation such as tissue.
Got it, thank you. I made the PCA, and the tissue does create a batch effect.
When we previously ran DESeq2, we created a new variable called "group", which combines condition and tissue. For example, if condition = A and tissue = 1, then group = "A.1". When we ran DESeq2, we set the design as "~ 0 + group". Does this properly control for the tissue?
Yes. My point about permutation is that if there are large differences in the dataset due to, say, tissue, then in some of the permutations the dummy condition will be correlated with the actual biological differences. Permuting within tissue would be better.
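For anyone reading along, here is a minimal sketch of what "permuting within tissue" could look like. It is written in Python purely to illustrate the label shuffling; DESeq2 itself runs in R, and the permuted labels would still need to be passed to it there. The function name `permute_within_strata` and the toy `tissue` and `condition` vectors are hypothetical and not taken from the dataset discussed above.

```python
import random
from collections import Counter, defaultdict


def permute_within_strata(labels, strata, rng):
    """Return a copy of `labels` shuffled only among samples that share a stratum."""
    if len(labels) != len(strata):
        raise ValueError("labels and strata must have the same length")

    # Collect the sample indices that belong to each stratum (e.g. each tissue).
    indices_by_stratum = defaultdict(list)
    for i, stratum in enumerate(strata):
        indices_by_stratum[stratum].append(i)

    permuted = list(labels)
    for indices in indices_by_stratum.values():
        # Shuffle the labels of this stratum only, then write them back in place.
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, label in zip(indices, shuffled):
            permuted[i] = label
    return permuted


# Hypothetical toy data: 2 tissues, 2 conditions, 3 samples per combination.
tissue = ["1"] * 6 + ["2"] * 6
condition = ["A", "A", "A", "B", "B", "B"] * 2

rng = random.Random(0)  # fixed seed so the sketch is reproducible
permuted_condition = permute_within_strata(condition, tissue, rng)

# Each tissue keeps the same number of A and B labels after permutation.
for t in sorted(set(tissue)):
    counts = Counter(c for c, s in zip(permuted_condition, tissue) if s == t)
    print(t, dict(counts))
```

Because labels only move between samples of the same tissue, a permuted condition cannot line up with the tissue difference, which is the correlation described above. If the comparison involves a single tissue, the same idea reduces to shuffling labels among that tissue's samples only.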