Hello!
I have an RNA-seq experiment that I analyzed using DESeq2, limma and edgeR with the following design:
- 11 treatments applied to 2 different cell lines
- 1 control from each cell line.
The classical design matrix is the following:
Samples | Treat1 | Treat2 | ... | Treat11 | Control | CellLine
1 | 1 | 0 | ... | 0 | 0 | 1
2 | 1 | 0 | ... | 0 | 0 | 0
... | ... | ... | ... | ... | ... | ...
23 | 0 | 0 | ... | 0 | 1 | 1
24 | 0 | 0 | ... | 0 | 1 | 0
So, to summarize, I have two biological replicates for each condition (the 11 treatments and the control), with the cell line as a blocking factor.
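For illustration, the layout described above could be generated as follows. This is only a minimal Python sketch of the table, not the code used in the analysis; the function name `build_design` and the 1/0 coding of the cell line are assumptions chosen to match the rows shown.

```python
# Hypothetical reconstruction of the design table: 11 treatments + 1 control,
# each condition measured once in each of the 2 cell lines (24 samples).
conditions = [f"Treat{i}" for i in range(1, 12)] + ["Control"]
cell_lines = [1, 0]  # assumed coding, matching the CellLine column above


def build_design(conditions, cell_lines):
    """Return (header, rows): one indicator column per condition plus CellLine."""
    header = ["Sample"] + conditions + ["CellLine"]
    rows = []
    sample = 1
    for cond in conditions:
        for line in cell_lines:
            indicators = [1 if c == cond else 0 for c in conditions]
            rows.append([sample] + indicators + [line])
            sample += 1
    return header, rows


header, rows = build_design(conditions, cell_lines)
print(len(rows))  # number of samples
for row in (rows[0], rows[1], rows[22], rows[23]):
    print(dict(zip(header, row)))  # samples 1, 2, 23 and 24
```

Each sample carries exactly one condition indicator, and every condition appears once per cell line, which is what makes the cell line usable as a blocking factor.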
Due to small issues in the wet-lab experiment, I would like to try to exclude the possibility that the results I obtain are due to things unrelated to the treatment that was applied to the cells. A colleague suggested using a permutation test on the design matrix to check whether my targets were just background noise or truly differentially expressed genes.
From my reading of some articles and posts, I replied that this was most probably not possible because of statistical issues with the assumptions made when doing a permutation test. Unfortunately, I don't have a background in statistics strong enough to easily explain the reason.
It is worth noting that my treatments have little effect on the transcriptome of my cells, since only a few genes are differentially expressed:
Method | Treat1 | Treat2 | Treat3 | Treat4 | Treat5 | Treat6 | Treat7 | Treat8 | Treat9 | Treat10
DESeq2 | 196 | 7 | 40 | 0 | 33 | 0 | 9 | 0 | 1 | 18
limma | 83 | 0 | 5 | 3 | 9 | 1 | 8 | 1 | 9 | 9
edgeR | 229 | 7 | 32 | 16 | 17 | 2 | 12 | 3 | 10 | 10
My questions are the following:
1) Can a permutation test be done at some step of the differential expression analysis (edgeR/limma/DESeq2) to rule out effects unrelated to the treatment?
2) If the answer to 1) is no, could you explain why not in the simplest way possible? Which rules are we breaking when we permute the columns of the design matrix?
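To make question 2 concrete, here is a small Python sketch of one practical limitation, not necessarily the full statistical answer. It assumes that labels may only be swapped between a given treatment and the control within each cell line (because the cell line is a blocking factor), and it uses made-up log-fold changes for a single gene. The function name `paired_permutation_pvalue` is hypothetical.

```python
from itertools import product


def paired_permutation_pvalue(diffs):
    """Two-sided sign-flip permutation p-value for paired differences.

    diffs: treatment-minus-control difference within each block (cell line).
    Swapping the two labels inside a block flips the sign of its difference.
    Returns (p_value, number_of_arrangements).
    """
    observed = abs(sum(diffs))
    arrangements = list(product([1, -1], repeat=len(diffs)))
    as_extreme = sum(
        1
        for signs in arrangements
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed
    )
    return as_extreme / len(arrangements), len(arrangements)


# Toy values (NOT from the experiment): log-fold change of one gene
# in each of the two cell lines.
p_value, n_arrangements = paired_permutation_pvalue([2.0, 2.5])
print(n_arrangements, p_value)  # prints: 4 0.5
```

With only two cell lines there are just 4 possible arrangements under this scheme, and the fully swapped arrangement always gives the same absolute statistic as the observed one, so the two-sided p-value cannot fall below 0.5 however strong the effect is.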
Thanks in advance,
Radek
PS: If you have relevant papers I should read to better understand the answer to my question I would be happy to read them.
Thanks, that was quite clear!
Treatments and controls were generated by transfection. Unfortunately, the cell culture was pushed a little too far because the cell lines are hard to culture, and from my exploration of the data I can assume that I am dealing with only a few clones in each treatment.
For example:
Treatment A transfected with construct A ==> 10 different clones
Treatment B transfected with construct B ==> 6 different clones
Control transfected with empty construct ==> 2 different clones
So my fear is that, since the effect of my construct is expected to be weak, I may have been analyzing the effect of the clonal integration of my vector, because my controls consist of only a few clones that differ from the ones I see in the treatments.
Have you already encountered this type of issue (and, hopefully, do you know a way to deal with it)?
The obvious solution would be to make more clones. Any average effect of vector integration should then cancel out. Sure, it might be difficult if the cell line's hard to culture, but that's why the wet lab people are first authors on these papers.
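As a rough illustration of why more clones help, here is a minimal Python sketch. It rests on strong assumptions that are not established in this thread: each clone contributes an independent integration effect with mean zero and a common standard deviation `sigma`, and clones are equally represented in the population. The clone counts are the example numbers given above; the function names are hypothetical.

```python
from math import sqrt


def sd_of_mean(n_clones, sigma=1.0):
    """SD of the averaged integration effect over n independent clones."""
    return sigma / sqrt(n_clones)


def sd_of_difference(n_treated, n_control, sigma=1.0):
    """SD of the integration-driven difference between two clone pools."""
    return sigma * sqrt(1.0 / n_treated + 1.0 / n_control)


# Example clone counts quoted earlier in the thread.
clone_counts = {"Treatment A": 10, "Treatment B": 6, "Control": 2}
for name, n in clone_counts.items():
    print(f"{name}: {n} clones, residual SD = {sd_of_mean(n):.2f} x sigma")

print(f"Treatment A vs control: {sd_of_difference(10, 2):.2f} x sigma")
print(f"With 10 control clones: {sd_of_difference(10, 10):.2f} x sigma")
```

Under these assumptions the comparison is dominated by the pool with the fewest clones, so adding control clones would reduce the integration noise in the contrast more than adding treated clones.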
OK. Since I am the one who took over this ongoing experiment, that would mean I have to generate between 24 and 72 extra cell lines...
Anyway thanks for your answers, as always they are very clear and helpful!