Differential expression analysis: Permutation Test
Radek (@radek-8889), Belgium

Hello!

I have an RNA-seq experiment that I analyzed using DESeq2, limma and edgeR, with the following design:

  • 11 treatments applied to 2 different cell lines
  • 1 control for each cell line.

The classical design matrix is the following:

Samples  Treat1  Treat2  ...  Treat11  Control  CellLine
1        1       0       ...  0        0        1
2        1       0       ...  0        0        0
...
23       0       0       ...  0        1        1
24       0       0       ...  0        1        0

To summarize, I have two biological replicates (one per cell line) for each condition (11 treatments plus the control), with the cell line as a blocking factor.
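For concreteness, here is a minimal NumPy sketch of the design matrix described above. It assumes a sample ordering that the table only partly shows: consecutive pairs of samples share a condition (Treat1 for samples 1-2, Treat2 for 3-4, ..., Treat11 for 21-22, Control for 23-24), and odd-numbered samples belong to cell line 1. The ordering of the middle samples is an assumption for illustration.

```python
import numpy as np

# Assumed ordering: samples 1-2 get Treat1, 3-4 Treat2, ..., 21-22 Treat11,
# and 23-24 are the controls, as in the first and last rows of the table.
conditions = [f"Treat{i}" for i in range(1, 12)] + ["Control"]
n_samples = 2 * len(conditions)  # two replicates per condition -> 24 samples

columns = conditions + ["CellLine"]
design = np.zeros((n_samples, len(columns)), dtype=int)
for sample in range(n_samples):           # 0-based index here
    group = sample // 2                    # consecutive pairs share a condition
    design[sample, group] = 1
    # 1-based odd samples (0-based even) are from cell line 1
    design[sample, -1] = 1 if sample % 2 == 0 else 0

# One indicator per condition plus the blocking column; no separate intercept
# is needed because the condition indicators already sum to 1 in every row.
print(design.shape)
print(np.linalg.matrix_rank(design))
```

The matrix has full column rank (13), so every condition effect and the cell-line effect can be estimated together.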

Due to some minor issues in the wet-lab experiment, I would like to rule out the possibility that my results are due to factors unrelated to the treatment applied to the cells. A colleague suggested using a permutation test on the design matrix to check whether my targets were just background noise or truly differentially expressed genes.

From my reading of some articles/posts, I said that it was most probably not possible because of statistical issues with the assumptions made by the permutation test. Unfortunately, my background in statistics is not strong enough for me to explain the reason easily.

It is worth noting that my treatments have little effect on the transcriptome of my cells, since only a few genes are differentially expressed:

        Treat1  Treat2  Treat3  Treat4  Treat5  Treat6  Treat7  Treat8  Treat9  Treat10
DESeq2  196     7       40      0       33      0       9       0       1       18
limma   83      0       5       3       9       1       8       1       9       9
edgeR   229     7       32      16      17      2       12      3       10      10

My questions are the following:

1) Can a permutation test be done at some step of the differential expression analysis (edgeR/limma/DESeq2) that could rule out effects unrelated to the treatment?

2) If the answer to 1) is no, could you explain why not, in the simplest way possible? Which rules are we breaking when we try to permute the columns of the design matrix?

Thanks in advance,

Radek

PS: If you know of relevant papers that would help me better understand the answer to my question, I would be happy to read them.

Aaron Lun (@alun), The city by the bay

Permutations are most easily applied when you have two groups and nothing else, such that you can shuffle observations between groups under the null hypothesis that they're all from the same distribution (for simplicity, I'll just assume that all library sizes are equal, so we're dealing with the same NB distribution for all counts of each gene). In your case, it's complicated because you're blocking on the cell line. This would suggest that you can really only shuffle samples within the cell line (i.e., permute odd samples separately from even samples), otherwise you'd end up testing the null hypothesis that the two cell lines are the same. This is unlikely to be interesting or relevant, unless I've misunderstood the purpose of your experiment.
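A minimal Python sketch of the restriction described above: sample labels are shuffled only among samples from the same cell line, never across cell lines. The helper name `shuffle_within_blocks` is hypothetical, and the labels and blocks assume the 24-sample layout from the question (consecutive pairs share a condition, odd-numbered samples in one cell line). It only produces a permuted labelling; it does not run any DE test.

```python
import random
from collections import Counter

def shuffle_within_blocks(labels, blocks, rng):
    """Return a copy of `labels` shuffled only among samples that share
    the same block (here: the same cell line)."""
    permuted = list(labels)
    for block in set(blocks):
        idx = [i for i, b in enumerate(blocks) if b == block]
        block_labels = [labels[i] for i in idx]
        rng.shuffle(block_labels)
        for i, lab in zip(idx, block_labels):
            permuted[i] = lab
    return permuted

# Assumed layout: Treat1, Treat1, Treat2, Treat2, ..., Control, Control
conditions = [f"Treat{i}" for i in range(1, 12)] + ["Control"]
labels = [c for c in conditions for _ in range(2)]
blocks = [1 if i % 2 == 0 else 0 for i in range(24)]  # cell line of samples 1..24

rng = random.Random(2015)
perm = shuffle_within_blocks(labels, blocks, rng)

# Each cell line keeps exactly one sample of every condition, so the
# permuted labels never confound condition with cell line.
same_composition = all(
    Counter(p for p, b in zip(perm, blocks) if b == blk)
    == Counter(l for l, b in zip(labels, blocks) if b == blk)
    for blk in (0, 1)
)
print(same_composition)  # True
```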

More problematic is the fact that you have multiple groups - if you shuffle samples across all groups, then your null hypothesis would be that counts from all groups come from the same distribution. This would not be useful if you're trying to identify DE genes between two particular treatments or to the control. In fact, if you wanted to test for DE between groups, you would be restricted to permuting only between those groups, e.g., if you wanted to test for differences between treatment 1 and control, you could swap samples 1 and 23 or samples 2 and 24 (keeping in mind the blocking on the cell line). That gives you a grand total of 4 permutations, 2 of which are redundant, and a minimum p-value of 0.5.
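The "4 permutations, minimum p-value of 0.5" arithmetic can be checked by brute force. This is a sketch under assumptions: it uses hypothetical log-expression values for a single gene and a simple mean paired difference as the test statistic (not what edgeR, limma or DESeq2 compute). Treatment 1 and control labels are swapped only within each cell line, giving 2 × 2 = 4 relabellings.

```python
from itertools import product

def within_block_relabellings(pairs):
    """Yield every relabelling obtained by optionally swapping the treatment
    and control labels inside each cell-line block.

    pairs: one (treated, control) tuple of log-expression values per cell line.
    """
    for flips in product([False, True], repeat=len(pairs)):
        yield [(c, t) if flip else (t, c) for (t, c), flip in zip(pairs, flips)]

def mean_difference(pairs):
    """Average treated-minus-control difference across cell lines."""
    return sum(t - c for t, c in pairs) / len(pairs)

def permutation_p_value(pairs):
    """Two-sided permutation p-value and the number of relabellings."""
    observed = abs(mean_difference(pairs))
    null_stats = [abs(mean_difference(p)) for p in within_block_relabellings(pairs)]
    # The observed labelling is one of the relabellings, and its mirror image
    # (both blocks swapped) gives the same |statistic|, so p can never drop below 2/4.
    extreme = sum(s >= observed for s in null_stats)
    return extreme / len(null_stats), len(null_stats)

# Hypothetical values for one gene: (Treat1, Control) in cell line 1 and cell line 0.
pairs = [(8.1, 5.0), (7.9, 5.2)]
print(permutation_p_value(pairs))  # (0.5, 4)
```

Even with a large, consistent difference in both cell lines, the p-value stays at 0.5, because the 4 relabellings collapse into 2 distinct values of the two-sided statistic.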

More generally, the controls should protect you from spurious DE unrelated to the treatment. Any experimental factors that could affect gene expression in the treatment samples should also apply to the controls, and cancel out in the DE analysis. If that's not the case, you should make some better controls. I don't see how permutation testing would offer any more protection.


Thanks, that was quite clear!

The treatments and controls were all done by transfection. Unfortunately, because the cell lines are hard to culture, the cell culture was pushed a little too far, and from my exploration of the data I assume that I am dealing with only a few clones in each treatment.

For example:

Treatment A transfected with construct A ==> 10 different clones

Treatment B transfected with construct B ==> 6 different clones

Control transfected with empty construct ==> 2 different clones

So my fear is that, since the effect of my construct is expected to be weak, I may actually have been analyzing the effect of the clonal integration of my vector, because my controls are made of a few clones that are different from the ones I see in the treatments.

Have you already encountered this type of issue (and, hopefully, do you know a way to deal with it)?


The obvious solution would be to make more clones. Any average effect of vector integration should then cancel out. Sure, it might be difficult if the cell line's hard to culture, but that's why the wet lab people are first authors on these papers.
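A back-of-the-envelope sketch of why more clones help, under a simplifying assumption that is not from the thread: each clone carries an independent integration effect with standard deviation σ, and the treatment-versus-control comparison is a difference of clone averages. The spurious difference then has standard deviation σ·sqrt(1/n_treat + 1/n_control), so it is dominated by the arm with the fewest clones. The clone counts 10 and 2 come from the example above; the function names are hypothetical.

```python
import math
import random
import statistics

def spurious_sd(n_treat, n_control, sigma=1.0):
    """SD of (mean clone effect in treatment - mean clone effect in control),
    assuming independent clone effects with SD `sigma`."""
    return sigma * math.sqrt(1.0 / n_treat + 1.0 / n_control)

def simulate_spurious_sd(n_treat, n_control, sigma=1.0, reps=10000, seed=1):
    """Monte Carlo check of spurious_sd() with normally distributed clone effects."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treat = [rng.gauss(0.0, sigma) for _ in range(n_treat)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_control)]
        diffs.append(statistics.fmean(treat) - statistics.fmean(control))
    return statistics.stdev(diffs)

for n_treat, n_control in [(10, 2), (10, 10), (30, 30)]:
    print(n_treat, n_control,
          round(spurious_sd(n_treat, n_control), 3),
          round(simulate_spurious_sd(n_treat, n_control), 3))
```

Under these assumptions, going from 10 vs 2 clones to 10 vs 10 clones shrinks the spurious difference from about 0.77σ to about 0.45σ, which is the "cancel out" effect in quantitative form.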


OK. Since I am the one who took over this ongoing experiment, that would mean making between 24 and 72 extra cell lines...

Anyway, thanks for your answers; as always, they are very clear and helpful!
Gordon Smyth (@gordon-smyth), WEHI, Melbourne, Australia

See the section called "Parametric modelling versus permutation methods" in this article:

  http://nar.oxfordjournals.org/content/43/7/e47

In brief:

1. Permuting tests the wrong null hypothesis.
2. Permuting is technically incorrect for RNA-seq because the samples are not exchangeable (they have different library sizes).
3. Permuting is computationally slow.
4. Permuting cannot possibly give any significant DE genes in a small genomic experiment because permutation is incapable of giving p-values small enough to be significant after multiple testing adjustments.
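Point 4 can be made concrete with a little arithmetic. For a plain two-group comparison with k samples per group and no blocking, there are C(2k, k) ways to assign the labels; because each relabelling has a mirror image with the same two-sided statistic, the smallest attainable p-value is 2 / C(2k, k). The sketch below compares this with a Bonferroni per-gene cut-off; the figure of 20,000 tested genes is an assumption for illustration. (Benjamini-Hochberg is less strict, but it can never reject a p-value larger than the FDR level, since its thresholds never exceed that level.)

```python
from math import comb

def min_two_sided_perm_p(k):
    """Smallest attainable two-sided p-value for an unblocked k-vs-k
    permutation test that enumerates every relabelling exactly."""
    return 2 / comb(2 * k, k)

n_genes = 20_000                     # assumed number of genes tested
bonferroni_cutoff = 0.05 / n_genes   # per-gene threshold for FWER 0.05

for k in range(2, 13):
    p_min = min_two_sided_perm_p(k)
    print(f"{k} vs {k}: {comb(2 * k, k):>8} relabellings, "
          f"min p = {p_min:.2e}, passes Bonferroni: {p_min <= bonferroni_cutoff}")
```

Under these assumptions, even the most extreme possible outcome only clears the cut-off at 12 samples per group, which also means enumerating about 2.7 million relabellings per gene (point 3). The design in the question has two samples per condition.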
