FPKM t-test vs DESeq2
SH ▴ 10

Hi there! I'm interested in DEG analysis in RNA-seq and I have a question about statistical analysis methods. I tried to analyze DEGs using three different methods as follows:

  1. DESeq2 package
  2. FPKM t-test in excel =T.TEST(ctr1:ctr3,trt1:trt3,2,2)
  3. log2(FPKM) t-test in excel =T.TEST(ctr1:ctr3,trt1:trt3,2,2)

Actually, I'm a bit confused between 2 and 3. I think I should use 3, since the log transform should bring the data closer to the normal distribution that the t-test assumes. Anyway, what I'm most curious about is whether the results of 1 should be broadly consistent with those of 2 or 3, even if they don't match perfectly.
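To make the difference between methods 2 and 3 concrete, here is a minimal sketch in plain Python (standard library only) of the statistic behind Excel's `T.TEST(..., 2, 2)`, i.e. a two-tailed, equal-variance Student's t-test, applied once to raw FPKM and once to log2(FPKM). The FPKM values are invented for illustration and are not from my data; `pooled_t` is a helper name made up for this example. Instead of a p-value, the statistic is compared with the tabulated two-sided 5% critical value for 4 degrees of freedom (3 + 3 samples).

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for df = 3 + 3 - 2 = 4
# (standard t-table value).
T_CRIT_DF4_05 = 2.776


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    # Pooled variance, weighting each group's sample variance by its df.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical FPKM values for a single gene (illustration only).
ctr = [10.0, 12.0, 30.0]
trt = [50.0, 60.0, 200.0]

# Method 2: t-test on raw FPKM.
t_raw = pooled_t(trt, ctr)
# Method 3: t-test on log2(FPKM).
t_log = pooled_t([math.log2(x) for x in trt], [math.log2(x) for x in ctr])

for label, t in (("raw FPKM", t_raw), ("log2 FPKM", t_log)):
    call = "significant" if abs(t) > T_CRIT_DF4_05 else "not significant"
    print(f"{label}: t = {t:.2f} ({call} at the 5% level, df = 4)")
```

On the raw scale the one large replicate inflates the pooled variance, so the same gene can fall on opposite sides of the cutoff depending on whether the values were log-transformed first.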

For example, in my analysis, the results were as follows:

  1. up&down=1500
  2. up=200, down=650

When I tried different datasets, there were cases where the t-test gave more up-regulated than down-regulated genes, while the numbers of up- and down-regulated genes were similar in the DESeq2 results.

My beloved professor is excited to torment me even more with this result. My professor has two main arguments:

  1. Generally, t-tests are less strict in terms of test power than parametric tests, so the number of DEGs from t-tests is expected to be higher than from DESeq2. So why does DESeq2 generally identify more genes?
  2. The result of the t-test performed on FPKM data should be similar to the DESeq2 results, but the difference is too great.

Now, I need to prepare evidence to refute my professor's arguments or else I need to think more about my analysis. Any opinions are welcome.

thanks,

DESeq2 statistics rna-seq • 2.2k views
ATpoint ★ 4.5k

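For 1) t-tests are parametric: they assume normally distributed data. A t-test is expected to yield far fewer DEGs simply because at low sample size it is massively underpowered. With three replicates per group, each gene's variance is estimated from very few values, so the estimate is noisy and only very large, very consistent differences reach significance. That is the reason methods such as DESeq2 exist in the first place: they model the counts with a negative binomial distribution and share information across genes to stabilize the dispersion estimates. The papers describing these methods explain this in detail. RNA-seq counts are not normally distributed; that is well known.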
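A rough way to see the low-power point is a small simulation. The sketch below is plain Python with the standard library only, and everything in it is an assumption made for illustration: values are drawn from a normal distribution (the t-test's best case, not real RNA-seq data), the true difference is two standard deviations, and `pooled_t` and `simulated_power` are made-up helper names. It estimates how often an equal-variance t-test passes the two-sided 5% cutoff with 3 versus 10 replicates per group. It says nothing about DESeq2 itself; it only illustrates the sample-size effect for the plain t-test.

```python
import math
import random
from statistics import mean, variance

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (standard t-table values for df = 4 and df = 18).
T_CRIT_05 = {4: 2.776, 18: 2.101}


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulated_power(n, effect_sd, n_sim=2000, seed=1):
    """Fraction of simulated genes with |t| above the 5% critical value.

    n is the number of replicates per group; effect_sd is the true mean
    difference in units of the within-group standard deviation.
    """
    rng = random.Random(seed)
    crit = T_CRIT_05[2 * n - 2]
    hits = 0
    for _ in range(n_sim):
        ctr = [rng.gauss(0.0, 1.0) for _ in range(n)]
        trt = [rng.gauss(effect_sd, 1.0) for _ in range(n)]
        if abs(pooled_t(trt, ctr)) > crit:
            hits += 1
    return hits / n_sim


power_n3 = simulated_power(3, 2.0)
power_n10 = simulated_power(10, 2.0)
print(f"n = 3 per group:  estimated power = {power_n3:.2f}")
print(f"n = 10 per group: estimated power = {power_n10:.2f}")
```

Keep in mind this uses an unadjusted 5% cutoff for a single gene; after multiple-testing correction across thousands of genes the required t statistic is much larger, which hits the 3-vs-3 design even harder.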

For 2) No, there is no basis for such a statement. Again, this has been discussed extensively in the biostatistics/RNA-seq literature over the last two decades. With all due respect, your advisor apparently has not looked into this much: it is well accepted today that the analysis, especially at low sample size, needs specialized methods to moderate (low) counts.

The question of how standard tests (be it the t-test, Wilcoxon, or others) compare to specialized methods has been asked many times before, both here and on platforms such as StackExchange and biostars.org. Please search for it and refer to the benchmarking papers. The same goes for the pros and cons of using normalized values such as FPKM for testing.

My recommendation would be not to overthink alternative strategies and simply use what everyone uses for RNA-seq. That could be DESeq2 with raw counts, or alternatives such as edgeR or limma-voom.
