Question

edgeR or Wilcoxon rank-sum test? Which is right?

Asked by AZ (United Kingdom)

Hi,

I have histopathologic response data after neoadjuvant chemoradiation for 56 cancer samples. Of these, 26 samples were classified as minor and 30 as major histopathologic responders (TRG1-2 and TRG4-5, respectively). I ran both edgeR and a Wilcoxon test to find genes driving the difference between tumor samples from patients with major versus minor response, as below.

> library(edgeR)
> group <- factor(c(rep("TRG1-2", 26), rep("TRG4-5", 30)))
> group
[1] TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2
[17] TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG1-2 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5
[33] TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5
[49] TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5 TRG4-5
Levels: TRG1-2 TRG4-5
> dim(df)
[1] 2560   56
> y <- DGEList(counts = df, group = group)
> y <- estimateDisp(y) 
Design matrix not provided. Switch to the classic mode.
> sqrt(y$common.dispersion)
[1] 0.6280918
> EdgeR <- exactTest(y) 
> topTags(EdgeR)
Comparison of groups:  TRG4-5-TRG1-2 
           logFC   logCPM       PValue          FDR
PPBP  -4.3340878 9.503884 3.564802e-11 9.125894e-08
CDK6  -1.5518198 8.712466 1.458599e-07 1.867006e-04
IL1B   1.7324695 9.178351 2.623373e-05 1.908504e-02
CXCL8  1.6455933 8.340310 3.129262e-05 1.908504e-02
EGR1   0.8468036 8.652308 4.432857e-05 1.908504e-02
IFIT2  0.8957873 7.535228 5.199642e-05 1.908504e-02
IL6    1.3926323 6.951407 5.218565e-05 1.908504e-02
BDNF   1.4176689 6.605966 7.471018e-05 2.134076e-02
PTGS2  1.4746062 8.352272 7.547266e-05 2.134076e-02
FOS    0.9891503 9.263358 8.336234e-05 2.134076e-02

And the Wilcoxon test as below:

> library(GSALightning)
> df1 <- cpm(df, log = TRUE)
> results <- wilcoxTest(df1, group, tests = "unpaired")
There were 48 warnings (use warnings() to see them)
> head(results[,1:4])
       p-value:up-regulated in TRG1-2 p-value:up-regulated in TRG4-5
ACTB                       0.02007199                      0.9799280
ATP5F1                     0.51624724                      0.4837528
DDX5                       0.87211880                      0.1278812
EEF1G                      0.76612743                      0.2338726
GAPDH                      0.12111916                      0.8788808
NCL                        0.44491768                      0.5550823
       q-value:up-regulated in TRG1-2 q-value:up-regulated in TRG4-5
ACTB                        0.9998235                      0.9822301
ATP5F1                      0.9998235                      0.6930090
DDX5                        0.9998235                      0.4650225
EEF1G                       0.9998235                      0.5331378
GAPDH                       0.9998235                      0.9138647
NCL                         0.9998235                      0.7347522

The lists of significant genes up-regulated in either TRG1-2 or TRG4-5 are completely different from the edgeR results. Please help me understand which results are right and which are wrong.

Thank you for any suggestions.

Tags: edgeR, r, wilcox, RNA-seq, cancer


Answer (score 2, 2019-02-01)

Answered by Gordon Smyth (WEHI, Melbourne, Australia)

It is quite common for different DE tests to rank genes differently, and if there are only a few DE genes, the top genes can easily be non-overlapping. This would be true even if neither of the DE tests were "wrong", but I'm not a big fan of the Wilcoxon test for this sort of data.

If this is sequencing data of some sort, then the Wilcoxon test would be wrong if applied to counts because it doesn't account for differences in sequencing depth between samples.
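A minimal base-R sketch of the sequencing-depth problem, using simulated data (the library sizes, the read proportion and the seed are hypothetical choices for illustration). The gene below has the same true expression in both groups, yet a Wilcoxon test on raw counts separates the groups strongly, only because group B was sequenced about three times deeper. Dividing by library size removes that effect on the mean.

```r
# Hypothetical setup: one non-DE gene, same expected share of reads everywhere,
# but group B libraries are about 3x larger than group A libraries.
set.seed(1)
lib_a <- seq(1e6, 1.5e6, length.out = 10)  # hypothetical library sizes, group A
lib_b <- seq(3e6, 3.5e6, length.out = 10)  # hypothetical library sizes, group B
prop  <- 1e-4                              # same true read proportion (100 CPM)

counts_a <- rpois(10, lib_a * prop)
counts_b <- rpois(10, lib_b * prop)

# Rank test on raw counts: the depth difference alone drives the result
p_counts <- wilcox.test(counts_a, counts_b, exact = FALSE)$p.value

# Counts per million: divide by library size, so the group means agree again
cpm_a <- counts_a / lib_a * 1e6
cpm_b <- counts_b / lib_b * 1e6
p_cpm <- wilcox.test(cpm_a, cpm_b, exact = FALSE)$p.value

cat("raw counts p =", signif(p_counts, 3), "\n")
cat("CPM p        =", signif(p_cpm, 3), "\n")
```

Nothing here uses edgeR; it only shows why raw counts should not be passed to a rank test.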

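Even if you convert to CPMs, the observations would still not be identically distributed under the null hypothesis, which the Wilcoxon test assumes. For example, log-CPM values from samples with smaller library sizes are more variable than those from deeply sequenced samples.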
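A small simulation of the "not identically distributed" point, under stated assumptions: Poisson counts, a prior count of 0.5 and a simple log-CPM formula, all chosen for illustration (edgeR's own `cpm(log = TRUE)` handles the prior count differently). The same true expression of 10 CPM gives much noisier log-CPM values in a shallow library than in a deep one, even though nothing is differentially expressed.

```r
set.seed(2)

# Simulate n log-CPM values for a gene with fixed true CPM in a library of given size.
# The prior count of 0.5 is an assumption that avoids log2(0).
log_cpm_draws <- function(lib_size, true_cpm = 10, n = 2000, prior = 0.5) {
  counts <- rpois(n, lib_size * true_cpm / 1e6)
  log2((counts + prior) / lib_size * 1e6)
}

shallow <- log_cpm_draws(1e5)  # expected count 1 per sample
deep    <- log_cpm_draws(1e7)  # expected count 100 per sample

sd_shallow <- sd(shallow)
sd_deep    <- sd(deep)
cat("sd of log-CPM, shallow library:", round(sd_shallow, 2), "\n")
cat("sd of log-CPM, deep library:   ", round(sd_deep, 2), "\n")
```

The spread differs several-fold between the two depths, so pooling such samples in a rank test mixes observations with different null distributions.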

Another issue is that it is not correct to apply FDR correction to the up and down p-values separately, which the wilcoxTest function seems to be doing.
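One common way to avoid adjusting the two directions separately (a sketch, not necessarily what GSALightning or the answer intends) is to combine the two one-sided p-values into a single two-sided p-value per gene, then apply one Benjamini-Hochberg adjustment across genes. The six genes below are copied from the output in the question; with the real data the adjustment would run over all 2560 genes, so the FDR values printed here are for illustration only.

```r
# One-sided p-values for the six genes shown in the question's output
p_up_trg12 <- c(ACTB = 0.02007199, ATP5F1 = 0.51624724, DDX5 = 0.87211880,
                EEF1G = 0.76612743, GAPDH = 0.12111916, NCL = 0.44491768)
p_up_trg45 <- c(ACTB = 0.9799280, ATP5F1 = 0.4837528, DDX5 = 0.1278812,
                EEF1G = 0.2338726, GAPDH = 0.8788808, NCL = 0.5550823)

# Two-sided p-value: twice the smaller one-sided p-value, capped at 1
# (the named vector goes first so pmin() keeps the gene names)
two_sided <- pmin(2 * pmin(p_up_trg12, p_up_trg45), 1)

# A single BH adjustment over one p-value per gene
fdr <- p.adjust(two_sided, method = "BH")

# Record the direction separately instead of adjusting each direction on its own
direction <- ifelse(p_up_trg12 < p_up_trg45, "up in TRG1-2", "up in TRG4-5")
data.frame(p_two_sided = two_sided, FDR = fdr, direction = direction)
```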

I wonder what the warning messages generated by wilcoxTest were.



Thanks a lot. This is EdgeSeq, a type of RNA-seq that does not require RNA extraction. However, I fed log-CPM values (from the cpm function in edgeR) into the Wilcoxon test, and used the same groups as for edgeR. Is the Wilcoxon test still wrong even with normalized read counts?

I have seen people use the Mann-Whitney test for such data, so I am not sure what to do.

Thank you for any help.

I also tried a t-test on the normalized data, but got an error saying no difference was detected.
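For reference, the Wilcoxon procedure described in this reply can be written in base R without GSALightning: a two-sided rank-sum test per gene on a log-CPM matrix, followed by a single BH adjustment across all genes. The matrix below is simulated (the names `logcpm`, `wilcox_p` and `wilcox_fdr`, the 200 genes and the five genes shifted by 3 log2 units are all hypothetical). This only addresses the separate up/down adjustment; it does not remove the concern that log-CPM values are not identically distributed across samples.

```r
set.seed(3)
group <- factor(c(rep("TRG1-2", 26), rep("TRG4-5", 30)))

# Hypothetical log-CPM matrix: 200 genes (rows) x 56 samples (columns)
logcpm <- matrix(rnorm(200 * 56, mean = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))
# Simulate 5 DE genes by shifting them up in TRG4-5
logcpm[1:5, group == "TRG4-5"] <- logcpm[1:5, group == "TRG4-5"] + 3

# Two-sided Wilcoxon rank-sum test for each gene
wilcox_p <- apply(logcpm, 1, function(x) {
  wilcox.test(x[group == "TRG1-2"], x[group == "TRG4-5"], exact = FALSE)$p.value
})

# One BH adjustment over all genes, not one per direction
wilcox_fdr <- p.adjust(wilcox_p, method = "BH")
head(sort(wilcox_fdr))
```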

(Reply by AZ)


This is very confusing. I don't recall seeing the cpm function in your original question.

(Reply by Gordon Smyth)


Sorry, I have just edited my post. I used log-CPM values for all the t-tests and non-parametric tests.

(Reply by AZ)