Question

treatDGE vs glmLRT

1

Entering edit mode

jet2020 ▴ 10

@jet2020-7460

Last seen 9.1 years ago

United States

Hi - I'm an R noob and an bioinformatics noob so forgive any ignorance in my post or responses.

Doing DGE analysis from RNAseq counts using EdgeR. I have the data set imported and have filtered, normalized, estimated dispersions and am now working on calling DE genes

I was toying with the idea of using treatDGE to keep from having to filter out low logFC data after using glmLRT but treatDGE is giving me huge logFC values that are very different compared to those from glmLRT using the same inputs (see below).

I'm fine just using glmLRT, but just for my own edification I would appreciate an explanation of the behavior of treatDGE here. Thanks. -Jen

> BvsA.lrt <- glmLRT(fit, contrast=BvsA)
> BvsA.de <- treatDGE(fit, contrast=BvsA, lfc=2)
> topTags(BvsA.lrt)
Coefficient:  -1*A 1*B 
              logFC     logCPM       LR       PValue         FDR
RGS16      4.692429  6.4079442 22.79720 1.800281e-06 0.007220489
ADAMTSL3   3.899486  2.0392587 22.59436 2.000715e-06 0.007220489
KIAA1199   5.443129  3.8755438 22.49064 2.111697e-06 0.007220489
PBX4     -12.244102  2.7096437 22.07098 2.627527e-06 0.007220489
FGD3     -12.731544  4.7401369 21.54637 3.453767e-06 0.007220489
SLC16A7    6.788948  3.4398053 21.42148 3.686179e-06 0.007220489
ITIH5      5.546266  1.2237345 20.96641 4.674069e-06 0.007220489
SELL      -4.213155 -0.1066297 20.80724 5.079059e-06 0.007220489
SLA      -11.422386  1.9546705 20.63175 5.566514e-06 0.007220489
TMEM191A -12.067484  2.1965851 20.60017 5.659115e-06 0.007220489
> topTags(BvsA.de)
Coefficient:  -1*A 1*B 
                logFC    logCPM       PValue        FDR
PBX4       -144269489 2.7096437 1.535377e-05 0.05387531
FGD3       -144269490 4.7401369 1.656099e-05 0.05387531
CD84       -144269492 7.2249668 2.300909e-05 0.05387531
TMEM191A   -144269489 2.1965851 2.895685e-05 0.05387531
PRAMEL     -144269493 6.2552073 3.180367e-05 0.05387531
SLA        -144269489 1.9546705 3.497900e-05 0.05387531
AC007163.4 -144269491 3.8948523 4.228308e-05 0.05387531
ALDH3B2    -144269492 6.3312907 4.252703e-05 0.05387531
IGLV6-57   -144269490 4.0316593 4.292817e-05 0.05387531
C6orf99    -144269489 0.6054585 4.791796e-05 0.05387531
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.8.6  limma_3.22.6

edger • 2.2k views

ADD COMMENT • link updated 9.1 years ago by James W. MacDonald 65k • written 9.1 years ago by jet2020 ▴ 10

score 1 · Answer 1 · 2015-03-13

I can tell you why, but you will have to wait until Gordon or Aaron get in to work for the rationale.

In treatDGE, there is a line that goes like this:

 if (glmfit$prior.count != 0) {
        coefficients.mle <- glmfit$unshrunk.coefficients
    }
    else {
        coefficients.mle <- glmfit$coefficients
    }

And you almost surely have a prior.count > 0, so the coefficients you are using for treatDGE are the unshrunk coefficients. In addition, if you look at the counts for say PBX4, most or all of the counts for your B group are very close to zero, so the (unshrunken) fold change is massive. Hence the huge fold changes.

As an example, if you run example(treatDGE), and then add in some extra counts where one group is large, and the other is zero, here is what you get:

> counts <- rbind(counts, c(420, 450, 370, 0,0,0))
> d <- DGEList(counts=counts,lib.size=rep(1e6, nlibs), group=group)
> tr <- treatDGE(gfit, coef=2, lfc=1)
> tr2 <- glmLRT(gfit, coef=2)
> topTags(tr)
Coefficient:  as.factor(group)2
            logFC   logCPM       PValue          FDR
101 -1.442695e+08 7.705051 1.048435e-37 1.058920e-35
1   -2.445750e+00 7.634200 1.302969e-04 6.579994e-03
5   -2.240345e+00 7.915874 7.426105e-04 2.357278e-02
3   -2.207999e+00 8.120664 9.335755e-04 2.357278e-02
2   -2.074439e+00 7.978228 2.801031e-03 5.362816e-02
4   -2.064800e+00 7.673598 3.185831e-03 5.362816e-02
20  -1.032605e+00 6.505144 4.671534e-01 9.996438e-01
92   8.681553e-01 6.779166 6.320188e-01 9.996438e-01
75  -8.353693e-01 6.594941 6.622075e-01 9.996438e-01
38   8.073549e-01 6.369228 6.863417e-01 9.996438e-01
> topTags(tr2)
Coefficient:  as.factor(group)2
          logFC   logCPM         LR       PValue          FDR
101 -11.6915981 7.705051 202.259971 6.709174e-46 6.776266e-44
1    -2.4433391 7.634200  36.197754 1.782759e-09 9.002933e-08
5    -2.2386470 7.915874  31.299969 2.210832e-08 7.443135e-07
3    -2.2065628 8.120664  30.698951 3.013279e-08 7.608529e-07
2    -2.0730075 7.978228  27.269348 1.769951e-07 3.575301e-06
4    -2.0630427 7.673598  26.731858 2.337360e-07 3.934555e-06
20   -1.0310269 6.505144   6.721532 9.525579e-03 1.374405e-01
92    0.8670885 6.779166   4.879078 2.718414e-02 3.431997e-01
75   -0.8342052 6.594941   4.469030 3.451457e-02 3.873302e-01
38    0.8060397 6.369228   4.108755 4.266176e-02 4.308838e-01

And you can see that the fold changes are comparable, except for the fake gene we added.