Hi - I'm an R noob and a bioinformatics noob, so forgive any ignorance in my post or responses.
I'm doing DGE analysis on RNA-seq counts using edgeR. I have the data set imported and have filtered, normalized, and estimated dispersions, and am now working on calling DE genes.
I was toying with the idea of using treatDGE to avoid having to filter out low-logFC genes after using glmLRT, but treatDGE is giving me huge logFC values that are very different from those from glmLRT with the same inputs (see below).
I'm fine with just using glmLRT, but for my own edification I would appreciate an explanation of treatDGE's behavior here. Thanks. -Jen
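For context, this is roughly how I got to the fit object below (a minimal sketch of my upstream steps; the object names, filtering rule, and group labels here are placeholders rather than my exact code):

library(edgeR)
library(limma)

# counts: matrix of raw RNA-seq counts; group: factor of condition labels (includes A and B)
y <- DGEList(counts=counts, group=group)

# filter: keep genes with at least 1 count per million in at least 2 samples
keep <- rowSums(cpm(y) > 1) >= 2
y <- y[keep, ]
y$samples$lib.size <- colSums(y$counts)  # recompute library sizes after filtering

# TMM normalization, design matrix, and dispersion estimation
y <- calcNormFactors(y)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)

# fit gene-wise negative binomial GLMs and define the B vs A contrast
fit <- glmFit(y, design)
BvsA <- makeContrasts(B - A, levels=design)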
> BvsA.lrt <- glmLRT(fit, contrast=BvsA)
> BvsA.de <- treatDGE(fit, contrast=BvsA, lfc=2)
> topTags(BvsA.lrt)
Coefficient: -1*A 1*B
logFC logCPM LR PValue FDR
RGS16 4.692429 6.4079442 22.79720 1.800281e-06 0.007220489
ADAMTSL3 3.899486 2.0392587 22.59436 2.000715e-06 0.007220489
KIAA1199 5.443129 3.8755438 22.49064 2.111697e-06 0.007220489
PBX4 -12.244102 2.7096437 22.07098 2.627527e-06 0.007220489
FGD3 -12.731544 4.7401369 21.54637 3.453767e-06 0.007220489
SLC16A7 6.788948 3.4398053 21.42148 3.686179e-06 0.007220489
ITIH5 5.546266 1.2237345 20.96641 4.674069e-06 0.007220489
SELL -4.213155 -0.1066297 20.80724 5.079059e-06 0.007220489
SLA -11.422386 1.9546705 20.63175 5.566514e-06 0.007220489
TMEM191A -12.067484 2.1965851 20.60017 5.659115e-06 0.007220489
> topTags(BvsA.de)
Coefficient: -1*A 1*B
logFC logCPM PValue FDR
PBX4 -144269489 2.7096437 1.535377e-05 0.05387531
FGD3 -144269490 4.7401369 1.656099e-05 0.05387531
CD84 -144269492 7.2249668 2.300909e-05 0.05387531
TMEM191A -144269489 2.1965851 2.895685e-05 0.05387531
PRAMEL -144269493 6.2552073 3.180367e-05 0.05387531
SLA -144269489 1.9546705 3.497900e-05 0.05387531
AC007163.4 -144269491 3.8948523 4.228308e-05 0.05387531
ALDH3B2 -144269492 6.3312907 4.252703e-05 0.05387531
IGLV6-57 -144269490 4.0316593 4.292817e-05 0.05387531
C6orf99 -144269489 0.6054585 4.791796e-05 0.05387531
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.8.6 limma_3.22.6

Also please note that for a gene with large counts in one condition and very few or zero counts in the other, the fold change isn't that useful. In other words, for the fake gene I introduced in the example, a log fold change of 11.7 (a ~3300-fold change on the linear scale) isn't materially different from the infinite fold change that treatDGE() returns. At some point, fold changes get so extreme that all you can say is 'WOW, that is really different', and leave it at that.
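To make that concrete, here is a toy calculation in plain R (the counts are made up, chosen only to reproduce the numbers above):

# hypothetical gene: ~1650 counts in one condition, 0 in the other
log2(1650 / 0)                    # Inf -- the raw fold change is undefined
log2((1650 + 0.5) / (0 + 0.5))    # ~11.7, i.e. ~3300-fold, once a small prior count is added

Once a zero is involved, any finite logFC you report is really a statement about the prior count rather than the data.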
The use of unshrunk log-fold changes is intentional. This is because treatDGE uses the size of the log-fold change to determine whether it is below the specified lfc threshold. This will affect the calculation of the p-value. In contrast, glmLRT does not use the log-fold change for any calculations, so it does a bit of shrinkage to make them look prettier.
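If you want to see how much that shrinkage changes the reported values, you can compare predicted fold changes at different prior counts (a sketch using edgeR's predFC on the y, design, and BvsA objects from the fit above; the prior.count values are arbitrary):

# heavily moderated log2 fold changes, one column per design coefficient
lfc.shrunk <- predFC(y, design, prior.count=2)
# nearly unshrunk log2 fold changes; genes with zero counts blow up
lfc.raw <- predFC(y, design, prior.count=0.01)
# express both as the B vs A contrast and compare the first few genes
shrunk <- drop(lfc.shrunk %*% BvsA)
raw <- drop(lfc.raw %*% BvsA)
head(cbind(shrunk, raw))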
P.S. treatDGE will be renamed to glmTreat in the next release.

Thanks for that. I figured I was essentially dividing by zero somewhere with treatDGE.
The more I read and explore, the more it seems (and please correct me if my supposition is wrong here) that for RNA-seq data the fold changes are moderately soft anyway, and it's generally better to call DE genes as true/false based on p-value and FDR cutoffs that make sense for the biological system, rather than being married to a particular fold-change threshold.
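If I do go that route, something like this looks like the standard approach (a sketch; the 5% FDR cutoff is a placeholder to be tuned to the biology, not a recommendation):

# classify each gene as down (-1), not significant (0), or up (1) at a chosen FDR
de <- decideTestsDGE(BvsA.lrt, adjust.method="BH", p.value=0.05)
summary(de)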