Question

Not many DE genes using glmTreat

0

Entering edit mode

Barista • 0

@e51ccd8d

Last seen 2.0 years ago

Netherlands

Hi all,

I am doing a DE analysis of transcriptomics data (Quantseq) where I compare 2 groups consisting of 300 patients in each group. I have normalized my libraries for library size and composition, have filtered out lowly expressed genes (with a count of <10) and did a glmQLFTest.

Using the decideTests function I get the following output:

summary(decideTests(glf))
-1*x   1*y
    Down                414
    NotSig            13509
    Up                  638

If I then use the glmTreat function where I use a FC cut-off of 1.5 (or logFC cut-off of 0.58), I only retain 19 genes!

tr <- glmTreat(fit, contrast = xvsy, lfc = log2(1.5))

is.de <- decideTestsDGE(tr)
summary(is.de)

-1*x   1*y
Down                  9
NotSig            14542
Up                   10

So my question is: how can it be that there are only this ''many" DE genes left? Of course a conclusion can be that the groups are more similar than previous thought, but I think that is highly unlikely. Are there other factors that might have influenced this result? Can I for example take an other lfc cut-off? And can I also take other (clinical) factors into account with the glmQLFTest (for example, age, comorbidities etc.)

Furthermore, in the result are some genes with a logFC of >0.58 and <-0.58, but with a FDR of >0.05, so I assume this is the reason these are not taken into account in the decideTests table for DE genes. In the below table you can see the downregulated genes and there are 28 downregulated genes with a logFC of <-0.58.

> test <- topTags(tr, n = Inf)
> sort(test$table$logFC)
   [1] -2.6109802 -1.7722134 -1.4860788 -1.4045712 -1.2123096 -1.0688111 -0.9362862 -0.9275250 -0.8460224 -0.7968878 -0.7695722
  [12] -0.7607765 -0.7589310 -0.7300097 -0.7214861 -0.6902481 -0.6802954 -0.6769655 -0.6632714 -0.6614588 -0.6494449 -0.6491878
  [23] -0.6425783 -0.6393694 -0.6289857 -0.6231364 -0.6192043 -0.6020387 -0.5798458 -0.5759566 -0.5705645 -0.5683261 -0.5673250
  [34] -0.5652899 -0.5617309 -0.5606385 -0.5543450 -0.5532832 -0.5505854 -0.5444628 -0.5315794 -0.5298383 -0.5251693 -0.5226470
  [45] -0.5220916 -0.5196427 -0.5173993 -0.5146078 -0.5070766 -0.5040420 -0.4991255 -0.4955685 -0.4951795 -0.4932777 -0.4902114

Please, help! :)

EdgeR Bioinformatics Transcriptomics Quantseq • 1.4k views

ADD COMMENT • link 2.9 years ago Barista • 0

score 1 · Answer 1 · 2021-08-20

1

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 39 minutes ago

WEHI, Melbourne, Australia

The whole purpose of glmTreat() is to reduce the number of DE genes, so I do not follow why you are surprised by that. glmTreat() is only intended for situations when there are too many DE genes to interpret and you want to reduce to the most important DE genes to allow easier interpretation and pathway analysis. If you don't want to reduce the number of DE genes and restrict to genes with large fold changes, then don't use glmTreat. The default edgeR pipeline is already fine. Or else simply use a smaller fold-change threshold.

What documentation are you reading? The edgeR documentation and workflows try to make it clear that glmTreat is optional and that the lfc threshold can be freely chosen at whatever level is useful for your study.

ADD COMMENT • link 2.9 years ago Gordon Smyth 51k

0

Entering edit mode

Dear Gordon,

I am using the EdgeR manual as reference, but as a starter I sometimes worry that I am missing or skipping an essential step.

I could use glmTreat to get just a list of genes which are highly up/downregulated and in my case I can manually annotate them (those 19 genes). With the results of the glmQLFTest I will try to do a network/pathway analysis.

Anyway, many thanks for your answer, much appreciated!

ADD REPLY • link 2.9 years ago Barista • 0