Search
Question: Differentially expressed genes selection with FC and FDR cutoff
1
gravatar for Beginner
3 months ago by
Beginner50
Beginner50 wrote:

Hi,

I'm using edgeR for differential analysis between tutor and normal tissue cases.

I would like to select differential expressed genes based on log foldchange >= 2 and FDR < 0.05

Based on edgeR f1000 Research paper I used tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(2)) which gives differential expressed genes with FC above 2 at FDR < 0.05

is.de <- decideTestsDGE(tr)
summary(is.de)
       -1*Normal 1*TNBC
Down               1018
NotSig             7182
Up                 1535

Among those 1535 Upregulated genes I also see genes with fold change 1 and also 1.5 which are with FDR < 0.05. Don't know why it gives genes with FC 1 and 1.5 also in the results.

I need differential expressed genes with FC >= 2 & FDR < 0.05 cutoff. May I know how to get this.

Thank you

ADD COMMENTlink modified 3 months ago by Gordon Smyth35k • written 3 months ago by Beginner50
2
gravatar for James W. MacDonald
3 months ago by
United States
James W. MacDonald47k wrote:

Do note that a log fold change of 1, when using log base 2 is in fact a two-fold difference.

ADD COMMENTlink written 3 months ago by James W. MacDonald47k

Yes, I know that. I found the way to get differential expressed genes with logFC |2| and fDR < 0.05

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(2))​

tab2 <- topTags(tr,n=Inf, adjust.method = "BH")

keep <- tab2$table$FDR <= 0.05 & abs(tab2$table$logFC) >= 2

 

ADD REPLYlink written 3 months ago by Beginner50
1

If you want the log fold change to be > |2|, then don't take logs for the lfc argument! Just use lfc = 2.

ADD REPLYlink written 3 months ago by James W. MacDonald47k
1

I don't think you fully understand James' point. If you're only interested in genes with a log-fold change (significantly) greater than 2, why are you using glmTreat with lfc=1? You should use lfc=2 instead. Also see the Note at the bottom of the documentation in ?topTable regarding post-hoc filtering on the log-fold change.

ADD REPLYlink written 3 months ago by Aaron Lun21k

oh yes, I understand now. log2(2) is nothing but 1. So, instead I will be using like this. 

tr <- glmTreat(fit, contrast=contrast.matrix)​

tab2 <- topTags(tr,n=Inf, adjust.method = "BH")

keep <- tab2$table$FDR <= 0.05 & abs(tab2$table$logFC) >= 2

Am I right here?

ADD REPLYlink written 3 months ago by Beginner50

No. You should be using lfc=2 in glmTreat. DE genes should only be selected on the basis of their (adjusted) p-values. Again, I suggest you read the Note that I referred to above.

ADD REPLYlink written 3 months ago by Aaron Lun21k

Sorry I was wrong may be. I saw one of your comment just now Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off in which you answered in the way I did. Possibly it is because of treatDGE() function may be. So, I misunderstood from that.

So then to get differential expressed genes with | logFC | > 2 and FDR < 0.05 I will use following steps.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=2)​
tab2 <- topTags(tr,n=Inf, adjust.method = "BH")
keep <- tab2$table$FDR <= 0.05
write.csv(tab2[keep,], "DEG.csv", row.names = TRUE)

Is this right now?

ADD REPLYlink written 3 months ago by Beginner50
1

Yes, that is correct.

ADD REPLYlink written 3 months ago by Aaron Lun21k

Thanks a lot Aaron !!

ADD REPLYlink written 3 months ago by Beginner50

Hi Aaron,

Small doubt. In many Research Papers I have seen people mentioning about fold change >=2 and fold change >=5 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927085/ [ Check Results section ]. My doubt is what should be the minimum fold change cutoff for finding differentially expressed genes? Because in edgeR paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf [Page 14] they have used lfc = log2(1.5) which will be fold change 0.58. Does this cutoff a right one to find differential expressed genes? Can I select DEGs with lfc = 1.5 ? And how can I give lfc greater than equal to 1.5? Is this right lfc >= 1.5

ADD REPLYlink modified 3 months ago • written 3 months ago by Beginner50
1

Stop.

Take a deep breath.

Now, remember the difference between a fold-change and a log-fold change. When we set lfc=log2(1.5), we are testing against a fold change of 1.5 and a log-fold change of ~0.58.

If you were to set lfc=1.5, you would be testing against a fold change of ~2.8. However, this is likely to be too stringent to be useful; you would fail to detect a lot of relevant DE genes with log-fold changes around 1 (i.e., fold-changes of 2).

Indeed, even lfc=1 is likely to be too stringent. Remember, we are testing against lfc, so a gene with a log-fold change of 1 will not be significantly different from lfc=1. This is why the paper uses lfc=log2(1.5), which allows genes with log-fold changes around unity to be detected.

In your code above, you use lfc=2, which on hindsight is likely to be far too high. If you're wondering what threshold to use, read Yunshun's comments C: Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off. In short, pick the log-fold change below which you are definitely not interested in the changes.

I don't know what your last few questions were referring to, but (i) there is no way to get a one-sided p-value from glmTreat, and (ii) putting lfc >= 1.5 in your glmTreat call doesn't make any syntactic sense.

ADD REPLYlink written 3 months ago by Aaron Lun21k
0
gravatar for Gordon Smyth
3 months ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

Dear Beginner,

What you are claiming just isn't true. If you have used glmTreat with lfc=log2(2) then all your DE genes will have fold-changes substantially greater than 2-fold. It is not possible in any circumstances that you will have DE genes with FC=1 (which corresponds to no change).

Aaron, James and I suspect that you are just getting confused by the log2-scale used to report fold-changes.

You should almost certainly follow the advice of the edgeR workflow paper, which suggests lfc=log2(1.5). In practice, that will ensure FC>2 for all or almost all of the DE genes. Is there some reason why you are not following that advice?

If you find it easier to read, you could add a fold-change column:

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.5))
tab <- topTags(tr, n=Inf, p=0.05)$table
tab$FC <- sign(tab$unshrunk.logFC) * 2^abs(tab$unshrunk.logFC)

However you really do need to become familiar with log2 fold changes if you want to work with bioinformatics data.

ADD COMMENTlink modified 3 months ago • written 3 months ago by Gordon Smyth35k

Dear Gordon,

Sorry I got very confused with different answers.

I'm interested in detecting differentially expressed genes with fold change >= 2 and FDR < 0.05.

For this I have used like following: 

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(2))
tab2 <- topTags(tr,n=Inf, adjust.method = "BH")
keep <- tab2$table$FDR <= 0.05

This gave me differentially expressed genes with logFC >=1 and FDR < 0.05. Also logFC <=1 and FDR < 0.05. I have seen many genes with logFC 1, 1.2, 1.5 etc...

Here I understand that log2(2) is logFC 1. 

I was very confused because the output from edgeR doesn't give any FC column instead it gives logFC column. I come from wet lab background and interested in genes with high fold change. So, with logFC I got the FC (Fold change) in this way.....FC = 2^logFC....Now the FC column shows all the genes with FC > 2.

So, now please tell me whether this is right way to get differentially expressed genes with fold change >= 2 and FDR < 0.05?

ADD REPLYlink modified 3 months ago by Gordon Smyth35k • written 3 months ago by Beginner50

You haven't got different answers from anybody. They have been completely consistent. You do seem to have a problem understanding the difference between a log fold change and a fold change however.

You have been told several times, and even in your comment you note that a logFC of 1 is a linear fold change of 2. So why are you now asking if a logFC of 1 is a linear fold change of 2? YOU JUST SAID THAT IT WAS!

ADD REPLYlink written 3 months ago by James W. MacDonald47k

Dear Beginner, I already told you the right way to do the analysis, but you ignored me! The analysis you have done is unnecessarily conservative for what you want to do, as Aaron explained at some length in his last reply. It is all explained in the edgeR workflow as well.

Again, some of what you claim cannot be true. You cannot have gotten genes with abs(logFC) <= 1 and FDR < 0.05 from the code you show.

ADD REPLYlink modified 3 months ago • written 3 months ago by Gordon Smyth35k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 222 users visited in the last hour