Differentially expressed genes selection with FC and FDR cutoff
2
1
Entering edit mode
Beginner ▴ 50
@beginner-15939
Last seen 14 months ago

Hi,

I'm using edgeR for differential analysis between tutor and normal tissue cases.

I would like to select differential expressed genes based on log foldchange >= 2 and FDR < 0.05

Based on edgeR f1000 Research paper I used tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(2)) which gives differential expressed genes with FC above 2 at FDR < 0.05

is.de <- decideTestsDGE(tr)
summary(is.de)
-1*Normal 1*TNBC
Down               1018
NotSig             7182
Up                 1535

Among those 1535 Upregulated genes I also see genes with fold change 1 and also 1.5 which are with FDR < 0.05. Don't know why it gives genes with FC 1 and 1.5 also in the results.

I need differential expressed genes with FC >= 2 & FDR < 0.05 cutoff. May I know how to get this.

Thank you

edgeR rnaseq r bioconductor • 920 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States

Do note that a log fold change of 1, when using log base 2 is in fact a two-fold difference.

0
Entering edit mode

Yes, I know that. I found the way to get differential expressed genes with logFC |2| and fDR < 0.05

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(2))​

tab2 <- topTags(tr,n=Inf, adjust.method = "BH")

keep <- tab2$table$FDR <= 0.05 & abs(tab2$table$logFC) >= 2

1
Entering edit mode

If you want the log fold change to be > |2|, then don't take logs for the lfc argument! Just use lfc = 2.

1
Entering edit mode

I don't think you fully understand James' point. If you're only interested in genes with a log-fold change (significantly) greater than 2, why are you using glmTreat with lfc=1? You should use lfc=2 instead. Also see the Note at the bottom of the documentation in ?topTable regarding post-hoc filtering on the log-fold change.

0
Entering edit mode

oh yes, I understand now. log2(2) is nothing but 1. So, instead I will be using like this.

tr <- glmTreat(fit, contrast=contrast.matrix)​

tab2 <- topTags(tr,n=Inf, adjust.method = "BH")

keep <- tab2$table$FDR <= 0.05 & abs(tab2$table$logFC) >= 2

Am I right here?

0
Entering edit mode

No. You should be using lfc=2 in glmTreat. DE genes should only be selected on the basis of their (adjusted) p-values. Again, I suggest you read the Note that I referred to above.

0
Entering edit mode

Sorry I was wrong may be. I saw one of your comment just now Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off in which you answered in the way I did. Possibly it is because of treatDGE() function may be. So, I misunderstood from that.

So then to get differential expressed genes with | logFC | > 2 and FDR < 0.05 I will use following steps.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=2)​
tab2 <- topTags(tr,n=Inf, adjust.method = "BH")
keep <- tab2$table$FDR <= 0.05
write.csv(tab2[keep,], "DEG.csv", row.names = TRUE)

Is this right now?

1
Entering edit mode

Yes, that is correct.

0
Entering edit mode

Thanks a lot Aaron !!

0
Entering edit mode

Hi Aaron,

Small doubt. In many Research Papers I have seen people mentioning about fold change >=2 and fold change >=5 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927085/ [ Check Results section ]. My doubt is what should be the minimum fold change cutoff for finding differentially expressed genes? Because in edgeR paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf [Page 14] they have used lfc = log2(1.5) which will be fold change 0.58. Does this cutoff a right one to find differential expressed genes? Can I select DEGs with lfc = 1.5 ? And how can I give lfc greater than equal to 1.5? Is this right lfc >= 1.5

1
Entering edit mode

Stop.

Take a deep breath.

Now, remember the difference between a fold-change and a log-fold change. When we set lfc=log2(1.5), we are testing against a fold change of 1.5 and a log-fold change of ~0.58.

If you were to set lfc=1.5, you would be testing against a fold change of ~2.8. However, this is likely to be too stringent to be useful; you would fail to detect a lot of relevant DE genes with log-fold changes around 1 (i.e., fold-changes of 2).

Indeed, even lfc=1 is likely to be too stringent. Remember, we are testing against lfc, so a gene with a log-fold change of 1 will not be significantly different from lfc=1. This is why the paper uses lfc=log2(1.5), which allows genes with log-fold changes around unity to be detected.

In your code above, you use lfc=2, which on hindsight is likely to be far too high. If you're wondering what threshold to use, read Yunshun's comments C: Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off. In short, pick the log-fold change below which you are definitely not interested in the changes.

I don't know what your last few questions were referring to, but (i) there is no way to get a one-sided p-value from glmTreat, and (ii) putting lfc >= 1.5 in your glmTreat call doesn't make any syntactic sense.

0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

Dear Beginner,

What you are claiming just isn't true. If you have used glmTreat with lfc=log2(2) then all your DE genes will have fold-changes substantially greater than 2-fold. It is not possible in any circumstances that you will have DE genes with FC=1 (which corresponds to no change).

Aaron, James and I suspect that you are just getting confused by the log2-scale used to report fold-changes.

You should almost certainly follow the advice of the edgeR workflow paper, which suggests lfc=log2(1.5). In practice, that will ensure FC>2 for all or almost all of the DE genes. Is there some reason why you are not following that advice?

If you find it easier to read, you could add a fold-change column:

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.5))
tab <- topTags(tr, n=Inf, p=0.05)$table tab$FC <- sign(tab$unshrunk.logFC) * 2^abs(tab$unshrunk.logFC)

However you really do need to become familiar with log2 fold changes if you want to work with bioinformatics data.

0
Entering edit mode

Dear Gordon,

Sorry I got very confused with different answers.

I'm interested in detecting differentially expressed genes with fold change >= 2 and FDR < 0.05.

For this I have used like following:

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(2))
tab2 <- topTags(tr,n=Inf, adjust.method = "BH")
keep <- tab2$table$FDR <= 0.05

This gave me differentially expressed genes with logFC >=1 and FDR < 0.05. Also logFC <=1 and FDR < 0.05. I have seen many genes with logFC 1, 1.2, 1.5 etc...

Here I understand that log2(2) is logFC 1.

I was very confused because the output from edgeR doesn't give any FC column instead it gives logFC column. I come from wet lab background and interested in genes with high fold change. So, with logFC I got the FC (Fold change) in this way.....FC = 2^logFC....Now the FC column shows all the genes with FC > 2.

So, now please tell me whether this is right way to get differentially expressed genes with fold change >= 2 and FDR < 0.05?

0
Entering edit mode

You haven't got different answers from anybody. They have been completely consistent. You do seem to have a problem understanding the difference between a log fold change and a fold change however.

You have been told several times, and even in your comment you note that a logFC of 1 is a linear fold change of 2. So why are you now asking if a logFC of 1 is a linear fold change of 2? YOU JUST SAID THAT IT WAS!

0
Entering edit mode

Dear Beginner, I already told you the right way to do the analysis, but you ignored me! The analysis you have done is unnecessarily conservative for what you want to do, as Aaron explained at some length in his last reply. It is all explained in the edgeR workflow as well.

Again, some of what you claim cannot be true. You cannot have gotten genes with abs(logFC) <= 1 and FDR < 0.05 from the code you show.