Search
Question: Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off
2
gravatar for candida.vaz
3.1 years ago by
candida.vaz30
Singapore
candida.vaz30 wrote:

Hi, My name is Candida Vaz, a post doctoral research fellow at the Bioinformatics Institute (BII) Singapore.

I am using edgeR package for my RNASeq data analysis. The User guide is extremely helpful, and I have been following it well. However, I am not able to find the R command to obtain the list of Differentially expressed genes at a Fold change (log-fold change) and FDR cut-off. The R command summary:

summary(de <- decideTestsDGE(et, p=0.05, adjust="BH"))

only gives the summary/number of DE genes. And the command topTags:

topTags(et, n=10, adjust.method="BH", sort.by="PValue")

gives the information/list of the top "n" number of DE genes.

I am interested in getting the full list of DE genes at the log fold change cut-off of (+-1) and FDR cut-off of <0.05. Would appreciate your help with this query.

Thanks a lot.

Candida Vaz

ADD COMMENTlink modified 3.1 years ago by James W. MacDonald45k • written 3.1 years ago by candida.vaz30
6
gravatar for Aaron Lun
3.1 years ago by
Aaron Lun17k
Cambridge, United Kingdom
Aaron Lun17k wrote:

To get the list of DE genes with those criteria, you'd need to do a bit more work. Something like the code below would probably suffice.

out <- topTags(et, n=Inf, adjust.method="BH")
keep <- out$table$FDR <= 0.05 & abs(out$table$logFC) >= 1
out[keep,]

However, keep in mind that the selection of genes by log-fold change is ad hoc and may compromise FDR control.

ADD COMMENTlink written 3.1 years ago by Aaron Lun17k
1

It would be useful if topTags had similar options to topTable in limma, such as p.value and lfc.

ADD REPLYlink written 3.1 years ago by Dario Strbenac1.4k
3

It is actually a deliberate omission as we discourage people from cutting by p.value and logFC simultaneously. RNA-seq logFC are potentially too unstable for that. We would prefer people to use treatDGE().

Having said that, a fold change cutoff becomes more acceptable if a substantial prior.count has been used.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Gordon Smyth32k

Dear Gordon Smyth,

Thanks for pointing this out. However, on reading about treatDGE(), I understand that it can be applied to linear models (experiments that have more than one factor) as it is applied on the output from glmFit. Please correct me if I'm understanding it wrongly.

My project involves a single factor. In this case, I suppose I should use the classic edgeR. So, for classic edgeR, how should I obtain the list of DE genes at a particular p-value and logFC cut-off.

Thanks once again for your kind help!

Candida

 

 

ADD REPLYlink written 3.1 years ago by candida.vaz30
1

The GLM framework in edgeR can easily handle a simple experimental design involving just one factor. Just parameterize your design matrix appropriately, e.g., for an experiment with two replicates in each of two groups "a" and "b":

design <- model.matrix(~factor(c("a", "a", "b", "b")))

For this type of analysis, the GLM methods will give performance similar to classic edgeR. However, the use of GLMs will provide a lot more flexibility, e.g., you can now apply treatDGE. More generally, the fact that the classic methods are restricted to one-factor designs doesn't mean that they're superior to the GLM methods in analyzing those designs.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Aaron Lun17k

Dear Aaron,

Thank you so much. I had been away for a few weeks.

This is the code I am using:  I would be grateful if you could let me know if there is anything wrong with it.

library(edgeR)
x <- read.delim("known-miR-expression-profiling-max-read-counts.txt",row.names="Symbol")
group <- factor(c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e"))
y <- DGEList(counts=x,group=group)
data.frame(Sample=colnames(y),group)
y <- calcNormFactors(y)
write.table(y$pseudo.counts, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = TRUE, sep = "\t")
design <- model.matrix(~group)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
tr <- treatDGE(fit, coef=b, lfc=1)
topTags(tr)

I have two specific questions:-

1. How can I export the TMM normalised data?  The following command doesn't work.

write.table(y$pseudo.counts, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = TRUE, sep = "\t")

2. I am able to use the treatDGE function, but not sure of how to get the entire list of DE genes at the logFC and p-value cut-off.

Really appreciate your help!

ADD REPLYlink written 2.9 years ago by candida.vaz30
1

In the GLM framework, pseudo-counts no longer exist. TMM-normalized data can be obtained as CPM values instead:

norm.vals <- cpm(y)

This will account for any normalization factors that are present in y, assuming that y is a DGEList object.

As for treatDGE, the log-FC threshold is already considered in computing the p-value. This means that you only need to decide on a threshold for the p-value (or, more practically, the FDR) to get DE genes. You don't have to explicitly filter on log-FC. I'd so something like this:

out <- topTags(tr, n=Inf, adjust.method="BH")
keep <- out$table$FDR <= 0.05 
out[keep,]

Be aware, though, that a gene with a log-FC of 1, or just above 1, will probably not be detected by treatDGE. This is because such a log-FC value is not significantly different from the specified threshold (of 1). See Yunshun's answer for more details.

Finally, what's coef=b in your treatDGE call? I don't see b defined anywhere.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun17k

Dear Aaron,

Thank you so much for your prompt help!

I'm sorry it should be coef=2 instead of "b".

Please correct me if I'm wrong: My reference sample is "a"  (3 replicates) in the first 3 columns. I want to compare all the others (samples "b" to "e" 3 replicates each) to this reference sample. So I need to change the coef to 2 then 3,4,5 for comparing b vs a, c vs a, d vs a and e vs a; am I right?

Secondly, for obtaining the TMM normalised data, should I replace

y <- calcNormFactors(y) with norm.vals <- cpm(y) ?

Thirdly, how can I get the output in the form of a tab delimited file using the write.table()?

Thanks again for all your help! Really appreciate your effort!

Regards,

Candida

ADD REPLYlink written 2.9 years ago by candida.vaz30
1
  1. The parametrization of your contrasts is correct.
  2. Don't replace anything. You need the call to calcNormFactors to perform TMM normalization. The calculation of norm.vals is only suggested to provide some normalized values for exporting; they are not involved in the rest of the pipeline.
  3. I usually set sep="\t" and quote=FALSE in the call to write.table. It may also be useful to set col.names=NA, to add an empty first field to the header row (in order to accommodate row names in the first column of the file).
ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun17k

Thanks a lot Aaron,

Just a last bit of clarification:-

1. To get the TMM normalised values in a file. Are the following lines ok?

y <- calcNormFactors(y)
norm.vals <- cpm(y)
write.table(norm.vals, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")---This works!

2. Similarly, to get the DE gene list using treatDGE function, how should the write.table command be?

tr <- treatDGE(fit, coef=2, lfc=1)
out <- topTags(tr, n=Inf, adjust.method="BH")
keep <- out$table$FDR <= 0.05
write.table(keep, file="DE-mirs-1-2.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")----This isn't working

Thanks a lot for your help.

Regards,

Candida

ADD REPLYlink written 2.9 years ago by candida.vaz30
1

Saving of TMM-normalized values looks okay. To get the DE gene list, you need:

write.table(out$table[keep,], file="DE-mirs-1-2.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")
ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun17k

Thank you very much Aaron. I guess all my doubts are cleared now!

Just one last check:-

So once I run the first contrast (2 vs 1) and get the DE list as follows: 

fit <- glmFit(y, design)
tr <- treatDGE(fit, coef=2, lfc=1)
out <- topTags(tr, n=Inf, adjust.method="BH")
keep <- out$table$FDR <= 0.05
write.table(out$table[keep,], file="DE-mirs-1-2.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")

For the next contrasts (3 vs 1 and others vs 1) I just need to alter the commands as follows right?
tr <- treatDGE(fit, coef=3, lfc=1)
out <- topTags(tr, n=Inf, adjust.method="BH")
keep <- out$table$FDR <= 0.05
write.table(out$table[keep,], file="DE-mirs-1-3.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")

Really appreciate your kind help. Thanks once again!

Regards,

Candida

ADD REPLYlink written 2.9 years ago by candida.vaz30
4
gravatar for Yunshun Chen
3.1 years ago by
Yunshun Chen370
Australia
Yunshun Chen370 wrote:

There is a formal statistical test to test for DE against a certain fold-change threshold. It is called "TREAT" and It was originally implemented in the limma package for microarray data. Now the TREAT method has been extended to RNA-Seq data and it is implemented in the treatDGE function in the latest release of edgeR. Read the documentation of that function for more detail.

Note that testing for |logFC| > 1 by TREAT is not the same as selecting genes with |logFC| > 1. Genes will need to exceed this threshold by some margin, depending on the data, before being declared statistically significant. It would be better to interpret the threshold as ‘the fold-change below which we are definitely not interested in the gene’ rather than ‘the fold-change above which we are interested in the gene’.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Yunshun Chen370
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 370 users visited in the last hour