Hi, My name is Candida Vaz, a post doctoral research fellow at the Bioinformatics Institute (BII) Singapore.

I am using edgeR package for my RNASeq data analysis. The User guide is extremely helpful, and I have been following it well. However, I am not able to find the R command to obtain the list of Differentially expressed genes at a Fold change (log-fold change) and FDR cut-off. The R command **summary**:

summary(de <- decideTestsDGE(et, p=0.05, adjust="BH"))

only gives the summary/number of DE genes. And the command** topTags**:

topTags(et, n=10, adjust.method="BH", sort.by="PValue")

gives the information/list of the top "n" number of DE genes.

I am interested in getting the** full list of DE genes at the log fold change cut-off of (+-1) and FDR cut-off of <0.05**. Would appreciate your help with this query.

Thanks a lot.

Candida Vaz

It would be useful if

`topTags`

had similar options to`topTable`

inlimma, such as`p.value`

and`lfc`

.It is actually a deliberate omission as we discourage people from cutting by p.value and logFC simultaneously. RNA-seq logFC are potentially too unstable for that. We would prefer people to use

`treatDGE()`

.Having said that, a fold change cutoff becomes more acceptable if a substantial prior.count has been used.

Dear Gordon Smyth,

Thanks for pointing this out. However, on reading about treatDGE(), I understand that it can be applied to linear models (experiments that have more than one factor) as it is applied on the output from glmFit. Please correct me if I'm understanding it wrongly.

My project involves a single factor. In this case, I suppose I should use the classic edgeR. So, for classic edgeR, how should I obtain the list of DE genes at a particular p-value and logFC cut-off.

Thanks once again for your kind help!

Candida

The GLM framework in

`edgeR`

can easily handle a simple experimental design involving just one factor. Just parameterize your design matrix appropriately, e.g., for an experiment with two replicates in each of two groups`"a"`

and`"b"`

:For this type of analysis, the GLM methods will give performance similar to classic

`edgeR`

. However, the use of GLMs will provide a lot more flexibility, e.g., you can now apply`treatDGE`

. More generally, the fact that the classic methods are restricted to one-factor designs doesn't mean that they're superior to the GLM methods in analyzing those designs.Dear Aaron,

Thank you so much. I had been away for a few weeks.

This is the code I am using: I would be grateful if you could let me know if there is anything wrong with it.

library(edgeR)

x <- read.delim("known-miR-expression-profiling-max-read-counts.txt",row.names="Symbol")

group <- factor(c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e"))

y <- DGEList(counts=x,group=group)

data.frame(Sample=colnames(y),group)

y <- calcNormFactors(y)

write.table(y$pseudo.counts, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = TRUE, sep = "\t")

design <- model.matrix(~group)

y <- estimateGLMCommonDisp(y, design, verbose=TRUE)

y <- estimateGLMTrendedDisp(y, design)

y <- estimateGLMTagwiseDisp(y, design)

fit <- glmFit(y, design)

tr <- treatDGE(fit, coef=b, lfc=1)

topTags(tr)

I have two specific questions:-

1. How can I export the TMM normalised data? The following command doesn't work.

write.table(y$pseudo.counts, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = TRUE, sep = "\t")

2. I am able to use the treatDGE function, but not sure of how to get the entire list of DE genes at the logFC and p-value cut-off.

Really appreciate your help!

In the GLM framework, pseudo-counts no longer exist. TMM-normalized data can be obtained as CPM values instead:

This will account for any normalization factors that are present in

`y`

, assuming that`y`

is a`DGEList`

object.As for

`treatDGE`

, the log-FC threshold is already considered in computing the p-value. This means that you only need to decide on a threshold for the p-value (or, more practically, the FDR) to get DE genes. You don't have to explicitly filter on log-FC. I'd so something like this:Be aware, though, that a gene with a log-FC of 1, or just above 1, will probably not be detected by

`treatDGE`

. This is because such a log-FC value is not significantly different from the specified threshold (of 1). See Yunshun's answer for more details.Finally, what's

`coef=b`

in your`treatDGE`

call? I don't see`b`

defined anywhere.Dear Aaron,

Thank you so much for your prompt help!

I'm sorry it should be coef=2 instead of "b".

Please correct me if I'm wrong: My reference sample is "a" (3 replicates) in the first 3 columns. I want to compare all the others (samples "b" to "e" 3 replicates each) to this reference sample. So I need to change the coef to 2 then 3,4,5 for comparing b vs a, c vs a, d vs a and e vs a; am I right?

Secondly, for obtaining the TMM normalised data, should I replace

y <- calcNormFactors(y) with norm.vals <- cpm(y) ?

Thirdly, how can I get the output in the form of a tab delimited file using the write.table()?

Thanks again for all your help! Really appreciate your effort!

Regards,

Candida

`calcNormFactors`

to perform TMM normalization. The calculation of`norm.vals`

is only suggested to provide some normalized values for exporting; they are not involved in the rest of the pipeline.`sep="\t"`

and`quote=FALSE`

in the call to`write.table`

. It may also be useful to set`col.names=NA`

, to add an empty first field to the header row (in order to accommodate row names in the first column of the file).Thanks a lot Aaron,

Just a last bit of clarification:-

1. To get the TMM normalised values in a file. Are the following lines ok?

y <- calcNormFactors(y)

norm.vals <- cpm(y)

write.table(norm.vals, file="TMM_edgeR_Normalized.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")---This works!

2. Similarly, to get the DE gene list using treatDGE function, how should the write.table command be?

tr <- treatDGE(fit, coef=2, lfc=1)

out <- topTags(tr, n=Inf, adjust.method="BH")

keep <- out$table$FDR <= 0.05

write.table(keep, file="DE-mirs-1-2.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")----This isn't workingThanks a lot for your help.

Regards,

Candida

Saving of TMM-normalized values looks okay. To get the DE gene list, you need:

Thank you very much Aaron. I guess all my doubts are cleared now!

Just one last check:-

So once I run the first contrast (2 vs 1) and get the DE list as follows:

fit <- glmFit(y, design)

tr <- treatDGE(fit, coef=2, lfc=1)

out <- topTags(tr, n=Inf, adjust.method="BH")

keep <- out$table$FDR <= 0.05

write.table(out$table[keep,], file="DE-mirs-1-2.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")

For the next contrasts (3 vs 1 and others vs 1) I just need to alter the commands as follows right?

tr <- treatDGE(fit, coef=3, lfc=1)

out <- topTags(tr, n=Inf, adjust.method="BH")

keep <- out$table$FDR <= 0.05

write.table(out$table[keep,], file="DE-mirs-1-3.txt", row.names = TRUE, col.names = NA, quote=FALSE, sep = "\t")

Really appreciate your kind help. Thanks once again!

Regards,

Candida