Question: Integrating DESeq2 and TopGO for Enrichment
1
4.2 years ago by
nw32810
United States
nw32810 wrote:

Hello, I'm a new (<4months) R user working in an academic lab, and trying to do some RNAseq analysis for an experiment.  I am using DESeq2 to compare expression levels.

What is the best way to integrate a DESeqDataSet with TopGO or another gene ontology enrichment package?

After creating the DESeqDataSet, I would like to use a list of the top differentially expressed genes from an RNAseq run to get a list of GO terms, and then to enrich that list of GO terms to the most likely parent.  Ideally, I would be able to use accessions from the same assembly the libraries were aligned to.  I am working with S. aureus USA300 and an S. cerevisiae S288C variant.

Does anyone have any pointers about how to approach this best, or resources that would be helpful in this project?

Thanks!

rnaseq go deseq2 R • 8.8k views
modified 4.2 years ago by Bernd Klaus540 • written 4.2 years ago by nw32810
Answer: Integrating DESeq2 and TopGO for Enrichment
1
4.2 years ago by
United States
James W. MacDonald49k wrote:

You might want to look at the goseq package, which is probably a better choice for RNA-Seq data than something like topGO or GOstats. There is (as usual) a vignette that should help you get started.

1

Just to add to James' suggestion, if you look at the goseq vignette, it says:

"In order to perform a GO analysis of your RNA-seq data, goseq only requires a simple named vector, which contains two pieces of information.

1. Measured genes: all genes for which RNA-seq data was gathered for your experiment. Each element of your vector should be named by a unique gene identifier.

2. Differentially expressed genes: each element of your vector should be either a 1 or a 0, where 1 indicates that the gene is differentially expressed and 0 that it is not."

So this can be obtained with:

rowsum.threshold <- 1 # user chosen
fdr.threshold <- 0.1 # user chosen
rs <- rowSums(counts(dds))
dds <- dds[ rs > rowsum.threshold ,]
dds <- DESeq(dds)
res <- results(dds, independentFiltering=FALSE) # use count threshold instead of IF
assayed.genes <- rownames(res)
de.genes <- rownames(res)[ which(res\$padj < fdr.threshold) ]

From here you can continue, as in Section 2 for example.

Hi, Michael,

Thank you for the script here. I have a question: why did you suggest to turn off independent filtering when generating the result for GOseq analysis? Will this have any effect on the final list of DE genes and their p values?

Yes, filtering affects the final list of DE genes. I'd recommend you pick a threshold by looking at the MA plot. The x-axis is the mean of normalized counts across samples, so you may want to use row mean instead of row sum as used above.

My reason for not using the built-in filtering in the code example above is that the GOseq model is not perhaps expecting that some of the adjusted p-values be NA. And so It's perhaps safer for the hand-off between packages that all genes have the same chance at being a member of de.genes.

I use the standard DEseq2 pipeline to generate the DE list for follow-up study. What if I just simply remove the genes with p-value of NA and then run GOseq? Wouldn't this way be more consistent with the DE result that uses independent filtering automatically?

Sorry if my question does not make sense, I am a biologist trying to do all the informatics analysis myself.

Sure, that should work as well. The significant genes are those with adjusted p-value < x and the universe is the set of genes with adjusted p-value not NA.

Answer: Integrating DESeq2 and TopGO for Enrichment
0
4.2 years ago by
Bernd Klaus540
Germany
Bernd Klaus540 wrote:

Hi,

as additional suggestion, in case you want to use topGO anyway, I have an example here:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#gene-ontology-enrichment-analysis

that I used for teaching purposes, I tried to avoid biases by selection a background set

of genes that has comparable expression to the set of DE genes.

The full material is here:

http://www-huber.embl.de/users/klaus/teaching.html#differential-expression-analysis-of-rna-seq-predoc-course-2014

Best wishes,

Bernd