Integrating DESeq2 and TopGO for Enrichment
2
2
Entering edit mode
nw328 ▴ 20
@nw328-7354
Last seen 9.9 years ago
United States

Hello, I'm a new (<4months) R user working in an academic lab, and trying to do some RNAseq analysis for an experiment.  I am using DESeq2 to compare expression levels.

What is the best way to integrate a DESeqDataSet with TopGO or another gene ontology enrichment package?

After creating the DESeqDataSet, I would like to use a list of the top differentially expressed genes from an RNAseq run to get a list of GO terms, and then to enrich that list of GO terms to the most likely parent.  Ideally, I would be able to use accessions from the same assembly the libraries were aligned to.  I am working with S. aureus USA300 and an S. cerevisiae S288C variant.

Does anyone have any pointers about how to approach this best, or resources that would be helpful in this project?

Thanks!

 

go deseq2 rnaseq r • 22k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 3 days ago
United States

You might want to look at the goseq package, which is probably a better choice for RNA-Seq data than something like topGO or GOstats. There is (as usual) a vignette that should help you get started.

ADD COMMENT
3
Entering edit mode

Just to add to James' suggestion, if you look at the goseq vignette, it says:

"In order to perform a GO analysis of your RNA-seq data, goseq only requires a simple named vector, which contains two pieces of information.

1. Measured genes: all genes for which RNA-seq data was gathered for your experiment. Each element of your vector should be named by a unique gene identifier.

2. Differentially expressed genes: each element of your vector should be either a 1 or a 0, where 1 indicates that the gene is differentially expressed and 0 that it is not."

So this can be obtained with:

rowsum.threshold <- 1 # user chosen
fdr.threshold <- 0.1 # user chosen
rs <- rowSums(counts(dds))
dds <- dds[ rs > rowsum.threshold ,]
dds <- DESeq(dds)
res <- results(dds, independentFiltering=FALSE) # use count threshold instead of IF
assayed.genes <- rownames(res)
de.genes <- rownames(res)[ which(res$padj < fdr.threshold) ]

From here you can continue, as in Section 2 for example.

ADD REPLY
0
Entering edit mode

Hi, Michael,

Thank you for the script here. I have a question: why did you suggest to turn off independent filtering when generating the result for GOseq analysis? Will this have any effect on the final list of DE genes and their p values?

ADD REPLY
0
Entering edit mode

Yes, filtering affects the final list of DE genes. I'd recommend you pick a threshold by looking at the MA plot. The x-axis is the mean of normalized counts across samples, so you may want to use row mean instead of row sum as used above.

My reason for not using the built-in filtering in the code example above is that the GOseq model is not perhaps expecting that some of the adjusted p-values be NA. And so It's perhaps safer for the hand-off between packages that all genes have the same chance at being a member of de.genes.

ADD REPLY
1
Entering edit mode

I use the standard DEseq2 pipeline to generate the DE list for follow-up study. What if I just simply remove the genes with p-value of NA and then run GOseq? Wouldn't this way be more consistent with the DE result that uses independent filtering automatically?  

Sorry if my question does not make sense, I am a biologist trying to do all the informatics analysis myself. 

ADD REPLY
0
Entering edit mode

Sure, that should work as well. The significant genes are those with adjusted p-value < x and the universe is the set of genes with adjusted p-value not NA.

ADD REPLY
0
Entering edit mode

Thank you for your fast reply!! It's very helpful!

ADD REPLY
1
Entering edit mode
Bernd Klaus ▴ 610
@bernd-klaus-6281
Last seen 6.2 years ago
Germany

Hi,

as additional suggestion, in case you want to use topGO anyway, I have an example here:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#gene-ontology-enrichment-analysis

that I used for teaching purposes, I tried to avoid biases by selection a background set

of genes that has comparable expression to the set of DE genes.

The full material is here:

http://www-huber.embl.de/users/klaus/teaching.html#differential-expression-analysis-of-rna-seq-predoc-course-2014

Best wishes,

Bernd

ADD COMMENT

Login before adding your answer.

Traffic: 534 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6