Question: Can we use Entrez IDs for GOSeq?
3.4 years ago by
Germany/Dresden/ CRTD - DZNE
Mehmet Ilyas Cosacak wrote:

Hi,

I have microarray data on which I would like to run a GO analysis using GOSeq. However, I have ENTREZ gene IDs. When I try to convert the ENTREZ IDs to ENSEMBL IDs, I lose several genes from the universe, either because of 1:many mappings or because there is no ENSEMBL ID. Is there a way to use ENTREZ IDs as the names of the gene universe?

The data frame also has ENSEMBL transcript IDs, gene names, and REFSEQ IDs, but not ENSEMBL gene IDs.

I tried all of the options below, but none of them worked:

pwf <- nullp(uniGenes, "mm10", "ensGene", bias.data = lengthData)
pwf <- nullp(uniGenes, "mm10", "ensGene")
pwf <- nullp(uniGenes, "mm10", "knownGene", bias.data = lengthData)
pwf <- nullp(uniGenes, "mm10", "knownGene")

best,

ilyas.

modified 3.4 years ago by James W. MacDonald • written 3.4 years ago by Mehmet Ilyas Cosacak
Answer: Can we use Entrez IDs for GOSeq?
3.4 years ago by
United States
James W. MacDonald wrote:

There is no length bias for microarray data, so there is no reason to use a package designed for RNA-Seq to do your GO analysis. You can use GOstats directly, as it requires Entrez Gene IDs anyway.
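As a rough sketch of what that could look like: the snippet below runs a hypergeometric GO test with GOstats on Entrez Gene IDs. The two ID vectors are placeholders taken from org.Mm.eg.db only so that the code runs; in practice they would be your own universe and your selected genes. The BP ontology and the 0.05 cutoff are likewise assumptions, not recommendations.

```r
library(GOstats)
library(org.Mm.eg.db)

# Placeholder inputs (assumption): replace with your own Entrez Gene IDs.
# universe_ids = all genes measured on the array,
# selected_ids = the genes called differentially expressed.
universe_ids <- head(keys(org.Mm.eg.db), 2000)
selected_ids <- head(universe_ids, 100)

# Set up an over-representation test for GO Biological Process terms
params <- new("GOHyperGParams",
              geneIds = selected_ids,
              universeGeneIds = universe_ids,
              annotation = "org.Mm.eg.db",
              ontology = "BP",
              pvalueCutoff = 0.05,
              conditional = FALSE,
              testDirection = "over")

res <- hyperGTest(params)

# Table of terms passing the cutoff (may be empty for the placeholder data)
head(summary(res))
```

No ID conversion is needed here, so no genes are lost from the universe through Entrez-to-Ensembl mapping.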

Hi James,

Thank you very much! That is a good point that I should always keep in mind! Thanks again.

However, I am trying to compare a microarray data set with RNA-Seq data that I have, and I would like to use the same gene ontology package for both to find which GO terms are enriched or over-represented. Moreover, the microarray data are from mouse and the RNA-Seq data are from Danio rerio. I am trying to find common pathways and enriched gene sets. Do you have any suggestions for that, in particular a gene ontology package or a strategy?

best,

ilyas.

I can think of two ways to do what you want. The first is to run the GO analysis you are proposing and see which terms are over-represented in each experiment. The second is to do some form of gene set testing on both experiments and look for consistent pathways. In other words, you could use something like romer from the limma package on both experiments (with a reasonable battery of gene sets; I am not sure all of the Broad sets are particularly useful) and then look for overlaps.
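A minimal sketch of the romer step for one experiment is below. Everything in it is simulated and hypothetical (the expression matrix, the two-group design, and the two gene sets), so it only illustrates the calls involved; with real data the row names would be gene IDs shared with the gene sets, and for the mouse versus zebrafish comparison the sets would first have to be mapped between species, which is not shown here.

```r
library(limma)

set.seed(1)

# Simulated log-expression matrix (assumption): 1000 genes x 6 samples
y <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("g", 1:1000), NULL))

# Hypothetical two-group design, 3 controls vs 3 treated
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Hypothetical gene sets, given as vectors of gene IDs
gene_sets <- list(setA = paste0("g", 1:50),
                  setB = paste0("g", 51:120))

# Convert gene IDs to row indices of y
idx <- ids2indices(gene_sets, rownames(y))

# Rotation gene set enrichment test on the treatment coefficient
rr <- romer(y, index = idx, design = design, contrast = 2, nrot = 999)

# Rank the sets; "mixed" looks for sets with changes in either direction
topRomer(rr, alternative = "mixed")
```

Running the same thing on the second experiment and intersecting the top-ranked set names would give the overlap mentioned above.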