Question

Can we use Entrez IDs for GOSeq?

0

Mehmet Ilyas Cosacak • 0

@mehmet-ilyas-cosacak-9020

Last seen 7.3 years ago

Germany/Dresden/ CRTD - DZNE

Hi,

I have microarray data on which I would like to run a GO analysis using GOSeq. However, I only have Entrez gene IDs. When I try to convert the Entrez IDs to Ensembl IDs, I lose several of my genes from the universe, either because of one-to-many mappings or because no Ensembl ID exists. Is there a way to use Entrez IDs as the names of the gene universe?

The data frame also has Ensembl transcript IDs, gene names, and RefSeq IDs, but no Ensembl gene IDs.

I tried all of the options below, but none worked:

pwf <- nullp(uniGenes, "mm10", "ensGene",bias.data = lengthData)
pwf <- nullp(uniGenes, "mm10", "ensGene")
pwf <- nullp(uniGenes, "mm10", "knownGene",bias.data = lengthData)
pwf <- nullp(uniGenes, "mm10", "knownGene")

Any comments or suggestions would be really helpful.

best,

ilyas.

goseq entrez gene identifiers • 1.6k views

ADD COMMENT • link updated 9.2 years ago by James W. MacDonald 68k • written 9.2 years ago by Mehmet Ilyas Cosacak • 0

score 1 · Answer 1 · 2016-05-10

1

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 8 days ago

United States

There is no length bias for microarray data, so there is no reason to use a package designed for RNA-Seq to do your GO analysis. You can use GOstats directly, as it requires Entrez Gene IDs anyway.
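As a rough illustration of what an over-representation test on Entrez Gene IDs looks like when no length-bias correction is needed, here is a minimal base-R sketch of a hypergeometric test for a single GO term. The IDs, the gene lists, and the helper name `overrep_p` are all made up for illustration; GOstats runs this kind of test across all GO terms using real annotation, so treat this only as a sketch of the idea, not as what the package does internally.

```r
# Toy Entrez Gene IDs (invented for illustration, not real annotations).
universe <- as.character(1001:1100)               # all genes on the array
selected <- as.character(c(1001:1010, 1050))      # differentially expressed genes
go_term  <- as.character(c(1001:1008, 1060:1063)) # genes annotated to one GO term

# Hypothetical helper: over-representation p-value, i.e. the probability of
# seeing at least the observed overlap under the hypergeometric distribution.
overrep_p <- function(selected, term_genes, universe) {
  selected   <- intersect(selected, universe)
  term_genes <- intersect(term_genes, universe)
  k <- length(intersect(selected, term_genes)) # selected genes in the term
  m <- length(term_genes)                      # term genes in the universe
  n <- length(universe) - m                    # universe genes outside the term
  phyper(k - 1, m, n, length(selected), lower.tail = FALSE)
}

p <- overrep_p(selected, go_term, universe)
print(p)
```

Note that the universe is simply the vector of Entrez IDs measured on the array, so no conversion to Ensembl IDs is involved.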

ADD COMMENT • link 9.2 years ago James W. MacDonald 68k

0

Hi James,

Thank you very much! That is a good point that I should always keep in mind! Thanks again.

However, I am trying to compare microarray data with RNA-Seq data that I have, and I would like to use the same gene ontology package for both to find which GO terms are enriched or over-represented. Moreover, the microarray data is from mouse and the RNA-Seq data is from Danio rerio. I am trying to find common pathways and gene enrichment. Do you have any suggestions for that, in particular a gene ontology package or a strategy?

best,

ilyas.

ADD REPLY • link 9.2 years ago Mehmet Ilyas Cosacak • 0

1

I can think of two ways you could do what you want. First is to do the GO analysis that you are proposing, and see what terms are over-represented in each experiment. Second would be to do some form of gene set testing on both experiments and look for consistent pathways. In other words, you could use something like romer from the limma package on both experiments (on a reasonable battery of gene sets - I am not sure all of the Broad sets are particularly useful) and then look for overlaps.
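The "look for overlaps" step could be sketched in base R roughly as follows. This assumes each experiment has already produced a table of GO term IDs with p-values (from whichever test you choose); the tables, the placeholder IDs, the p-values, and the helper name `enriched_terms` are invented for illustration. Because GO term IDs are not species-specific, mouse and zebrafish results can be matched on the ID directly.

```r
# Toy per-term results; GO IDs are placeholders and p-values are invented.
mouse_go <- data.frame(
  go_id = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"),
  p     = c(0.0001, 0.004, 0.30, 0.02),
  stringsAsFactors = FALSE
)
fish_go <- data.frame(
  go_id = c("GO:0000002", "GO:0000003", "GO:0000004", "GO:0000005"),
  p     = c(0.0002, 0.01, 0.60, 0.001),
  stringsAsFactors = FALSE
)

# Hypothetical helper: terms passing a Benjamini-Hochberg FDR cutoff.
enriched_terms <- function(res, fdr = 0.05) {
  res$go_id[p.adjust(res$p, method = "BH") < fdr]
}

# Terms called enriched in both experiments.
shared <- intersect(enriched_terms(mouse_go), enriched_terms(fish_go))
print(shared)
```

The FDR cutoff of 0.05 is an arbitrary choice here; a stricter or looser threshold in each experiment will change which terms end up in the overlap.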

ADD REPLY • link 9.2 years ago James W. MacDonald 68k