GO term enrichment analysis over whole genome, copy number aberration investigation
@nathan-harmston-2904
Last seen 10.3 years ago
Hi,

I currently have a list of HUGO gene IDs for genes in regions of gain across a whole chromosome, and I would like to perform GO enrichment analysis on them. I have two problems:

1. Until now I have defined my gene universe based on Affymetrix arrays, but I am now working over the whole genome. gene_universe = getBM(c("entrezgene"), mart = ensembl) leaves me with a gene universe of 20275 gene IDs. Is this right?

2. How do I move from my HUGO identifiers to Entrez Gene IDs? I can do this using biomaRt: test = getBM(c("entrezgene"), filters = "hgnc_symbol", values = stGained, mart = ensembl). However, the result is not the same length as my list of HUGO gene identifiers (in my case 30 are missing). Why is this? Is this just some weird annotation bug that can't be fixed, or is it the way I'm doing it?

Does Bioconductor have GO information for all genes in the genome, and not just those in the annotation files for the Affymetrix arrays?

Finally, what are the statistical implications of performing GO enrichment (I'm using a conditional test) over a whole genome? Would it be better to run the gene set enrichment analysis on each chromosome (I don't think so)? I'm trying to find evidence that genes relating to certain functions are gained over the whole chromosome (cancer study). I've run a test analysis and found some things that make sense.

Many thanks in advance,
Nathan
Annotation GO • 1.2k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.4 years ago
United States
Hi Nathan,

Bioconductor does have packages that provide annotations for an entire organism, keyed on Entrez Gene IDs instead of Affymetrix (or other platform) IDs. These are called the organism packages. They are named in a format like "org.Hs.eg.db", which is the "organism" package for "Homo sapiens" based on "Entrez Gene" IDs. If you think this will help you, please try it out.

Marc
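As a rough illustration of the organism-package route, here is a minimal sketch that builds a whole-genome universe and a symbol-to-Entrez lookup, then lists the symbols that fail to map. It assumes the `keys()` and `mapIds()` interface of current AnnotationDbi releases, which is not mentioned in the thread. The helper `map_symbols`, the input `st_gained`, and the three-gene fallback table are hypothetical and only there so the sketch runs without Bioconductor installed.

```r
# Split gene symbols into mapped Entrez Gene IDs and symbols that did not map.
# `lookup` is a named character vector: names are symbols, values are Entrez IDs.
map_symbols <- function(symbols, lookup) {
  ids <- unname(lookup[symbols])
  list(mapped   = unique(ids[!is.na(ids)]),
       unmapped = symbols[is.na(ids)])
}

if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  # Whole-genome universe: every Entrez Gene ID known to the organism package.
  gene_universe <- AnnotationDbi::keys(db, keytype = "ENTREZID")
  # Symbol -> Entrez lookup; multiVals = "first" keeps one ID per symbol.
  all_symbols <- AnnotationDbi::keys(db, keytype = "SYMBOL")
  lookup <- AnnotationDbi::mapIds(db, keys = all_symbols, column = "ENTREZID",
                                  keytype = "SYMBOL", multiVals = "first")
} else {
  # Toy fallback (assumption) so the sketch runs without Bioconductor.
  lookup <- c(TP53 = "7157", BRCA1 = "672", MYC = "4609")
  gene_universe <- unname(lookup)
}

st_gained <- c("TP53", "MYC", "NOT_A_SYMBOL")  # hypothetical input list
res <- map_symbols(st_gained, lookup)
res$unmapped  # symbols that did not map and need checking by hand
```

Symbols that end up in `res$unmapped` are often aliases, renamed symbols, or withdrawn entries rather than a software fault, so inspecting them individually is usually more informative than comparing list lengths.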
@michael-watson-iah-c-378
Last seen 10.3 years ago
I'd say you have two choices of universe. Your list of IDs consists of HUGO gene IDs, so your universe must be based on these.

Choice 1: The list of HUGO IDs in your genome of choice. Of course, the number of HUGO IDs may not match the number of Entrez Gene IDs, or any other IDs, as no two ID systems map perfectly one-to-one.

Choice 2: The list of HUGO IDs in your genome of choice that have at least one GO term.

For some genomes, choices 1 and 2 will be the same; for others they will be radically different. If you choose the second, then you must ensure that your list of "significant" HUGO IDs also contains only IDs with a GO term.

For an analysis per chromosome, you'd have to subset both your "significant" list and your universe by chromosome.

(In reply to Nathan Harmston's message to the Bioconductor mailing list, sent 05 September 2008 11:11.)
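To make the effect of the universe choice concrete, here is a small base R sketch of a one-sided hypergeometric test for a single GO term. All gene IDs and set sizes are invented toy data, and `enrich_p` is a hypothetical helper, not the conditional test from the original question. The helper restricts both the selected list and the term's gene set to the chosen universe before counting, which is the consistency requirement described above.

```r
# Toy data (hypothetical): 100 genes, 80 of which carry at least one GO term.
universe   <- paste0("g", 1:100)
annotated  <- universe[1:80]            # Choice 2 universe: genes with a GO term
term_genes <- universe[1:20]            # genes annotated to the term of interest
gained     <- universe[c(1:10, 75:90)]  # "significant" (gained) genes

# Upper-tail hypergeometric p-value for over-representation of one term.
# Both gene lists are first restricted to the universe so the counts agree.
enrich_p <- function(selected, term_genes, universe) {
  selected   <- intersect(selected, universe)
  term_genes <- intersect(term_genes, universe)
  k <- length(intersect(selected, term_genes))  # selected genes in the term
  m <- length(term_genes)                       # term genes in the universe
  n <- length(universe) - m                     # remaining universe genes
  phyper(k - 1, m, n, length(selected), lower.tail = FALSE)  # P(X >= k)
}

p_all <- enrich_p(gained, term_genes, universe)   # Choice 1: all genes
p_go  <- enrich_p(gained, term_genes, annotated)  # Choice 2: GO-annotated only
c(all_genes = p_all, go_annotated = p_go)
```

A per-chromosome analysis would follow the same pattern: pass the gained genes and a universe that have both been subset to the chromosome of interest.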