Question

Annotation of U95av2 array


James W. MacDonald 68k

@james-w-macdonald-5106


United States

Hi George,

Please don't take list conversations off-list. The list archives are intended to be a source of information, and on the off chance that I might say something useful, it would be nice if people could find this later.

As to your question, as I said below, we just map things from Entrez Gene to the other annotation sources, so whatever Entrez Gene says, we report. So if I grep out some probeset ID that maps to multiple UniGene IDs, I might get something like 35566_f_at, which maps to 5 UG IDs.

Now if I get the Entrez ID (3576), go to the Entrez Gene webpage for this ID, and scroll to the very bottom, I see five UniGene IDs that this Entrez Gene ID corresponds to. We report four of these five, the only difference being we report Hs.443948 instead of Hs.654584.

This is obviously a mistake because Hs.443948 is SLC4A1 instead of IL-8, but the hgu95av2 package was built on March 15, so maybe Entrez Gene has corrected this mistake in the interim.

See http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=3576

Best,

Jim

Tseng, George C. wrote:

> Jim,
>
> Thanks so much for your response. I have one further question. In your annotation in Bioconductor, a probe set can map to multiple UniGene IDs. This really confuses me. Shouldn't it be only one ID?
>
> George
>
> -----Original Message-----
> From: James MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Sunday, April 01, 2007 9:59 AM
> To: Tseng, George C.
> Cc: biocannotation at lists.fhcrc.org; Lu, Shu-Ya
> Subject: Re: Annotation of U95av2 array
>
> Hi George,
>
> Tseng, George C. wrote:
>
>> Dear Dr. MacDonald and other Biocore Data Team members,
>>
>> I'm using your array annotations from Bioconductor in my research and I teach it in my microarray course as well. It is indeed a great tool for our data analysis and methodological development. Recently we're working on a meta-analysis research project to incorporate information from multiple data sets. My student took the UniGene ID annotations in all the U95av2 probes and compared with the result obtained from the Affymetrix website (the batch search in NetAffy). Among the 9704 probes annotated in Bioconductor, 724 probes were annotated completely differently in NetAffy.
>>
>> My question is: Do you obtain your UniGene ID annotation from Affymetrix database or other source? NetAffy annotations always have one UniGene ID to a probeset while your annotations can have many. Can you give us some detail about your annotation procedure?
>
> Nianhua Li makes the annotation packages, so she would be the final trusted source.
>
> In the past, the process was to map Affy ID to Entrez Gene ID using the annotation files that Affy supply on their website. We then use AnnBuilder to do the mappings from Entrez Gene to all other annotation sources, so it is not inconceivable that we would have different UniGene IDs for a given probeset.
>
> In my experience, the BioC annotations are more up to date and accurate than what Affy supply either on Netaffx or in their annotation files. This is based on blatting the probe sequences.
>
> Best,
>
> Jim
>
>> Thanks!
>>
>> George
>>
>> ============================================
>> George C. Tseng
>> Assistant Professor
>> Dept of Biostatistics and Human Genetics, University of Pittsburgh
>> http://www.pitt.edu/~ctseng, 412-624-5318
>> ============================================

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
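For readers who want to see why the two-step mapping produces several UniGene IDs per probeset, here is a minimal Python sketch. It is not how AnnBuilder is implemented. It only mimics the idea of going from probeset to Entrez Gene ID and then to whatever UniGene IDs Entrez Gene lists. The probeset 35566_f_at, the Entrez Gene ID 3576, and the two UniGene IDs Hs.443948 and Hs.654584 come from the message above. The four `Hs.placeholder*` entries are made-up stand-ins for the four shared IDs, which the thread does not name, and the function and variable names are hypothetical.

```python
# Toy tables only: hand-written stand-ins, not real annotation data.
# The four shared UniGene IDs are not named in the thread, so placeholders are used.
SHARED_PLACEHOLDERS = [
    "Hs.placeholder1",
    "Hs.placeholder2",
    "Hs.placeholder3",
    "Hs.placeholder4",
]

# Step 1: probeset -> Entrez Gene ID (the thread says this came from Affy's files).
probe_to_entrez = {"35566_f_at": "3576"}

# Step 2: Entrez Gene ID -> UniGene IDs, as seen in two different downloads.
# "snapshot" mimics the data the package was built from,
# "current" mimics what the Entrez Gene page showed later.
snapshot_entrez_to_unigene = {"3576": SHARED_PLACEHOLDERS + ["Hs.443948"]}
current_entrez_to_unigene = {"3576": SHARED_PLACEHOLDERS + ["Hs.654584"]}


def unigene_for_probe(probe_id, entrez_to_unigene):
    """Return every UniGene ID reachable from a probeset via its Entrez Gene ID."""
    entrez_id = probe_to_entrez.get(probe_id)
    if entrez_id is None:
        return []  # probeset has no Entrez Gene ID, so nothing can be mapped
    return list(entrez_to_unigene.get(entrez_id, []))


packaged = unigene_for_probe("35566_f_at", snapshot_entrez_to_unigene)
current = unigene_for_probe("35566_f_at", current_entrez_to_unigene)

# IDs that differ between the packaged snapshot and the later download.
only_in_package = sorted(set(packaged) - set(current))
only_in_current = sorted(set(current) - set(packaged))

print(len(packaged), "UniGene IDs reported for 35566_f_at")
print("only in package:", only_in_package)
print("only in current Entrez Gene:", only_in_current)
```

The point of the sketch is that the one-to-many relationship lives in the second table, so any error or update in the Entrez Gene source shows up unchanged in the package until the next build.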

Microarray Genetics Annotation GO Cancer hgu95av2 probe affy AnnBuilder PROcess Genetics • 1.8k views

updated 18.8 years ago by Nianhua Li ▴ 870 • written 18.8 years ago by James W. MacDonald 68k

score 0 · Answer 1 · 2007-04-18

Hi George,

I would use Entrez Gene IDs to do the matching. You could also use the mappings that Affy provide:

http://www.affymetrix.com/support/technical/byproduct.affx?product=hg-u133-plus

I have never used them, but they may well be useful.

Best,

Jim

Tseng, George C. wrote:

> Hi Jim,
>
> Your clarification is very helpful. Then when we try to match genes from two types of arrays (say U95 and U133), what would you recommend? We originally thought UniGene ID would be a good choice, but it would become difficult if one probe set maps to multiple IDs. Can you advise? Sorry, it may be a dumb question but I'm from a statistics background.
>
> Thanks.
>
> George
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Wednesday, April 18, 2007 9:51 AM
> To: Tseng, George C.
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Annotation of U95av2 array
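As a rough illustration of matching two arrays on Entrez Gene ID, here is a small Python sketch. It assumes you have already exported a probeset-to-Entrez-Gene table for each array. Everything in the tables is a toy: only 35566_f_at and 3576 come from this thread, while the other probeset names and the `ENTREZ_A` / `ENTREZ_B` keys are hypothetical placeholders, as are the function names.

```python
from collections import defaultdict

# Toy probeset -> Entrez Gene tables, one per array.
# Only "35566_f_at" -> "3576" is taken from the thread; the rest is made up.
u95_probe_to_entrez = {
    "35566_f_at": "3576",
    "u95_example_probe": "ENTREZ_A",
}
u133_probe_to_entrez = {
    "u133_example_probe_1": "3576",
    "u133_example_probe_2": "3576",
    "u133_example_probe_3": "ENTREZ_B",
}


def probes_by_gene(probe_to_entrez):
    """Invert a probeset -> gene table into gene -> list of probesets."""
    by_gene = defaultdict(list)
    for probe, gene in probe_to_entrez.items():
        by_gene[gene].append(probe)
    return by_gene


def match_by_entrez(table_a, table_b):
    """Pair up probesets from two arrays that share an Entrez Gene ID."""
    a_by_gene = probes_by_gene(table_a)
    b_by_gene = probes_by_gene(table_b)
    shared_genes = sorted(set(a_by_gene) & set(b_by_gene))
    return {
        gene: (sorted(a_by_gene[gene]), sorted(b_by_gene[gene]))
        for gene in shared_genes
    }


matches = match_by_entrez(u95_probe_to_entrez, u133_probe_to_entrez)
for gene, (u95_probes, u133_probes) in matches.items():
    print(gene, u95_probes, u133_probes)
```

Note that several probesets on one array can still share a gene, so after matching you would need to decide how to summarize them (for example, picking one probeset or averaging). That choice is left open here.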

score 0 · Answer 2 · 2007-04-18

Hi James and George,

The source data that we used for building the current hgu95av2 package in BioC 2.0 was downloaded on Feb 28, 2007. Hs.443948 was linked with 3576 in the source data. Unfortunately, it is too late to update the packages at this time.

The way it works is that we download all the source data at once and use our local copy of the source data to create all the annotation packages. The annotation packages for GE CodeLink, Illumina and RNG_MRC gene chips are maintained by other organisations. We coordinate with each other to make sure all the packages are created from the same version of the source data. This process allows all the packages to be consistent with each other.

It is too late to go through one more round of the coordination process, given that the release is just one week away. Sorry...

Best,

Nianhua