Annotations dealing with "removed" refseq record
Francois Pepin ★ 1.3k
@francois-pepin-1012
Last seen 9.6 years ago
Hi,

I think the annotation system has problems dealing with RefSeq records that have been removed.

This is looking at the Erbb2 gene in mouse (entrezID=13866) on the whole genome mouse chip from Agilent (annotation package: mgug4122a). According to the annotations provided by Agilent, 2 probes map to it: A_52_P49250 and A_51_P216179.

Currently, the annotation package does not return a symbol for either probe:

> library(mgug4122a)
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
 A_52_P49250 A_51_P216179
          NA           NA

The accession number recorded for both probes is NM_010152:

> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
 A_52_P49250 A_51_P216179
 "NM_010152"  "NM_010152"

On the NCBI website, NM_010152 does point to Erbb2, but the page also says: "This record was removed by RefSeq staff".

Not being entirely familiar with the process, I would point to this as a likely reason for the lack of annotations for those two probes.

I have not done an extensive comparison between the Agilent annotations and the ones in mgug4122a to see how many other probes might be affected.

> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
LC_IDENTIFICATION=C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
[6] "methods"   "base"

other attached packages:
mgug4122a
 "1.16.0"

If there is any more information I can provide, please tell me.

Francois
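The broader check mentioned in the post (how many probes have an accession number but no symbol) could be sketched in plain R roughly as follows. This is only a sketch under stated assumptions: probes_missing_symbol is a hypothetical helper, not part of mgug4122a, and the two toy vectors stand in for what unlist(mget(ls(mgug4122aACCNUM), mgug4122aACCNUM)) and the matching mgug4122aSYMBOL call would presumably return with the environment-style package used in this thread. The first two entries repeat the values reported above; PROBE_X, ACC_X and GeneX are made-up placeholders.

```r
# Toy stand-ins for the ACCNUM and SYMBOL maps of the annotation package.
# The first two probes carry the values reported in this thread;
# "PROBE_X" / "ACC_X" / "GeneX" are invented placeholders for a healthy probe.
accnum <- c(A_52_P49250  = "NM_010152",
            A_51_P216179 = "NM_010152",
            PROBE_X      = "ACC_X")
symbol <- c(A_52_P49250  = NA,
            A_51_P216179 = NA,
            PROBE_X      = "GeneX")

# Hypothetical helper: return the IDs of probes that have an accession
# number but no gene symbol. Both arguments are character vectors named
# by probe ID.
probes_missing_symbol <- function(accnum, symbol) {
  symbol <- symbol[names(accnum)]  # align the two maps by probe ID
  names(accnum)[!is.na(accnum) & is.na(symbol)]
}

orphans <- probes_missing_symbol(accnum, symbol)
print(orphans)

# Group the affected probes by accession so that each accession only
# needs to be looked up once at NCBI.
by_accession <- split(orphans, accnum[orphans])
print(by_accession)
```

Grouping by accession is a design choice for convenience: several probes often share one accession, as the two Erbb2 probes do here, so the list of accessions to check by hand is shorter than the list of probes.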
Annotation mgug4122a PROcess • 1.2k views
Nianhua Li ▴ 870
@nianhua-li-1606
Last seen 9.6 years ago
Hi Francois,

If I remember correctly, we had a hard time finding up-to-date annotations from Agilent. The annotation file we downloaded from Agilent was out of date. We still update the annotation packages for each release, but the probeset-to-gene mapping (recorded in mgug4122aACCNUM) hasn't been updated for quite a long time. In other words, we only update the annotations for the genes. So if mgug4122aACCNUM is wrong or deprecated for a probeset, the other annotations for that probeset will be incorrect.

Could you please post the link to the up-to-date annotation file? We can rebuild the annotation packages based on it. Your help will be highly appreciated.

Thanks,
Nianhua
nli at fhcrc.org wrote:
> Could you please post the link to the up-to-date annotation file? We can
> re-build the annotation packages base on them. Your help will be highly
> appreciated.

I think this is it, but they are still pretty old (many from fall 2006):

http://www.chem.agilent.com/cag/bsp/gene_lists.asp?arrayType=gene

Sean
> I think this is it, but they are still pretty old (many from fall, 2006):
>
> http://www.chem.agilent.com/cag/bsp/gene_lists.asp?arrayType=gene

Yes, that is a more recent version.

Agilent also has another website (http://earray.chem.agilent.com/) for their customers that has more up-to-date definitions. For example, the mouse whole genome array dates from February 2007. You might want to contact Agilent to get access to that site.

Francois