ragene10st


Sebastien Gerega ▴ 370

@sebastien-gerega-2229


Thank you Marc and Manhong for your suggestions. I have attempted both methods and run into some problems.

Firstly, I was able to build ragene10st.db using the following code:

    source("http://bioconductor.org/biocLite.R")
    biocLite("rat.db0")

    library(AnnotationDbi)
    fname = "RaGene-1_0-st-v1.EDITED.txt"
    wdir = getwd()
    makeRATCHIP_DB(affy=FALSE,
                   prefix="ragene10st",
                   fileName=fname,
                   baseMapType="eg",
                   outputDir = wdir,
                   version="1.0.0",
                   manufacturer = "Affymetrix",
                   chipName = "Rat Gene ST Array",
                   manufacturerUrl = "http://www.affymetrix.com")

I then used this library to annotate an analysis I performed. At this point I realised that about one third of the 29171 probes were assigned the gene symbol "RT1-C113". I realise this is because the annotation file I used was in the wrong format. I had used the "mrna_assignment" column, which contains data in a complex format. Here are a couple of examples:

    NM_001099458 // RefSeq // Rattus norvegicus similar to putative pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39 // 0 ///
    ENSRNOT00000046204 // Rn.217623 // ---
    NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // --- /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 // Rn.217623 // ---

Unfortunately, for the Gene ST chips there are no columns that simply contain GenBank, UniGene, or RefSeq IDs.

So instead I tried Manhong's suggestion of using a custom CDF, but there is no custom CDF for rat gene ST arrays on the http://brainarray.mbni.med.umich.edu/ website. However, if I follow the link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to locate an appropriate CDF. Unfortunately, upon further examination of this CDF package, it appears as though the wrong probe IDs have been used. For example:

    > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
    $`112400_at`
    [1] "Nrg1"

    $`113882_at`
    [1] "Hemgn"

    $`113886_at`
    [1] "Kif1c"

    $`113892_at`
    [1] "Cml3"

As far as I am aware, the probe IDs used for rat gene ST arrays are in the following format (8 digits without "_at"):

    10700001
    10700003
    10700004
    10700005
    10700013

Can anyone provide any advice for either of the two options?
thanks,
Sebastien

Marc Carlson wrote:
> Well, one way is to navigate Affymetrix's website and grab the annotation file:
>
> http://www.affymetrix.com/support/technical/annotationfilesmain.affx
>
> Or you could also use Martin Morgan's clever AffyCompatible package, which will let you get the data you need more directly.
>
> ##The 2nd approach would go something like this (adapting from Martin's vignette):
> library(AffyCompatible)
> password <- "your_psswd"
> rsrc <- NetAffxResource(user="you@someplace.com", password=password)
> head(names(rsrc))
> affxDescription(rsrc[["RaGene-1_0-st-v1"]])
> annos <- rsrc[["RaGene-1_0-st-v1"]]
> annos
> sapply(affxAnnotation(annos), force)
> anno <- rsrc[["RaGene-1_0-st-v1", "Probeset Annotations, CSV Format"]]
> fl <- readAnnotation(rsrc, annotation=anno, content=FALSE)
> fl
> conn <- unz(fl, "RaGene-1_0-st-v1.na27.2.rn4.probeset.csv")
> ##Then get a dataframe with the contents of the file in it
> df = read.table(conn, header=TRUE, skip=18, sep=",")
>
> Marc
>
> Sebastien Gerega wrote:
>> Hi Marc,
>> I guess the problem lies in the fact that I don't know which annotation file to use. I can't seem to find any that have the appropriate columns. What files were used to generate mogene10st.db and hugene10st.db? I can find appropriate annotations for Affy 3' arrays but not for the Gene ST ones.
>> thanks again,
>> Sebastien
>>
>> Marc Carlson wrote:
>>> Hi Sebastien,
>>>
>>> The affy parameter is just a shortcut for Affymetrix expression arrays. If you want to use that parameter, you can download the appropriate annotation library file from the Affymetrix website (which you probably have to get anyhow), point to it in the parameter, and then call the function. SQLForge will then try to parse this file, extracting from it only the probeset IDs and the Entrez Gene, RefSeq and UniGene IDs, in order to sort out what all these genes are and thus generate the files described in the vignette. This will work as long as this particular annotation file is formatted similarly to what has come before. But really, this parameter is purely for convenience and not at all necessary for using SQLForge. A lot of people use affy, so I just added this to make it easier for that majority of users.
>>>
>>> You can almost as easily just grab that same Affymetrix annotation library file and make the tab-delimited files that I described yourself. All you really need is a file that tells the gene identity of the different probesets, so you can ignore the vast majority of the data in the file. If you have that, then you have all that you really need to proceed. For most platforms this just means selecting two of the columns and then creating a tab-delimited file from those. Then you feed that file to your function.
>>>
>>> Please let me know if you have more questions,
>>>
>>> Marc
>>>
>>> Sebastien Gerega wrote:
>>>> Hi Marc, and thanks for your help. I've had a look at the SQLForge vignette and there are still a couple of issues that are unclear to me. Firstly, for the Rat Gene ST arrays, is it possible to use any of the annotation files from the Affymetrix site as input for makeRATCHIP_DB in AnnotationDbi? If not, and the list of probes has to be created manually, what is the best way to go about doing this?
>>>> thanks again,
>>>> Sebastien
>>>>
>>>> Marc Carlson wrote:
>>>>> Hi Sebastien,
>>>>>
>>>>> We have just never had anyone ask for one before. However, you can make a package for yourself if you follow the instructions in the SQLForge vignette in the AnnotationDbi package:
>>>>>
>>>>> http://www.bioconductor.org/packages/devel/bioc/html/AnnotationDbi.html
>>>>>
>>>>> Please let me know if you have further questions regarding this.
>>>>>
>>>>> Marc
>>>>>
>>>>> Sebastien Gerega wrote:
>>>>>> Hi,
>>>>>> I have been analysing human and mouse gene ST chips using a combination of the Aroma package and the hugene10st.db and mogene10st.db annotation packages. Now I am attempting to do the same with some rat gene ST chips but have been unable to find the corresponding annotations. Why is there no ragene10st?
>>>>>> thanks,
>>>>>> Sebastien
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
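Marc's advice (select two columns and write a tab-delimited file) can be combined with the "mrna_assignment" format quoted in the post. The following is only a minimal base-R sketch of that idea, not the method anyone in the thread actually used. The helper name first_refseq, the output file name, and the pairing of probeset IDs with assignment strings are made up for illustration, and keeping only the first RefSeq accession per probeset is a simplifying assumption.

```r
# Hypothetical helper: return the first RefSeq accession (NM_/NR_/XM_/XR_)
# found in one "mrna_assignment" cell, or NA if there is none.
first_refseq <- function(cell) {
  # "///" separates assignments; "//" separates fields within an assignment.
  assignments <- strsplit(cell, "///", fixed = TRUE)[[1]]
  # The accession is the first field of each assignment.
  accessions <- trimws(sub("//.*$", "", assignments))
  hits <- grep("^[NX][MR]_[0-9]+$", accessions, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}

# Illustrative rows only: the probeset-to-assignment pairing is NOT real data.
annot <- data.frame(
  probeset_id = c("10700001", "10700003", "10700004"),
  mrna_assignment = c(
    paste("NM_001099458 // RefSeq // Rattus norvegicus similar to putative",
          "pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19",
          "// 39 // 0 /// ENSRNOT00000046204 // Rn.217623 // ---"),
    "NM_001099461 // Rn.217622 // --- /// ENSRNOT00000041455 // Rn.217622 // ---",
    "---"
  ),
  stringsAsFactors = FALSE
)

annot$refseq <- vapply(annot$mrna_assignment, first_refseq,
                       character(1), USE.NAMES = FALSE)

# Keep only probesets with an accession and write probe<TAB>accession,
# with no header and no quoting.
probe_map <- annot[!is.na(annot$refseq), c("probeset_id", "refseq")]
out_file <- file.path(tempdir(), "ragene10st_refseq.txt")
write.table(probe_map, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

A file like this is the kind of two-column input Marc describes. Passing it to makeRATCHIP_DB would presumably also mean changing baseMapType from "eg" to a RefSeq setting; check the SQLForge vignette for the values your AnnotationDbi version accepts.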

Annotation GO Rattus norvegicus ChipName cdf probe affy AnnotationDbi AffyCompatible GO • 2.7k views

ADD COMMENT • link updated 16.9 years ago by Groot, Philip de ▴ 630 • written 16.9 years ago by Sebastien Gerega ▴ 370


Manhong Dai ▴ 200

@manhong-dai-1910


Hi Sebastien,

Custom CDF version 11 is at
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v11

If you prefer an Entrez Gene based CDF, it is at
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/entrezg.asp
then search for RaGene10stv1 in the page.

In the custom CDF entrezg, the probeset ID is already the Entrez Gene ID. That is why the probeset IDs you saw in the NuGO Custom CDF version 10 annotation package are not the same as the probeset IDs in Affy's original CDF file.

Best,
Manhong

> Date: Tue, 03 Mar 2009 16:08:33 +1100
> From: Sebastien Gerega <seb at gerega.net>
> Subject: Re: [BioC] ragene10st
> To: bioconductor at stat.math.ethz.ch
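To make the naming convention concrete: in the entrezg custom CDF, each probeset ID is an Entrez Gene ID with "_at" appended, whereas Sebastien describes the original Affymetrix Gene ST probeset IDs as bare 8-digit numbers. A small base-R sketch, using only the IDs quoted in the thread, shows how the two can be told apart and how the suffix can be stripped. The helper name is_affy_st is hypothetical, and the 8-digit rule is taken from Sebastien's description rather than from any Affymetrix specification.

```r
# Probeset IDs as they appear in the entrezg custom CDF annotation package.
custom_ids <- c("112400_at", "113882_at", "113886_at", "113892_at")

# Dropping the "_at" suffix leaves the Entrez Gene ID.
entrez_ids <- sub("_at$", "", custom_ids)

# Hypothetical check: assumes original Gene ST probeset IDs are exactly
# 8 digits, as described in the question.
is_affy_st <- function(ids) grepl("^[0-9]{8}$", ids)

is_affy_st(c("10700001", "10700003"))  # original Affymetrix-style IDs
is_affy_st(custom_ids)                 # custom CDF IDs
```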

ADD COMMENT • link 16.9 years ago Manhong Dai ▴ 200


Hi Sebastien,

To follow up on and clarify Manhong's remarks: Philip, my colleague, prepared the annotation files for many of the Entrez-based remapped CDF files. The remapping of the probes was done by Manhong et al. at the MBNI, and Philip then used the mapped Entrez IDs to create the corresponding annotation files (using the annotation/SQLForge library), which are made available through the link you mentioned (http://nugo-r.bioinformatics.nl/NuGO_R.html).

HTH,
Guido

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Manhong Dai
> Sent: 03 March 2009 15:36
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] ragene10st

ADD REPLY • link 16.9 years ago Guido Hooiveld ★ 4.1k


Thanks to all those who offered advice. I have now managed to create an annotation file for rat gene ST arrays. It can be downloaded from:

http://sydneybioinformatics.org/download/ragene10st.db.rar

in case anyone else is interested in using it.

Sebastien

ADD REPLY • link 16.9 years ago Sebastien Gerega ▴ 370


Manhong Dai ▴ 200

@manhong-dai-1910


Hi Sebastien,

I CCed this email to bioc because this is a very common question from new custom CDF users.

Our custom CDF doesn't use Affy's original probeset IDs. Instead, we use only the individual probes and organize them into meaningful probesets. A probeset in a custom CDF is already an Entrez gene, ENSG, ENSE, RefSeq, etc. So an annotation package for a custom CDF is not as important as it used to be. In most of my own analyses, I didn't even use annotation packages.

The major reason we release custom CDFs is that a custom CDF discards probes that have been shown to be wrong under the latest gene definitions. Moreover, there are two additional major benefits for Bioconductor users. First, it provides a direct mapping between probe and gene/exon/transcript/ref/ug. Second, users can analyze many new chips (gene chip, exon chip and tiling chip) in the traditional way (rma, dchip, etc.).

In your case, our custom CDF can only help if your analysis starts from the CEL files. If you just want annotation for Affy's original probesets, you have to stick to Marc's suggestion.

Best,
Manhong

On Thu, 2009-03-05 at 15:45 +1100, Sebastien Gerega wrote:
> Hi Manhong,
> thank you for your help. I now understand that the probe IDs are actually the Entrez IDs with "_at" pasted onto the end. However, given that I only have the Affy probe IDs, as in the original ones in the form of:
>
> 10700001
> 10700003
> 10700004
> 10700005
> 10700013
>
> how can I use the annotation package? For example, given the Affy ID "10700001", how can I obtain the Entrez ID and additional annotations?
> thanks for any advice you can offer!
> regards,
> Sebastien
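For the question of going from an original Affy probeset ID to an Entrez ID, a chip package built along the lines Marc suggests would typically be queried with mget(). The sketch below is only an illustration under stated assumptions: lookup_entrez is a hypothetical wrapper, and a plain R environment with placeholder values ("EG_A", "EG_B") stands in for a real map such as ragene10stENTREZID so that the example runs without the annotation package installed. The placeholder values are not real Entrez IDs.

```r
# Hypothetical wrapper around mget(): look up each probe ID in a map and
# return NA for IDs the map does not contain.
lookup_entrez <- function(probe_ids, map) {
  unlist(mget(probe_ids, map, ifnotfound = NA))
}

# Stand-in for an annotation map; the values are placeholders, NOT real data.
stand_in <- list2env(list("10700001" = "EG_A", "10700003" = "EG_B"))

result <- lookup_entrez(c("10700001", "10700099"), stand_in)
result
```

With a real chip package loaded, the same call would presumably take the package's ENTREZID map in place of stand_in, assuming the package was built on the original Affymetrix probeset IDs.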

ADD COMMENT • link 16.9 years ago Manhong Dai ▴ 200


Groot, Philip de ▴ 630

@groot-philip-de-1307


Hello Sebastien,

I checked out your ragene10st.db library and found the following:

    > ragene10st()
    Quality control information for ragene10st:

    This package has the following mappings:
    ragene10stACCNUM has 0 mapped keys (of 29216 keys)
    ragene10stALIAS2PROBE has 30979 mapped keys (of 30979 keys)
    ragene10stCHR has 19932 mapped keys (of 29216 keys)
    ragene10stCHRLENGTHS has 23 mapped keys (of 23 keys)
    ragene10stCHRLOC has 14225 mapped keys (of 29216 keys)
    ragene10stCHRLOCEND has 14225 mapped keys (of 29216 keys)
    ragene10stENSEMBL has 18339 mapped keys (of 29216 keys)
    ragene10stENSEMBL2PROBE has 17366 mapped keys (of 17366 keys)
    ragene10stENTREZID has 29216 mapped keys (of 29216 keys)
    ragene10stENZYME has 1494 mapped keys (of 29216 keys)
    ragene10stENZYME2PROBE has 692 mapped keys (of 692 keys)
    ragene10stGENENAME has 19953 mapped keys (of 29216 keys)
    ragene10stGO has 14583 mapped keys (of 29216 keys)
    ragene10stGO2ALLPROBES has 9863 mapped keys (of 9863 keys)
    ragene10stGO2PROBE has 7437 mapped keys (of 7437 keys)
    ragene10stMAP has 19502 mapped keys (of 29216 keys)
    ragene10stPATH has 4201 mapped keys (of 29216 keys)
    ragene10stPATH2PROBE has 206 mapped keys (of 206 keys)
    ragene10stPFAM has 19087 mapped keys (of 29216 keys)
    ragene10stPMID has 12157 mapped keys (of 29216 keys)
    ragene10stPMID2PROBE has 34899 mapped keys (of 34899 keys)
    ragene10stPROSITE has 19087 mapped keys (of 29216 keys)
    ragene10stREFSEQ has 19869 mapped keys (of 29216 keys)
    ragene10stSYMBOL has 19953 mapped keys (of 29216 keys)
    ragene10stUNIGENE has 18427 mapped keys (of 29216 keys)
    ragene10stUNIPROT has 10050 mapped keys (of 29216 keys)

    Additional Information about this package:
    DB schema: RATCHIP_DB
    DB schema version: 1.0
    Organism: Rattus norvegicus
    Date for NCBI data: 2008-Sep2
    Date for GO data: 200808
    Date for KEGG data: 2008-Sep2
    Date for Golden Path data: 2006-Jun20
    Date for IPI data: 2008-Sep02
    Date for Ensembl data: 2008-Jul23

That ACCNUM has 0 entries is weird. EntrezID entries are all found... So, you built this package on Entrez IDs and not on the AFFY IDs that are usually provided. So your library won't work properly for people who simply download it blindly! Just to let you know. Perhaps you should consider taking it offline and only providing it if people ask for it AND realise the problems with it?

Regards,

Dr. Philip de Groot Ph.D.
Bioinformatics Researcher

Wageningen University / TIFN
Nutrigenomics Consortium
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: Erfelijkheidsleer: De Valk, Building 304
Dreijenweg 2, 6703 HA Wageningen
Room: 0052a
T: +31-317-485786
F: +31-317-483342
E-mail: Philip.deGroot at wur.nl
Internet: http://www.nutrigenomicsconsortium.nl
http://humannutrition.wur.nl
https://madmax.bioinformatics.nl
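The coverage figures in this QC report are easier to compare as fractions. As a rough base-R sketch (the variable names and the choice of four sample lines are illustrative, and the regular expression assumes the exact "has N mapped keys (of M keys)" wording shown in the report), the lines can be parsed and ranked like this:

```r
# A few lines copied from the QC report above.
qc_lines <- c(
  "ragene10stACCNUM has 0 mapped keys (of 29216 keys)",
  "ragene10stENTREZID has 29216 mapped keys (of 29216 keys)",
  "ragene10stSYMBOL has 19953 mapped keys (of 29216 keys)",
  "ragene10stGO has 14583 mapped keys (of 29216 keys)"
)

# Capture the map name, the mapped-key count, and the total key count.
pattern <- "^([^ ]+) has ([0-9]+) mapped keys \\(of ([0-9]+) keys\\)$"

qc <- data.frame(
  map    = sub(pattern, "\\1", qc_lines),
  mapped = as.integer(sub(pattern, "\\2", qc_lines)),
  total  = as.integer(sub(pattern, "\\3", qc_lines)),
  stringsAsFactors = FALSE
)

# Fraction of keys with at least one mapping, lowest coverage first.
qc$fraction <- round(qc$mapped / qc$total, 3)
qc[order(qc$fraction), ]
```

Sorted this way, ACCNUM (no mapped keys) and ENTREZID (all keys mapped) sit at opposite ends, which is the contrast Philip points out.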

ADD COMMENT • link 16.9 years ago Groot, Philip de ▴ 630


Hi Philip,

The ACCNUM does not refer to the Affymetrix probe IDs, but rather to the GenBank IDs, which were not included in the construction of this particular package (hence their absence). For most people, this is not going to be much of a problem (and is unlikely to be a problem at all). The fact that this is a chip package means that for the vast majority of the mappings present, the Lkeys(mapName) will be Affymetrix probes; otherwise, those mappings would not have any mapped keys.

Hope this clarifies things,

Marc

16.9 years ago, Marc Carlson ★ 7.2k


Groot, Philip de ▴ 630

@groot-philip-de-1307

Last seen 11.4 years ago

Hello Marc,

Yes, I agree! But I do see a problem, though. The hugene10st and mogene10st annotation packages are built from multiple sources, not only GenBank or Entrez Gene. That is what makes this package different from the other two, and it is the reason I feel that this package is not "compliant". Correct me if I am wrong.

Additionally, I do not understand why the ragene10st annotation is not simply made available by the core team. Human and mouse are; only rat is lacking (no further ST arrays are available at the moment).

Regards,

Dr. Philip de Groot Ph.D.
Bioinformatics Researcher

Wageningen University / TIFN
Nutrigenomics Consortium
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: Erfelijkheidsleer: De Valk, Building 304
Dreijenweg 2, 6703 HA Wageningen
Room: 0052a
T: +31-317-485786
F: +31-317-483342
E-mail: Philip.deGroot at wur.nl
Internet: http://www.nutrigenomicsconsortium.nl
http://humannutrition.wur.nl
https://madmax.bioinformatics.nl

________________________________
From: Marc Carlson [mailto:mcarlson at fhcrc.org]
Sent: Mon 09/03/2009 04:52
To: Groot, Philip de
CC: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] ragene10st

16.9 years ago, Groot, Philip de ▴ 630
