Affymetrix mouse 430_2 array - gene expression and annotation
Rao,Xiayu ▴ 550
@raoxiayu-6003
Last seen 8.9 years ago
United States
Hello,

I am analyzing Affymetrix Mouse 430_2 arrays and need clarification on the following issues.

1) How do I summarize probe-level expression to the transcript/gene level? We are interested in gene expression. I know that for the Human Gene 1.0 ST array we can use the oligo package to get transcript-level expression, and that Illumina arrays have only a few probes per gene, so we can only look at the probe level. For the Mouse 430_2 array there are usually 11 probes per probeset, so I am thinking that using rma may not be enough.

2) How do I add annotation afterwards? For transcript-level annotation I have used the following code before, but I am not sure whether there is a similar approach, or a similar transcript database, for this mouse array. I know there is a package called mouse4302.db.

    ID <- featureNames(geneCore2)
    Symbol <- getSYMBOL(ID, "hugene10sttranscriptcluster.db")
    fData(geneCore2) <- data.frame(ID=ID, Symbol=Symbol)

Any input would be much appreciated! Thank you very much in advance.

Thanks,
Xiayu
Annotation mouse4302 probe oligo • 1.8k views
@james-w-macdonald-5106
Last seen 11 hours ago
United States
Hi Xiayu,

On 7/21/2014 12:08 PM, Rao,Xiayu wrote:
> 1) how to summarize the probe expression to the expression level of
> transcript/genes? [...] For this mouse 430_2 array, there are usually
> 11 probes. I am thinking that using rma may not be enough.

I'm not sure I follow your logic. Over time, the number of probes per probeset has continually fallen, to the point that the Exon arrays (and HTA, for that matter) now have only four probes per probeset (or fewer). The Gene ST arrays have more, in general, when summarized at the transcript level, but that is simply because Affy combined exon probesets together. If you summarize the Gene ST arrays at the probeset level, you have mostly four or fewer (!) probes per probeset.

So the old-style 3'-biased arrays have, in comparison, a luxurious number of probes for rma() to summarize. You can use oligo for these arrays, or affy if you prefer. You will get identical results.

> 2) and add annotation thereafter? [...] I know there is a database
> called mouse4302.db.
> ID <- featureNames(geneCore2)
> Symbol <- getSYMBOL(ID,"hugene10sttranscriptcluster.db")
> fData(geneCore2) <- data.frame(ID=ID,Symbol=Symbol)

This is an old way of annotating things, and has been superseded (for like five years now) by a more compact API:

    fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), "SYMBOL")

And note you can add in other, more useful things like the Gene ID as well. While biologists tend to like HUGO symbols, they are not, as advertised, actually unique, so you always run the risk of thinking you have one gene when in fact you are looking at the data for some other gene with the same HUGO symbol.

    fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), c("SYMBOL","GENENAME","ENTREZID"))

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
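One practical caveat when assigning the output of select() directly to fData(): select() returns one row per mapping, so a probeset that maps to more than one gene produces several rows, and the result may then no longer line up with featureNames(). Below is a minimal base-R sketch of one way to handle this, keeping only the first mapping per probeset. The helper name one_row_per_probeset, the probeset IDs and the gene symbols are all made up for illustration; only the PROBEID, SYMBOL and ENTREZID column names follow what select() returns for a chip annotation package.

```r
# Collapse a select()-style result to exactly one row per probeset,
# in the order given by `ids`. Hypothetical helper, not part of any package.
# `anno` must contain a PROBEID column; the first mapping per probeset is kept.
one_row_per_probeset <- function(ids, anno) {
  out <- anno[match(ids, anno$PROBEID), , drop = FALSE]
  out$PROBEID <- ids      # keep the ID even when a probeset has no mapping
  rownames(out) <- ids    # fData() expects row names matching featureNames()
  out
}

# Toy stand-in for the data.frame that
#   select(mouse4302.db, ids, c("SYMBOL", "ENTREZID"))
# would return. IDs and symbols are invented.
ids <- c("ps1_at", "ps2_at", "ps3_at")
anno <- data.frame(
  PROBEID  = c("ps1_at", "ps2_at", "ps2_at"),   # ps2_at maps to two genes
  SYMBOL   = c("GeneA", "GeneB", "GeneC"),
  ENTREZID = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

fd <- one_row_per_probeset(ids, anno)
print(fd)

# With the real packages installed, the same helper could be used as
# (not run here; `eset` is an assumed ExpressionSet from rma()):
#   library(mouse4302.db)
#   anno <- select(mouse4302.db, keys = featureNames(eset),
#                  columns = c("SYMBOL", "ENTREZID"), keytype = "PROBEID")
#   fData(eset) <- one_row_per_probeset(featureNames(eset), anno)
```

Keeping the first mapping is a simplification; AnnotationDbi::mapIds() with its multiVals argument is another way to control how multiple matches are resolved.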
Hi Jim,

Thanks a lot for your prompt reply and detailed explanation. You are always very helpful.

So did you mean that I can use either of the following to get transcript/gene expression for the mouse 430_2 array and other 3'-biased arrays?

    oligo: geneCore <- rma(mydata, target = "core")
    affy: rma(mydata)

Thanks,
Xiayu
Hi Xiayu,

On 7/21/2014 1:19 PM, Rao,Xiayu wrote:
> So did you mean that I can use either of the following [...]?
> oligo: geneCore <- rma(mydata, target = "core")

There is no such concept (nor argument) for the 3'-biased arrays. From ?rma:

    ## S4 method for signature 'ExpressionFeatureSet'
    rma(object, background=TRUE, normalize=TRUE, subset=NULL)

You can only summarize the 3'-biased arrays at one level, because there is only one level. In other words, unlike the Gene ST and Exon ST arrays, each probe belongs to only a single probeset, and there are no alternative (Affy-sanctioned) ways to combine probes into probesets. So these are equivalent:

    oligo::rma(mydata)
    affy::rma(mydata)

Best,

Jim
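To make concrete what it means for rma() to summarize a probeset: after background correction and quantile normalization, RMA fits a median polish to the log2 intensities of the probes in each probeset and reports one value per array. The base-R sketch below mimics only that last step, using stats::medpolish() on simulated data for a single 11-probe probeset. All numbers are invented, and this is an illustration of the idea rather than the code path that affy or oligo actually run.

```r
set.seed(1)

# Simulated log2 intensities for ONE probeset:
# 11 probes (rows) measured on 4 arrays (columns). Values are made up.
probe_effect <- rnorm(11, sd = 1)      # probe-specific affinity offsets
array_level  <- c(7, 7.5, 9, 8)        # "true" expression level on each array
noise <- matrix(rnorm(11 * 4, sd = 0.1), nrow = 11, ncol = 4)
x <- outer(probe_effect, array_level, "+") + noise

# Median polish separates probe (row) effects from array (column) effects.
fit <- medpolish(x, trace.iter = FALSE)

# One summarized value per array: overall level plus the array effect.
probeset_summary <- fit$overall + fit$col
print(round(probeset_summary, 2))
```

The point of the sketch is that the 11 probes are reduced to a single value per array while probe-specific offsets are absorbed into the row effects. With real CEL files that whole pipeline is what affy::rma() or oligo::rma() performs for every probeset on the chip.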
Thank you very much, Jim! That's very clear. I now understand it.

Thanks,
Xiayu