Question

Affymetrix mouse 430_2 array - gene expression and annotation


Rao,Xiayu


Hello,

I am analyzing Affymetrix mouse 430_2 arrays and would like clarification on the following issues.

1) How should I summarize the probe-level intensities into expression levels for transcripts/genes? We are interested in gene expression. I know that for the Human Gene 1.0 ST array, we can use the oligo package to get transcript-level expression. For Illumina arrays, only a few probes are designed for each gene, so we can only look at the probe level. For this mouse 430_2 array, there are usually 11 probes per probeset, and I am thinking that using rma may not be enough.

2) How do I add annotation afterwards? For transcript-level annotation, I have used the following code before, but I am not sure whether there is a similar way, or a similar transcript database, for this mouse array. I know there is a database called mouse4302.db.

    library(annotate)  # provides getSYMBOL()
    ID <- featureNames(geneCore2)
    Symbol <- getSYMBOL(ID, "hugene10sttranscriptcluster.db")
    fData(geneCore2) <- data.frame(ID = ID, Symbol = Symbol)

Any input would be very much appreciated. Thank you very much in advance.

Thanks,
Xiayu

Annotation mouse4302 probe oligo

updated 9.8 years ago by James W. MacDonald • written 9.8 years ago by Rao,Xiayu

Answer 1 · 2014-07-21


James W. MacDonald


Hi Xiayu,

On 7/21/2014 12:08 PM, Rao,Xiayu wrote:
> For this mouse 430_2 array, there are usually 11 probes. I am thinking that using rma may not be enough.

I'm not sure I follow your logic. Over time, the number of probes per probeset has continually fallen, to the point that the Exon arrays (and HTA, for that matter) now have only four probes per probeset (or fewer). The Gene ST arrays have more when summarized at the transcript level, in general, but that is simply because Affy combined exon probesets together. If you summarize the Gene ST arrays at the probeset level, you mostly have four or fewer (!) probes per probeset.

So the old-style 3'-biased arrays have, in comparison, a luxurious number of probes for rma() to summarize. You can use oligo for these arrays, or affy if you prefer. You will get identical results.

> I know there is a database called mouse4302.db.
> ID <- featureNames(geneCore2)
> Symbol <- getSYMBOL(ID, "hugene10sttranscriptcluster.db")
> fData(geneCore2) <- data.frame(ID = ID, Symbol = Symbol)

This is an old way of annotating things, and it has been superseded (for about five years now) by a more compact API:

    library(mouse4302.db)
    fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), "SYMBOL")

And note that you can add other, more useful things such as the Entrez Gene ID as well. While biologists tend to like HUGO symbols, they are not, as advertised, actually unique, so you always run the risk of thinking you have the gene you want when in fact you are looking at the data for some other gene with the same HUGO symbol.

    fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2),
                               c("SYMBOL", "GENENAME", "ENTREZID"))

Best,
Jim

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
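One practical caveat, stated as an assumption about current AnnotationDbi behaviour: select() returns one row per key/value pair, so a probeset that maps to more than one gene produces several rows, and the result may no longer line up one-to-one with featureNames(). The base-R sketch below shows one way to collapse such a table to exactly one row per probeset, in feature order, before storing it as feature data. The probe IDs, symbols, and the `collapse_ids()` helper are all hypothetical and exist only for illustration.

```r
# Hypothetical select()-style output: one row per (PROBEID, SYMBOL) pair.
# These IDs and symbols are invented; they are not real mouse4302 mappings.
ann <- data.frame(
  PROBEID = c("probe_1", "probe_2", "probe_2", "probe_3"),
  SYMBOL  = c("GeneA",   "GeneB",   "GeneC",   NA),
  stringsAsFactors = FALSE
)

# Feature names in the order used by the expression set (hypothetical)
feature_ids <- c("probe_3", "probe_1", "probe_2")

# Hypothetical helper: drop NAs, keep unique values, join with " // "
collapse_ids <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = " // ")
}

# One collapsed symbol string per probeset, named by PROBEID
sym_by_probe <- tapply(ann$SYMBOL, ann$PROBEID, collapse_ids)

# One row per feature, in feature order, with matching row names
fdat <- data.frame(
  PROBEID   = feature_ids,
  SYMBOL    = as.character(sym_by_probe[feature_ids]),
  row.names = feature_ids,
  stringsAsFactors = FALSE
)
print(fdat)
```

With the real annotation package, AnnotationDbi's mapIds() does this kind of collapse directly, for example mapIds(mouse4302.db, keys = featureNames(geneCore2), column = "SYMBOL", keytype = "PROBEID", multiVals = "first") to keep the first match, or multiVals = function(x) paste(x, collapse = " // ") to keep them all in one string.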

9.8 years ago • James W. MacDonald


Hi, Jim

Thanks a lot for your prompt reply and detailed explanation. You are always very helpful.

So did you mean that I can use either of the following to get transcript/gene expression for the mouse 430_2 array and other 3'-based arrays?

    oligo: geneCore <- rma(mydata, target = "core")
    affy: rma(mydata)

Thanks,
Xiayu

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@uw.edu]
Sent: Monday, July 21, 2014 11:43 AM
To: Rao,Xiayu; 'bioconductor at r-project.org'
Subject: Re: [BioC] Affymetrix mouse 430_2 array - gene expression and annotation

9.8 years ago • Rao,Xiayu


Hi Xiayu,

On 7/21/2014 1:19 PM, Rao,Xiayu wrote:
> So did you mean that I can use either of the following to get transcript/gene expression for the mouse 430_2 array and other 3'-based arrays?
> oligo: geneCore <- rma(mydata, target = "core")
> affy: rma(mydata)

There is no such concept (nor argument) for the 3'-biased arrays. From ?rma:

    ## S4 method for signature 'ExpressionFeatureSet'
    rma(object, background=TRUE, normalize=TRUE, subset=NULL)

You can only summarize the 3'-biased arrays at one level, because there is only one level. In other words, unlike the Gene ST and Exon ST arrays, each probe belongs to only a single probeset, and there are no alternative (Affy-sanctioned) ways to combine probes into probesets. So these are equivalent:

    oligo::rma(mydata)
    affy::rma(mydata)

Best,
Jim
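A toy sketch of what that single level of summarization involves: after background correction and quantile normalization, RMA fits an additive model (log2 intensity = overall + probe effect + array effect + residual) to each probeset by median polish, and the probeset's expression on each array is the overall effect plus that array's effect. The base-R code below applies stats::medpolish() to simulated log2 intensities for one hypothetical 11-probe probeset on four arrays. The data and effect sizes are made up, background correction and normalization are skipped, and this is not the affy or oligo implementation. (In practice `mydata` is a different object in each call above: an ExpressionFeatureSet from oligo::read.celfiles() for oligo::rma(), and an AffyBatch from affy::ReadAffy() for affy::rma().)

```r
# Simulated data only: one hypothetical probeset, 11 probes x 4 arrays.
set.seed(1)

n_probes <- 11  # typical probe count per probeset on 3'-biased arrays
n_arrays <- 4

# Probe affinities and hypothetical true log2 expression per array
probe_effect <- rnorm(n_probes, mean = 0, sd = 1)
true_expr    <- c(7, 7.5, 8, 9)

# Additive log2 intensities plus small measurement noise
noise    <- matrix(rnorm(n_probes * n_arrays, sd = 0.1),
                   nrow = n_probes, ncol = n_arrays)
log2_int <- outer(probe_effect, true_expr, "+") + noise

# Median polish: rows = probes, columns = arrays
fit <- medpolish(log2_int, trace.iter = FALSE)

# Per-array probeset expression = overall effect + array (column) effect
expr <- fit$overall + fit$col
print(round(expr, 2))
```

Because each probe on a 3'-biased array belongs to exactly one probeset, a fit like this can be run independently for every probeset, and with 11 probes the medians rest on more points than the four-or-fewer-probe probesets of the Exon and HTA arrays.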

9.8 years ago • James W. MacDonald


Thank you very much, Jim! That's very clear. I now understand it.

Thanks,
Xiayu

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@uw.edu]
Sent: Monday, July 21, 2014 12:45 PM
To: Rao,Xiayu; 'bioconductor at r-project.org'
Subject: Re: [BioC] Affymetrix mouse 430_2 array - gene expression and annotation

9.8 years ago • Rao,Xiayu