Assigning gene symbols to Affymetrix data and averaging probes


Hoyles, Lesley ▴ 50

@hoyles-lesley-5445

Last seen 9.6 years ago

Hi

I have processed my affy data and am able to annotate the object mice.loess using the following.

    ID <- featureNames(mice.loess)
    Symbol <- getSYMBOL(ID,'mouse4302.db')
    fData(mice.loess) <- data.frame(ID=ID,Symbol=Symbol)

However, when I convert my object as follows

    expr.loess <- exprs(mice.loess)

I lose the annotation and have been unable to find a way to annotate expr.loess. Please could anybody suggest how I can annotate expr.loess?

Is there a way of averaging probes for each gene with Affymetrix data? I've been able to do this with single-channel Agilent data using the example given in the limma guide.

Thanks in advance for your help.

Best wishes
Lesley

Annotation annotate affy limma convert • 1.6k views

updated 11.5 years ago by James W. MacDonald 65k • written 11.5 years ago by Hoyles, Lesley ▴ 50


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

Hi Lesley,

On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
> Hi
>
> I have processed my affy data and am able to annotate the object
> mice.loess using the following.
> ID <- featureNames(mice.loess)
> Symbol <- getSYMBOL(ID,'mouse4302.db')
> fData(mice.loess) <- data.frame(ID=ID,Symbol=Symbol)
>
> However, when I convert my object as follows - expr.loess <-
> exprs(mice.loess) - I lose the annotation and have been unable to
> find a way to annotate expr.loess. Please could anybody suggest how I
> can annotate expr.loess?

    expr.loess <- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))

> Is there a way of averaging probes for each gene with Affymetrix
> data? I've been able to do this with single-channel Agilent data
> using the example given in the limma guide.

There are probably two reasonable ways to do this. First, the easiest.

    dat <- ReadAffy(cdfname = "mouse4302mmentrezcdf")

and proceed from there. This will use the MBNI re-mapped CDF package based on Entrez Gene IDs, and you will have a single value per gene after summarization. There are other ways to map the probes; see

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

at the bottom of the page for more info.

Alternatively, if you want to stick with the original probesets, the problem arises that some probesets are not well annotated, so what to do with those? In addition, gene symbols are not guaranteed to be unique, so you can't just assume that they are. Entrez Gene and UniGene IDs are supposed to be unique, so you could go with them, doing something like (untested)

    gns <- toTable(mouse4302ENTREZID)
    ## expr.loess is the data.frame I suggest above
    alldat <- merge(gns, expr.loess, by = 1)
    ## one data.frame per Entrez Gene ID
    alldatlst <- split(alldat, alldat$gene_id)
    ## keep the annotation of the first probeset and average the expression columns
    combined.data <- do.call("rbind", lapply(alldatlst, function(x)
        data.frame(x[1, 1:3], t(colMeans(x[, -(1:3), drop = FALSE])))))

Here I am assuming that after the merge() step the first three columns are the probeset ID, gene_id, symbol, and the remaining columns are the expression values. You will lose all data for which there isn't an Entrez Gene ID, but the same is true of the MBNI method I outline above.

Best,

Jim

> Thanks in advance for your help.
>
> Best wishes Lesley
> _______________________________________________
> Bioconductor mailing list Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
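The per-gene averaging described above can also be sketched in base R on toy data. This is only an illustration under stated assumptions: the probeset IDs, gene IDs, and expression values are invented, and the mapping vector `gene_id` stands in for whatever the annotation package returns. `rowsum()` adds up rows within each gene, and dividing by the number of probesets per gene gives the mean. limma's `avereps()` offers similar averaging for a matrix, but it is not used here so that the example runs without Bioconductor.

```r
# Toy expression matrix: 5 probesets x 3 arrays (values are made up)
expr <- matrix(
  c( 1,  2,  3,
     3,  4,  5,
    10, 10, 10,
     7,  8,  9,
     0,  0,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4", "ps5"),
                  c("array1", "array2", "array3"))
)

# Hypothetical probeset-to-Entrez mapping; ps5 has no gene ID
gene_id <- c(ps1 = "101", ps2 = "101", ps3 = "202", ps4 = "303", ps5 = NA)

# Drop probesets without a gene ID, as happens in the merge() approach
keep <- !is.na(gene_id[rownames(expr)])
expr_mapped <- expr[keep, , drop = FALSE]
groups <- gene_id[rownames(expr_mapped)]

# rowsum() sums rows per gene; dividing by the number of probesets
# per gene turns those sums into means
sums <- rowsum(expr_mapped, group = groups)
counts <- as.vector(table(groups)[rownames(sums)])
gene_means <- sums / counts

print(gene_means)
```

The result has one row per gene ID. Probesets without an ID are dropped before averaging, which matches the caveat about losing unannotated data.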

written 11.5 years ago by James W. MacDonald 65k


Hi Jim

Thanks, the reannotation worked a treat. I've been able to export the normalized data in annotated format.

I am averse to removing probes that have no Entrez ID associated with them as I want to put the whole set of data through limma. I can't use the annotated expr.loess in lmFit, but is there a way I can get the symbol information into the output of lmFit (for instance, as fit$symbol)?

Best wishes
Lesley

written 11.5 years ago by Hoyles, Lesley ▴ 50


Hi Lesley,

On 10/3/2012 2:29 PM, Hoyles, Lesley wrote:
> I can't use the annotated expr.loess in lmFit, but is there a way I can
> get the symbol information into the output of lmFit (for instance, as
> fit$symbol)?

There is a 'genes' component in an MArrayLM object (the output from, e.g., lmFit), accessible as fit$genes, into which you can stuff a data.frame containing gene symbols, etc.

Another option is to use the annaffy package to do the annotation. And if you are going to use annaffy and limma, then I should make a shameless plug for the affycoretools package, which contains a function designed to go from an MArrayLM object to annotated output in a single function call (outputting HTML or text files).

Best,

Jim
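One way to build such an annotation table without dropping unannotated probesets is to look symbols up with `match()`, which leaves `NA` wherever no symbol exists. The sketch below is base R on invented IDs and symbols; the `lookup` table is a hypothetical stand-in for the output of an annotation package. The final assignment to `fit$genes` appears only as a comment, and it assumes `fit` comes from `lmFit()` with rows in the same order as `ID`.

```r
# Probeset IDs in the order they appear in the expression matrix (made up)
ID <- c("ps1", "ps2", "ps3", "ps4")

# Hypothetical lookup table; ps3 has no symbol and is simply absent
lookup <- data.frame(
  probe_id = c("ps4", "ps1", "ps2"),
  symbol   = c("GeneC", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# match() returns NA for IDs missing from the lookup, so every probeset
# keeps its row and unannotated ones get an NA symbol
genes <- data.frame(
  ID     = ID,
  Symbol = lookup$symbol[match(ID, lookup$probe_id)],
  stringsAsFactors = FALSE
)

print(genes)

# With limma loaded and a fit from lmFit(), the table could then be attached:
# fit$genes <- genes
```

Because the table has exactly one row per probeset, in the original order, it lines up with the rows of the fitted model, and the whole data set can still go through limma.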

written 11.5 years ago by James W. MacDonald 65k
