raw probe set expression data and masking questions
3
0
Entering edit mode
@schwartz-donald-1853
Last seen 10.3 years ago
Anyone know a simple method to harvest the raw probe set level expression data (not individual PM probe or PM-MM probe pair level data), which will be summarized by average difference, and then export that into excel with the Affy ID or "probe set name" as row headers and columns representing samples? Second question, I have a small affy custom array. I want to select certain probe sets for analysis prior to any low-level processing (i.e., prior to normalization and background correction, etc.). In other words, I want to mask certain probe sets before normalization and background correction. Following masking, the probe sets will be ignored for the entire analysis. The difficulty is that there is no consistent naming feature across all probe sets so the only way to identify probe sets to mask or to analyze is with a character vector. In addition, I need the flexibility to mask different sets of probe sets and then process (normalize, bg.correct, etc) the data various ways. I'll use: expresso(affybatch, widget=TRUE) to do this post masking. Thanks for your suggestions. Don Donald R. Schwartz, PhD Research Scientist Karmanos Cancer Institute Wayne State University School of Medicine 6137 Scott Hall 540 E. Canfield Detroit, Michigan 48201 (313)-577-8743 office (313)-577-6739 fax ------ Confidentiality Notice: This email message, including any at...{{dropped}}
Normalization Cancer probe affy PROcess Normalization Cancer probe affy PROcess • 2.2k views
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.2k
@jenny-drnevich-382
Last seen 10.3 years ago
Hi Donald, At 07:42 AM 9/8/2006, Schwartz, Donald wrote: >Anyone know a simple method to harvest the raw probe set level expression >data (not individual PM probe or PM-MM probe pair level data), which will >be summarized by average difference, and then export that into excel with >the Affy ID or "probe set name" as row headers and columns representing >samples? I don't understand what you mean by 'raw probe set level expression'? The only raw values are the individual PM and MM probe values. There are various algorithms to combine them into a probe set level expression value, but by definition these are 'processed values', not 'raw' values. > Second question, I have a small affy custom array. I want to select > certain probe sets for analysis prior to any low-level processing (i.e., > prior to normalization and background correction, etc.). In other words, > I want to mask certain probe sets before normalization and background > correction. Following masking, the probe sets will be ignored for the > entire analysis. The difficulty is that there is no consistent naming > feature across all probe sets so the only way to identify probe sets to > mask or to analyze is with a character vector. In addition, I need the > flexibility to mask different sets of probe sets and then process > (normalize, bg.correct, etc) the data various ways. I'll use: > expresso(affybatch, widget=TRUE) to do this post masking. > >Thanks for your suggestions. >Don Below is some code to remove either individual probes or entire probesets. It was originally written by Ariel Chernomoretz and posted to the mailing list, and I have subsequently modified it slightly. I've tried to give explanations throughout - perhaps at some point I'll have time to make it into a small package. There's a chance it might not work with a custom affy array, as it assumes you have a cdf and a probe package for your chip named in the usual way: 'chipnamecdf' and 'chipnameprobe'. If you only want to remove entire probe sets, it might work even if you don't have a probe package for your custom array. Good luck, Jenny ### The first part is just creating two ojects (ResetEnvir and RemoveProbes) originally ### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove individual ### probes and/or entire probesets. Just highlight everything from here until ### you see STOP and paste it to R all at once ResetEnvir<-function(cleancdf){ cdfpackagename <- paste(cleancdf,"cdf",sep="") probepackagename <- paste(cleancdf,"probe",sep="") ll<-search() cdfpackagepos <- grep(cdfpackagename,ll) if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) ll<-search() probepackagepos <- grep(probepackagename,ll) if(length(probepackagepos)>0) detach(pos=probepackagepos) require(cdfpackagename,character.only=T) require(probepackagename,character.only=T) require(affy) } RemoveProbes<-function(listOutProbes=NULL, listOutProbeSets=NULL, cleancdf,destructive=TRUE){ #default probe dataset values cdfpackagename <- paste(cleancdf,"cdf",sep="") probepackagename <- paste(cleancdf,"probe",sep="") require(cdfpackagename,character.only = TRUE) require(probepackagename,character.only = TRUE) probe.env.orig <- get(probepackagename) if(!is.null(listOutProbes)){ # taking probes out from CDF env probes<- unlist(lapply(listOutProbes,function(x){ a<-strsplit(x,"at") aux1<-paste(a[[1]][1],"at",sep="") aux2<-as.integer(a[[1]][2]) c(aux1,aux2) })) n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) probes<-data.frame(I(n1),n2) probes[,1]<-as.character(probes[,1]) probes[,2]<-as.integer(probes[,2]) pset<-unique(probes[,1]) for(i in seq(along=pset)){ ii <-grep(pset[i],probes[,1]) iout<-probes[ii,2] a<-get(pset[i],env=get(cdfpackagename)) a<-a[-iout,] assign(pset[i],a,env=get(cdfpackagename)) } } # taking probesets out from CDF env if(!is.null(listOutProbeSets)){ rm(list=listOutProbeSets,envir=get(cdfpackagename)) } # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) newAB <- new("AffyBatch",cdfName=cleancdf) pmIndex <- unlist(indexProbes(newAB,"pm")) subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) rm(newAB) iNA <- whichis.na(subIndex)) if(length(iNA)>0){ ipos<-grep(probepackagename,search()) assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) } } ### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO OBJECTS ### (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls() # All you need now is your affybatch object, and a character vector of probe set names # and/or another vector of individual probes that you want to remove. If your affybatch # object is called 'rawdata' and the vector of probesets is 'maskedprobes', follow # these steps: cleancdf <- cleancdfname(rawdata at cdfName,addcdf=FALSE) # Make sure you are starting with the original cdf with all the probes and probesets. ResetEnvir(cleancdf) # Double-check to make sure all probesets are present in your affybatch by typing in # the name of your affybatch and looking at the output. rawdata # To remove some probe sets (but not individual probes in this example), use: RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf) # The cdf file will be temporarily modified to mask the indicated probesets & probes, # which you can check by typing in the name of your affybatch again and seeing that # the number of probesets have decreased. The masking can be undone by using ResetEnvir # as above, or by quitting the session. However, any Expression Set objects created # when the cdf is modified will have the masked probesets removed permanently because # they do not refer to the cdf like an affybatch object does. Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at uiuc.edu
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.2k
@jenny-drnevich-382
Last seen 10.3 years ago
Hi Don, I understand now - you're talking about "non-normalized" data; "raw" to me (& most people, I except) means untouched in any way. If you're using expresso() to do the pre-processing, you can set normalize=FALSE to skip any normalization, but do whatever background correction and summarization you want. I did just notice that if you use expresso with widget=TRUE to get the pop-up boxes, there's NOT an option to turn normalization off ; Rafa - is this something that should be changed? BTW - almost everyone, including Affymetrix, agrees that the average of the PM-MM values (even using modified MM values, such as in the mas5 algorithm) is not the best summary measure because individual probes do differ in their binding efficiencies, such that they are not expected to have the same fluorescence level from the same transcript level. Jenny At 05:25 PM 9/8/2006, you wrote: >By "raw" I was referring to expression summaries (such as average >difference) for a probe set (not individual probes) where the data have >not been scaled or normalized or adjusted in any way. So for each probe >set on each chip, I want the average of the PM-MM values that are in the >.CEL file prior to any chip to chip adjustments. Is that more clear? > >And thanks for the code on probe or probeset removal. > >Cheers, >Don > >________________________________ > >From: Jenny Drnevich [mailto:drnevich at uiuc.edu] >Sent: Fri 9/8/2006 11:01 AM >To: Schwartz, Donald; bioconductor at stat.math.ethz.ch >Subject: Re: [BioC] raw probe set expression data and masking questions > > > >Hi Donald, > >At 07:42 AM 9/8/2006, Schwartz, Donald wrote: > >Anyone know a simple method to harvest the raw probe set level expression > >data (not individual PM probe or PM-MM probe pair level data), which will > >be summarized by average difference, and then export that into excel with > >the Affy ID or "probe set name" as row headers and columns representing > >samples? > >I don't understand what you mean by 'raw probe set level expression'? The >only raw values are the individual PM and MM probe values. There are >various algorithms to combine them into a probe set level expression value, >but by definition these are 'processed values', not 'raw' values. > > > Second question, I have a small affy custom array. I want to select > > certain probe sets for analysis prior to any low-level processing (i.e., > > prior to normalization and background correction, etc.). In other words, > > I want to mask certain probe sets before normalization and background > > correction. Following masking, the probe sets will be ignored for the > > entire analysis. The difficulty is that there is no consistent naming > > feature across all probe sets so the only way to identify probe sets to > > mask or to analyze is with a character vector. In addition, I need the > > flexibility to mask different sets of probe sets and then process > > (normalize, bg.correct, etc) the data various ways. I'll use: > > expresso(affybatch, widget=TRUE) to do this post masking. > > > >Thanks for your suggestions. > >Don > >Below is some code to remove either individual probes or entire probesets. >It was originally written by Ariel Chernomoretz and posted to the mailing >list, and I have subsequently modified it slightly. I've tried to give >explanations throughout - perhaps at some point I'll have time to make it >into a small package. There's a chance it might not work with a custom affy >array, as it assumes you have a cdf and a probe package for your chip named >in the usual way: 'chipnamecdf' and 'chipnameprobe'. If you only want to >remove entire probe sets, it might work even if you don't have a probe >package for your custom array. > >Good luck, >Jenny > > >### The first part is just creating two ojects (ResetEnvir and >RemoveProbes) originally >### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove >individual >### probes and/or entire probesets. Just highlight everything from here until >### you see STOP and paste it to R all at once > >ResetEnvir<-function(cleancdf){ > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > ll<-search() > cdfpackagepos <- grep(cdfpackagename,ll) > if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) > ll<-search() > probepackagepos <- grep(probepackagename,ll) > if(length(probepackagepos)>0) detach(pos=probepackagepos) > require(cdfpackagename,character.only=T) > require(probepackagename,character.only=T) > require(affy) >} > >RemoveProbes<-function(listOutProbes=NULL, > listOutProbeSets=NULL, > >cleancdf,destructive=TRUE){ > > > #default probe dataset values > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > require(cdfpackagename,character.only = TRUE) > require(probepackagename,character.only = TRUE) > probe.env.orig <- get(probepackagename) > > > if(!is.null(listOutProbes)){ > # taking probes out from CDF env > probes<- unlist(lapply(listOutProbes,function(x){ > a<-strsplit(x,"at") > aux1<-paste(a[[1]][1],"at",sep="") > aux2<-as.integer(a[[1]][2]) > c(aux1,aux2) > })) > n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) > n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) > probes<-data.frame(I(n1),n2) > probes[,1]<-as.character(probes[,1]) > probes[,2]<-as.integer(probes[,2]) > pset<-unique(probes[,1]) > for(i in seq(along=pset)){ > ii <-grep(pset[i],probes[,1]) > iout<-probes[ii,2] > a<-get(pset[i],env=get(cdfpackagename)) > a<-a[-iout,] > assign(pset[i],a,env=get(cdfpackagename)) > } > } > > > # taking probesets out from CDF env > if(!is.null(listOutProbeSets)){ > rm(list=listOutProbeSets,envir=get(cdfpackagename)) > } > > > # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) > tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) > newAB <- new("AffyBatch",cdfName=cleancdf) > pmIndex <- unlist(indexProbes(newAB,"pm")) > subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) > rm(newAB) > iNA <- whichis.na(subIndex)) > > > if(length(iNA)>0){ > ipos<-grep(probepackagename,search()) > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > } >} > >### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO >OBJECTS >### (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls() > ># All you need now is your affybatch object, and a character vector of >probe set names ># and/or another vector of individual probes that you want to remove. If >your affybatch ># object is called 'rawdata' and the vector of probesets is 'maskedprobes', >follow ># these steps: > >cleancdf <- cleancdfname(rawdata at cdfName,addcdf=FALSE) > ># Make sure you are starting with the original cdf with all the probes and >probesets. > >ResetEnvir(cleancdf) > ># Double-check to make sure all probesets are present in your affybatch by >typing in ># the name of your affybatch and looking at the output. > >rawdata > ># To remove some probe sets (but not individual probes in this example), use: >RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf) > ># The cdf file will be temporarily modified to mask the indicated probesets >& probes, ># which you can check by typing in the name of your affybatch again and >seeing that ># the number of probesets have decreased. The masking can be undone by >using ResetEnvir ># as above, or by quitting the session. However, any Expression Set objects >created ># when the cdf is modified will have the masked probesets removed >permanently because ># they do not refer to the cdf like an affybatch object does. > > >Jenny Drnevich, Ph.D. > >Functional Genomics Bioinformatics Specialist >W.M. Keck Center for Comparative and Functional Genomics >Roy J. Carver Biotechnology Center >University of Illinois, Urbana-Champaign > >330 ERML >1201 W. Gregory Dr. >Urbana, IL 61801 >USA > >ph: 217-244-7355 >fax: 217-265-5066 >e-mail: drnevich at uiuc.edu > > > > > > >------ >Confidentiality Notice: This email message, including any attachments, is >for the sole use of the intended recipient(s) and may contain confidential >and/or privileged information. If you are not the intended recipient(s), >you are hereby notified that any dissemination, unauthorized review, use, >disclosure, copying, or distribution of the contents of this email and/or >any materials contained in any attachments is prohibited. If you have >received this message in error, or are not the intended recipient(s), >please immediately notify the sender by email and destroy all copies of >the original message, including attachments. Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at uiuc.edu
ADD COMMENT
0
Entering edit mode
jacorvar ▴ 40
@jacorvar-8972
Last seen 14 months ago
European Union

Hi,

 

Is there any possibility to exclude some probes from a HTAFeatureSet instead of an AffyBatch object? There are no CDF files for the corresponding HTAFeatureSet libraries...

 

Thanks

ADD COMMENT
0
Entering edit mode

I'm not sure, as I haven't worked with Affy data in a very long time. Looking at the oligo package, if there was just a single set of probes you know you wanted to remove, you could potentially build your own platform design information package with pdInfoBuilder http://bioconductor.org/packages/release/bioc/html/pdInfoBuilder.html. That wouldn't work as well if you wanted to do it "on the fly" with different probe sets each time. Maybe Benilton will have a better answer for you...

Jenny

ADD REPLY
0
Entering edit mode

Hi Jenny,

 

Actually it does not work when using pdInfoBuilder. I removed the probes from the PGF and the CLF files and then built the library.  There are some probesets and transcript clusters (~2% and 0.5 % of the total, respectively) whose probes (all of them) have been removed, but their probes still appear in the raw data when I read the CEL files. Maybe because I did not remove them (probesets and transcript clusters) from the MPS file?

Thanks

ADD REPLY
0
Entering edit mode
When you read in your cell files with read.celfiles(), did you set pkgname = to your new package name? Otherwise, it will pull in the regular package which could explain why you are still seeing them in the data. Jenny From: jacorvar [bioc] [mailto:noreply@bioconductor.org] Sent: Tuesday, March 28, 2017 9:03 AM To: Zadeh, Jenny Drnevich <drnevich@illinois.edu> Subject: [bioc] C: raw probe set expression data and masking questions Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User jacorvar<https: support.bioconductor.org="" u="" 8972=""/> wrote Comment: raw probe set expression data and masking questions<https: support.bioconductor.org="" p="" 14228="" #94268="">: Hi Jenny, Actually it does not work when using pdInfoBuilder. I removed the probes from the PGF and the CLF files and then built the library. There are some probesets and transcript clusters (~2% and 0.5 % of the total, respectively) whose probes (all of them) have been removed, but their probes still appear in the raw data when I read the CEL files. Maybe because I did not remove them (probesets and transcript clusters) from the MPS file? Thanks ________________________________ Post tags: , Normalization, Cancer, probe, affy, PROcess, Normalization, Cancer, probe, affy, PROcess You may reply via email or visit C: raw probe set expression data and masking questions
ADD REPLY
0
Entering edit mode

First, I removed the original packge before installing the new one, so it must be something else.
 

ADD REPLY
0
Entering edit mode

Would it be reasonable to set the intensity values of the probes I want to remove to the minimum of the array?

ADD REPLY

Login before adding your answer.

Traffic: 541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6