Hi Don,
I understand now - you're talking about "non-normalized" data; "raw"
to me
(& most people, I except) means untouched in any way. If you're using
expresso() to do the pre-processing, you can set normalize=FALSE to
skip
any normalization, but do whatever background correction and
summarization
you want. I did just notice that if you use expresso with widget=TRUE
to
get the pop-up boxes, there's NOT an option to turn normalization off
;
Rafa - is this something that should be changed?
BTW - almost everyone, including Affymetrix, agrees that the average
of the
PM-MM values (even using modified MM values, such as in the mas5
algorithm)
is not the best summary measure because individual probes do differ in
their binding efficiencies, such that they are not expected to have
the
same fluorescence level from the same transcript level.
Jenny
At 05:25 PM 9/8/2006, you wrote:
>By "raw" I was referring to expression summaries (such as average
>difference) for a probe set (not individual probes) where the data
have
>not been scaled or normalized or adjusted in any way. So for each
probe
>set on each chip, I want the average of the PM-MM values that are in
the
>.CEL file prior to any chip to chip adjustments. Is that more clear?
>
>And thanks for the code on probe or probeset removal.
>
>Cheers,
>Don
>
>________________________________
>
>From: Jenny Drnevich [mailto:drnevich at uiuc.edu]
>Sent: Fri 9/8/2006 11:01 AM
>To: Schwartz, Donald; bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] raw probe set expression data and masking
questions
>
>
>
>Hi Donald,
>
>At 07:42 AM 9/8/2006, Schwartz, Donald wrote:
> >Anyone know a simple method to harvest the raw probe set level
expression
> >data (not individual PM probe or PM-MM probe pair level data),
which will
> >be summarized by average difference, and then export that into
excel with
> >the Affy ID or "probe set name" as row headers and columns
representing
> >samples?
>
>I don't understand what you mean by 'raw probe set level expression'?
The
>only raw values are the individual PM and MM probe values. There are
>various algorithms to combine them into a probe set level expression
value,
>but by definition these are 'processed values', not 'raw' values.
>
> > Second question, I have a small affy custom array. I want to
select
> > certain probe sets for analysis prior to any low-level processing
(i.e.,
> > prior to normalization and background correction, etc.). In other
words,
> > I want to mask certain probe sets before normalization and
background
> > correction. Following masking, the probe sets will be ignored for
the
> > entire analysis. The difficulty is that there is no consistent
naming
> > feature across all probe sets so the only way to identify probe
sets to
> > mask or to analyze is with a character vector. In addition, I need
the
> > flexibility to mask different sets of probe sets and then process
> > (normalize, bg.correct, etc) the data various ways. I'll use:
> > expresso(affybatch, widget=TRUE) to do this post masking.
> >
> >Thanks for your suggestions.
> >Don
>
>Below is some code to remove either individual probes or entire
probesets.
>It was originally written by Ariel Chernomoretz and posted to the
mailing
>list, and I have subsequently modified it slightly. I've tried to
give
>explanations throughout - perhaps at some point I'll have time to
make it
>into a small package. There's a chance it might not work with a
custom affy
>array, as it assumes you have a cdf and a probe package for your chip
named
>in the usual way: 'chipnamecdf' and 'chipnameprobe'. If you only want
to
>remove entire probe sets, it might work even if you don't have a
probe
>package for your custom array.
>
>Good luck,
>Jenny
>
>
>### The first part is just creating two ojects (ResetEnvir and
>RemoveProbes) originally
>### written by Ariel Chernomoretz and modified by Jenny Drnevich to
remove
>individual
>### probes and/or entire probesets. Just highlight everything from
here until
>### you see STOP and paste it to R all at once
>
>ResetEnvir<-function(cleancdf){
> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> probepackagename <- paste(cleancdf,"probe",sep="")
> ll<-search()
> cdfpackagepos <- grep(cdfpackagename,ll)
> if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
> ll<-search()
> probepackagepos <- grep(probepackagename,ll)
> if(length(probepackagepos)>0) detach(pos=probepackagepos)
> require(cdfpackagename,character.only=T)
> require(probepackagename,character.only=T)
> require(affy)
>}
>
>RemoveProbes<-function(listOutProbes=NULL,
> listOutProbeSets=NULL,
>
>cleancdf,destructive=TRUE){
>
>
> #default probe dataset values
> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> probepackagename <- paste(cleancdf,"probe",sep="")
> require(cdfpackagename,character.only = TRUE)
> require(probepackagename,character.only = TRUE)
> probe.env.orig <- get(probepackagename)
>
>
> if(!is.null(listOutProbes)){
> # taking probes out from CDF env
> probes<- unlist(lapply(listOutProbes,function(x){
> a<-strsplit(x,"at")
> aux1<-paste(a[[1]][1],"at",sep="")
> aux2<-as.integer(a[[1]][2])
> c(aux1,aux2)
> }))
> n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
> n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
> probes<-data.frame(I(n1),n2)
> probes[,1]<-as.character(probes[,1])
> probes[,2]<-as.integer(probes[,2])
> pset<-unique(probes[,1])
> for(i in seq(along=pset)){
> ii <-grep(pset[i],probes[,1])
> iout<-probes[ii,2]
> a<-get(pset[i],env=get(cdfpackagename))
> a<-a[-iout,]
> assign(pset[i],a,env=get(cdfpackagename))
> }
> }
>
>
> # taking probesets out from CDF env
> if(!is.null(listOutProbeSets)){
> rm(list=listOutProbeSets,envir=get(cdfpackagename))
> }
>
>
> # setting the PROBE env accordingly (idea from gcrma
compute.affinities.R)
> tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
> newAB <- new("AffyBatch",cdfName=cleancdf)
> pmIndex <- unlist(indexProbes(newAB,"pm"))
> subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex)
> rm(newAB)
> iNA <- whichis.na(subIndex))
>
>
> if(length(iNA)>0){
> ipos<-grep(probepackagename,search())
> assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
> }
>}
>
>### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE
THE TWO
>OBJECTS
>### (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
>
># All you need now is your affybatch object, and a character vector
of
>probe set names
># and/or another vector of individual probes that you want to remove.
If
>your affybatch
># object is called 'rawdata' and the vector of probesets is
'maskedprobes',
>follow
># these steps:
>
>cleancdf <- cleancdfname(rawdata at cdfName,addcdf=FALSE)
>
># Make sure you are starting with the original cdf with all the
probes and
>probesets.
>
>ResetEnvir(cleancdf)
>
># Double-check to make sure all probesets are present in your
affybatch by
>typing in
># the name of your affybatch and looking at the output.
>
>rawdata
>
># To remove some probe sets (but not individual probes in this
example), use:
>RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes,
cleancdf)
>
># The cdf file will be temporarily modified to mask the indicated
probesets
>& probes,
># which you can check by typing in the name of your affybatch again
and
>seeing that
># the number of probesets have decreased. The masking can be undone
by
>using ResetEnvir
># as above, or by quitting the session. However, any Expression Set
objects
>created
># when the cdf is modified will have the masked probesets removed
>permanently because
># they do not refer to the cdf like an affybatch object does.
>
>
>Jenny Drnevich, Ph.D.
>
>Functional Genomics Bioinformatics Specialist
>W.M. Keck Center for Comparative and Functional Genomics
>Roy J. Carver Biotechnology Center
>University of Illinois, Urbana-Champaign
>
>330 ERML
>1201 W. Gregory Dr.
>Urbana, IL 61801
>USA
>
>ph: 217-244-7355
>fax: 217-265-5066
>e-mail: drnevich at uiuc.edu
>
>
>
>
>
>
>------
>Confidentiality Notice: This email message, including any
attachments, is
>for the sole use of the intended recipient(s) and may contain
confidential
>and/or privileged information. If you are not the intended
recipient(s),
>you are hereby notified that any dissemination, unauthorized review,
use,
>disclosure, copying, or distribution of the contents of this email
and/or
>any materials contained in any attachments is prohibited. If you
have
>received this message in error, or are not the intended recipient(s),
>please immediately notify the sender by email and destroy all copies
of
>the original message, including attachments.
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu
I'm not sure, as I haven't worked with Affy data in a very long time. Looking at the oligo package, if there was just a single set of probes you know you wanted to remove, you could potentially build your own platform design information package with pdInfoBuilder http://bioconductor.org/packages/release/bioc/html/pdInfoBuilder.html. That wouldn't work as well if you wanted to do it "on the fly" with different probe sets each time. Maybe Benilton will have a better answer for you...
Jenny
Hi Jenny,
Actually it does not work when using pdInfoBuilder. I removed the probes from the PGF and the CLF files and then built the library. There are some probesets and transcript clusters (~2% and 0.5 % of the total, respectively) whose probes (all of them) have been removed, but their probes still appear in the raw data when I read the CEL files. Maybe because I did not remove them (probesets and transcript clusters) from the MPS file?
Thanks
First, I removed the original packge before installing the new one, so it must be something else.
Would it be reasonable to set the intensity values of the probes I want to remove to the minimum of the array?