Hi Jack,
There's nothing in the code that's specific for 11 probes in the set.
I'm not sure what the problem could be - I don't know the code
intimately, I only hacked it, and I barely ever use it. Have you
tried doing debug(RemoveProbes)? If you're not familiar with it, it
will send your next call to RemoveProbes to a debugger where you can
step through each line of code to see what it's doing and where the
problem is. I use it all the time for hacking. If you can't figure it
out, send me (off list) your list of probes and probe sets, and your
ENTIRE code that you've been trying (not just error messages!) and I
might have time this weekend to look into it.
Jenny
At 02:03 PM 9/2/2010, Jack Luo wrote:
>Jenny,
>
>Thanks for your email. I'll look to see whether I can do this. But
>after running the code using my own data, I am puzzled whether my
>speculation in the previous email is correct or not, since I also
>run into problems even when not all the probes are taken out in the
>probe sets. One quick question is that does your code have any
>special treatment on those probe sets that have more than 11 probes?
>We know that in U133A, majority of the probe sets have 11 probe
>pairs and there are ~ 500 probe sets have more than 11 probe pairs.
>I run the new code against two of my datasets (one ran through
>perfectly using the 08 version of the code and the other one ran
>into problem using the 08 version of the code) and found that the
>new code run into errors for both the datasets. The error msg shows
>
>Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets
>= maskedprobeSets, :
> Probe set 320_at has zero probes left. Remove this entire probe
set
> in the listOutProbeSets argument
>
>The other one shows
>Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets
>= maskedprobeSets, :
> Probe set 632_at has zero probes left. Remove this entire probe
set
> in the listOutProbeSets argument
>
>The common thing about "320_at" and "632_at" are that they both have
>16 probe pairs before masking. Actually the maskedprobes do not
>contain all the probe pairs for the above probe sets.
>For example, for the "320_at", there are still 5 probe pairs left.
>
> > tmpS <- unlist(lapply(maskedprobes,function(x){
> a<-strsplit(x,"at")
> aux1<-paste(a[[1]][1],"at",sep="")
>
> aux1
> }))
>
> > head(maskedprobes)
>[1] "1007_s_at3"
>"1053_at1" "1053_at2" "1053_at4" "1053_at5" "1053_at6"
> > head(tmpS)
>[1] "1007_s_at" "1053_at" "1053_at" "1053_at" "1053_at"
"1053_at"
>
> > sum(tmpS == "320_at")
>[1] 5
>
>Thanks again,
>
>-Jack
>
>
>On Thu, Sep 2, 2010 at 12:37 PM, Jenny Drnevich
><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote:
>Hi Jack,
>
>You're right, the error message can be interpreted exactly the
>opposite of what it's supposed to mean! Is this better? "Remove this
>entire probe set by adding it to the listOutProbeSets argument"
>Ideally, you would also remove the individual probes from the
>listOutProbes argument, but I'm not sure it's necessary. If you have
>a lot of these potentially in your list, it would be a pain to do. I
>don't have time, but you might try to write a function to examine
>listOutProbes, see which probe sets have all probes removed, and
>then output these probe sets so you can add them to your
>listOutProbeSets. All the necessary code to do this should be in
>RemoveProbes, if you feel comfortable doing the hacking. Please send
>it to me if you do.
>
>Jenny
>
>
>At 10:40 AM 9/2/2010, Jack Luo wrote:
>>Jenny,
>>
>>I played around with the new code and found that the only way to
>>make it work (when there are probesets whose probes are all taken
>>out) is to INCLUDE the probeset name in the listOutProbeSets
>>arguments, not exclude them (the error msg is confusing), please see
below.
>>
>> > maskedprobes
>> [1]
>> "222152_at1" "222152_at2" "222152_at3" "222152_at4"
>> "222152_at5" "222152_at6"
>> [7]
>> "222152_at7" "222152_at8" "222152_at9" "222152_at10"
>> "222152_at11" "204659_s_at1"
>>[13] "204659_s_at2" "204659_s_at3" "204659_s_at4"
>> > maskedprobeSets
>>[1] "210893_at"
>>
>> >RemoveProbes(listOutProbes=maskedprobes,
listOutProbeSets=maskedprobeSets,
>>+ cleancdf = "hgu133a")
>>Error in RemoveProbes(listOutProbes = maskedprobes,
>>listOutProbeSets = maskedprobeSets, :
>> Probe set 222152_at has zero probes left. Remove this entire
probe set
>> in the listOutProbeSets argument
>>
>>However, it the maskedprobeSets <- 222152_at, it runs through
>>without problem. I'll run on my data and let you know later, since
>>it runs quite a while. But if what I mentioned above is the key, it
>>should work.
>>
>>Thanks,
>>
>>-Jack
>>
>>
>>
>>
>>
>>
>>
>>On Thu, Sep 2, 2010 at 10:09 AM, Jenny Drnevich
>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote:
>>Hi Jack,
>>I found I had an even more updated version from Jan 2009. The code
>>is below. Try using this, and let me know if you still have the
>>problem. Note that I changed the arguments to RemoveProbes(); there
>>is only listOutProbes, listOutProbesSets and cleancdf. cleancdf is
>>the name array, without "cdf" or "probe" at the end. For example,
>>if you're using the mouse 430 2.0 array, you'd need cleancdf =
>>"mouse4302". If you're still having problems, please send the
>>entire output of your code and sessionInfo().
>>Jenny
>>### The first part is just creating two ojects (ResetEnvir and
>>RemoveProbes) originally
>>### written by Ariel Chernomoretz and modified by Jenny Drnevich to
>>remove individual
>>### probes and/or entire probesets. Just highlight everything from
>>here until
>>### you see STOP and paste it to R all at once
>>### Updated Jan 2009
>>
>>ResetEnvir<-function(cleancdf){
>> cdfpackagename <- paste(cleancdf,"cdf",sep="")
>> probepackagename <- paste(cleancdf,"probe",sep="")
>> ll<-search()
>> cdfpackagepos <- grep(cdfpackagename,ll)
>> if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
>> ll<-search()
>> probepackagepos <- grep(probepackagename,ll)
>> if(length(probepackagepos)>0) detach(pos=probepackagepos)
>> require(cdfpackagename,character.only=T)
>> require(probepackagename,character.only=T)
>> require(affy)
>>}
>>
>>RemoveProbes<-function(listOutProbes=NULL,
>> listOutProbeSets=NULL,
>>
>>cleancdf,destructive=TRUE){
>>
>>
>> #default probe dataset values
>> cdfpackagename <- paste(cleancdf,"cdf",sep="")
>> probepackagename <- paste(cleancdf,"probe",sep="")
>> require(cdfpackagename,character.only = TRUE)
>> require(probepackagename,character.only = TRUE)
>> probe.env.orig <- get(probepackagename)
>>
>>
>> if(!is.null(listOutProbes)){
>> # taking probes out from CDF env
>> probes<- unlist(lapply(listOutProbes,function(x){
>> a<-strsplit(x,"at")
>> aux1<-paste(a[[1]][1],"at",sep="")
>> aux2<-as.integer(a[[1]][2])
>> c(aux1,aux2)
>> }))
>> n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
>> n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
>> probes<-data.frame(I(n1),n2)
>> probes[,1]<-as.character(probes[,1])
>> probes[,2]<-as.integer(probes[,2])
>> pset<-unique(probes[,1])
>> if (!is.null(listOutProbeSets))
>> pset <- setdiff(pset,listOutProbeSets)
>> for(i in seq(along=pset)){
>> ii <-grep(pset[i],probes[,1])
>> iout<-probes[ii,2]
>> a<-get(pset[i],env=get(cdfpackagename))
>> a<-a[-iout,]
>> if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T)
>> if(nrow(a)==0)
>> stop("Probe set ",pset[i]," has zero probes left. Remove this
>> entire probe set
>> in the listOutProbeSets argument")
>> assign(pset[i],a,env=get(cdfpackagename))
>> }
>> }
>>
>>
>> # taking probesets out from CDF env
>> if(!is.null(listOutProbeSets)){
>> rm(list=listOutProbeSets,envir=get(cdfpackagename))
>> }
>>
>>
>> # setting the PROBE env accordingly (idea from gcrma
compute.affinities.R)
>> tmp <- get("xy2indices",paste("package:",cdfpackagename,sep=""))
>> newAB <- new("AffyBatch",cdfName=cleancdf)
>> pmIndex <- unlist(indexProbes(newAB,"pm"))
>> subIndex<-
>> match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmI
ndex)
>> rm(newAB)
>> iNA <- which(<http: is.na="">is.na(subIndex))
>>
>>
>> if(length(iNA)>0){
>> ipos<-grep(probepackagename,search())
>> assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
>> }
>>}
>>
>>At 08:41 AM 9/2/2010, Jack Luo wrote:
>>>Jenny,
>>>Thanks for your email. The reason you provided makes perfect
>>>sense. However, I've looked into this and found that it may not be
>>>the root cause of the error. I tried just masking 25 probes and 1
>>>probe set (U133A array) as below:
>>> >maskedprobes
>>> [1]
>>> "222152_at1" "222152_at2" "222152_at3" "222152_at4"
>>> "222152_at5" "222152_at6"
>>> [7]
>>> "222152_at7" "222152_at8" "222152_at9" "222152_at10"
>>> "222152_at11" "204659_s_at1"
>>>[13]
>>>"204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5"
>>>"204659_s_at6" "204659_s_at7"
>>>[19] "204659_s_at8" "204659_s_at9" "204659_s_at10"
>>>"204659_s_at11" "210893_at1" "210893_at2"
>>>[25] "210893_at3"
>>> > maskedprobeSets
>>>[1] "222152_at"
>>>Presumably it should run into error since all the probes within
>>>the probe set have been taken out.
>>>However, the code still runs through and the dimension works out
correctly.
>>>A <- ReadAffy()
>>>B <- justMAS(A,tgt = 500,scale = TRUE)
>>> > dim(pm(A))
>>>[1] 247940 54 # the dimension without masking is 247965,
>>>which is equal to 247940 + 25
>>> > dim(B)
>>>Features Samples
>>> 22281 54 # the dimension without masking is 22283,
>>> which is equal to 22281 + 2 (the codes knows that all the probes
>>> in 204659_s_at have been taken out, so the total # of masked probe
sets is 2).
>>>In addition, for one of the dataset which the code does not run
>>>into problem, I found that there are ~ 4000 probe sets having this
>>>kind of problem: all the probes within the probe sets have been
>>>taken out. Yet it still runs through and the dimension works out
correctly.
>>>Sorry for bothering you so much on this, I really appreciate it.
>>>Best,
>>>-Jack
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich
>>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote:
>>>Hi Jack,
>>>There shouldn't be a limit on the number of probes/probesets that
>>>you can remove. However, based on the error message, my guess is
>>>that there's some redundancy between your list of probe sets to
>>>remove and probes to remove. Probably you have all the probes of a
>>>probe set in your probe list, and that probe set in your probe set
>>>list as well. The code first removes all the probes, so when it
>>>goes to remove that probe set, it won't find anything and could
>>>throw an error. I don't know for sure that this is your problem,
>>>but I've seen it before.
>>>Jenny
>>>At 02:54 PM 9/1/2010, Jack Luo wrote:
>>>>Hi Jenny,
>>>>I run the code through 3 different datasets (each consists of
>>>>50-60 U133a cel files), the first 2 datasets run pretty smoothly,
>>>>the last one has the error msg:
>>>>Error in ans[[i]][, i.probes] : incorrect number of dimensions
>>>>Not sure what's causing the problem, but the last one has the
>>>>largest number of masked probes/probesets: ~179k masked probes
>>>>(out of 247965 in total) and ~15k masked probe sets (out of 22283
>>>>in total). Does the code have a limit on the number of masked
>>>>probe/probesets? or there are certain probes/probe sets that can
>>>>not be filtered?
>>>>Thanks a lot,
>>>>-Jack
>>>>On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich
>>>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote:
>>>>Hi Jack,
>>>>It's best to also post your question to the BioC list. First of
>>>>all, you link was to the September 2006 code, but there was an
>>>>update to the code in September 2008 (I think it still should
work):
>>>>
>>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2008-september="" 024296="" .html="">
https://stat.ethz.ch/pipermail/bioconductor/2008-September/02429
6.html
>>>>
>>>>Did you see this more recent post on the BioC list about removing
>>>>individual probes? The individual probe names have to be in the
>>>>correct format, and your's aren't:
>>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-january="" 031463.h="" tml="">
https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.ht
ml
>>>>
>>>>HTH,
>>>>Jenny
>>>>At 11:46 AM 8/31/2010, you wrote:
>>>>>Hi Jenny,
>>>>>I am a BioC user and find the following link:
>>>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2006-september="" 01424="" 2.html="">
https://stat.ethz.ch/pipermail/bioconductor/2006-September/0142
42.html
>>>>>
>>>>>which is related to removing probes/probe sets from the cdf.
>>>>>It's highly related to my work, which needs to perform
>>>>>normalization after filtering out some probe pairs whose PM is
>>>>>not significantly higher than MM.
>>>>>I played around with the code in the above link by following the
>>>>>instruction, it works perfectly when some given probe sets need
>>>>>to be taken out, but have some problem when some probes (not
>>>>>whole probe sets) need to be taken out. I am wondering do you
>>>>>have an example for taking some probes as well as probe sets? I
>>>>>have not figured out what's the correct input for probes.
>>>>>I tried the following inputs and neither of them works.
>>>>>maskedprobeSets = rownames(exprs(justRMA(filenames =
>>>>>celfiles)))[1:100] # this format works
>>>>>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work
>>>>>maskedprobes = as.character(1:500) # doesn't work
>>>>>RemoveProbes(listOutProbes=maskedprobes,
>>>>>listOutProbeSets=maskedprobeSets, cleancdf)
>>>>>Thanks a lot! Your help will be highly appreciated.
>>>>>-Jack
[[alternative HTML version deleted]]