question regarding removing probes/probe sets before normalization
4
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 24 days ago
United States
Hi Jack, There shouldn't be a limit on the number of probes/probesets that you can remove. However, based on the error message, my guess is that there's some redundancy between your list of probe sets to remove and probes to remove. Probably you have all the probes of a probe set in your probe list, and that probe set in your probe set list as well. The code first removes all the probes, so when it goes to remove that probe set, it won't find anything and could throw an error. I don't know for sure that this is your problem, but I've seen it before. Jenny At 02:54 PM 9/1/2010, Jack Luo wrote: >Hi Jenny, > >I run the code through 3 different datasets (each consists of 50-60 >U133a cel files), the first 2 datasets run pretty smoothly, the last >one has the error msg: >Error in ans[[i]][, i.probes] : incorrect number of dimensions > >Not sure what's causing the problem, but the last one has the >largest number of masked probes/probesets: ~179k masked probes (out >of 247965 in total) and ~15k masked probe sets (out of 22283 in >total). Does the code have a limit on the number of masked >probe/probesets? or there are certain probes/probe sets that can not >be filtered? > >Thanks a lot, > >-Jack > > >On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich ><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >Hi Jack, > >It's best to also post your question to the BioC list. First of all, >you link was to the September 2006 code, but there was an update to >the code in September 2008 (I think it still should work): > ><https: stat.ethz.ch="" pipermail="" bioconductor="" 2008-september="" 024296.ht="" ml="">https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.h tml > > > >Did you see this more recent post on the BioC list about removing >individual probes? The individual probe names have to be in the >correct format, and your's aren't: ><https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-january="" 031463.html="">https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html > > > >HTH, >Jenny > > >At 11:46 AM 8/31/2010, you wrote: >>Hi Jenny, >> >>I am a BioC user and find the following link: >><https: stat.ethz.ch="" pipermail="" bioconductor="" 2006-september="" 014242.h="" tml="">https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242. html >> >> >>which is related to removing probes/probe sets from the cdf. It's >>highly related to my work, which needs to perform normalization >>after filtering out some probe pairs whose PM is not significantly >>higher than MM. >> >>I played around with the code in the above link by following the >>instruction, it works perfectly when some given probe sets need to >>be taken out, but have some problem when some probes (not whole >>probe sets) need to be taken out. I am wondering do you have an >>example for taking some probes as well as probe sets? I have not >>figured out what's the correct input for probes. >> >>I tried the following inputs and neither of them works. >> >>maskedprobeSets = rownames(exprs(justRMA(filenames = >>celfiles)))[1:100] # this format works >> >>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work >>maskedprobes = as.character(1:500) # doesn't work >> >>RemoveProbes(listOutProbes=maskedprobes, >>listOutProbeSets=maskedprobeSets, cleancdf) >> >> >>Thanks a lot! Your help will be highly appreciated. >> >>-Jack > [[alternative HTML version deleted]]
cdf probe cdf probe • 1.4k views
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 24 days ago
United States
Hi Jack, I found I had an even more updated version from Jan 2009. The code is below. Try using this, and let me know if you still have the problem. Note that I changed the arguments to RemoveProbes(); there is only listOutProbes, listOutProbesSets and cleancdf. cleancdf is the name array, without "cdf" or "probe" at the end. For example, if you're using the mouse 430 2.0 array, you'd need cleancdf = "mouse4302". If you're still having problems, please send the entire output of your code and sessionInfo(). Jenny ### The first part is just creating two ojects (ResetEnvir and RemoveProbes) originally ### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove individual ### probes and/or entire probesets. Just highlight everything from here until ### you see STOP and paste it to R all at once ### Updated Jan 2009 ResetEnvir<-function(cleancdf){ cdfpackagename <- paste(cleancdf,"cdf",sep="") probepackagename <- paste(cleancdf,"probe",sep="") ll<-search() cdfpackagepos <- grep(cdfpackagename,ll) if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) ll<-search() probepackagepos <- grep(probepackagename,ll) if(length(probepackagepos)>0) detach(pos=probepackagepos) require(cdfpackagename,character.only=T) require(probepackagename,character.only=T) require(affy) } RemoveProbes<-function(listOutProbes=NULL, listOutProbeSets=NULL, cleancdf,destructive=TRUE){ #default probe dataset values cdfpackagename <- paste(cleancdf,"cdf",sep="") probepackagename <- paste(cleancdf,"probe",sep="") require(cdfpackagename,character.only = TRUE) require(probepackagename,character.only = TRUE) probe.env.orig <- get(probepackagename) if(!is.null(listOutProbes)){ # taking probes out from CDF env probes<- unlist(lapply(listOutProbes,function(x){ a<-strsplit(x,"at") aux1<-paste(a[[1]][1],"at",sep="") aux2<-as.integer(a[[1]][2]) c(aux1,aux2) })) n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) probes<-data.frame(I(n1),n2) probes[,1]<-as.character(probes[,1]) probes[,2]<-as.integer(probes[,2]) pset<-unique(probes[,1]) if (!is.null(listOutProbeSets)) pset <- setdiff(pset,listOutProbeSets) for(i in seq(along=pset)){ ii <-grep(pset[i],probes[,1]) iout<-probes[ii,2] a<-get(pset[i],env=get(cdfpackagename)) a<-a[-iout,] if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T) if(nrow(a)==0) stop("Probe set ",pset[i]," has zero probes left. Remove this entire probe set in the listOutProbeSets argument") assign(pset[i],a,env=get(cdfpackagename)) } } # taking probesets out from CDF env if(!is.null(listOutProbeSets)){ rm(list=listOutProbeSets,envir=get(cdfpackagename)) } # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) tmp <- get("xy2indices",paste("package:",cdfpackagename,sep="")) newAB <- new("AffyBatch",cdfName=cleancdf) pmIndex <- unlist(indexProbes(newAB,"pm")) subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmInde x) rm(newAB) iNA <- whichis.na(subIndex)) if(length(iNA)>0){ ipos<-grep(probepackagename,search()) assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) } } At 08:41 AM 9/2/2010, Jack Luo wrote: >Jenny, > >Thanks for your email. The reason you provided makes perfect sense. >However, I've looked into this and found that it may not be the root >cause of the error. I tried just masking 25 probes and 1 probe set >(U133A array) as below: > >maskedprobes > [1] > "222152_at1" "222152_at2" "222152_at3" "222152_at4" > "222152_at5" "222152_at6" > [7] > "222152_at7" "222152_at8" "222152_at9" "222152_at10" > "222152_at11" "204659_s_at1" >[13] >"204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" >"204659_s_at6" "204659_s_at7" >[19] "204659_s_at8" "204659_s_at9" "204659_s_at10" "204659_s_at11" >"210893_at1" "210893_at2" >[25] "210893_at3" > > > maskedprobeSets >[1] "222152_at" > >Presumably it should run into error since all the probes within the >probe set have been taken out. > >However, the code still runs through and the dimension works out correctly. >A <- ReadAffy() >B <- justMAS(A,tgt = 500,scale = TRUE) > > dim(pm(A)) >[1] 247940 54 # the dimension without masking is 247965, >which is equal to 247940 + 25 > > dim(B) >Features Samples > 22281 54 # the dimension without masking is 22283, > which is equal to 22281 + 2 (the codes knows that all the probes in > 204659_s_at have been taken out, so the total # of masked probe sets is 2). > >In addition, for one of the dataset which the code does not run into >problem, I found that there are ~ 4000 probe sets having this kind >of problem: all the probes within the probe sets have been taken >out. Yet it still runs through and the dimension works out correctly. > >Sorry for bothering you so much on this, I really appreciate it. > >Best, >-Jack > > > > > > > > > >On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich ><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >Hi Jack, > >There shouldn't be a limit on the number of probes/probesets that >you can remove. However, based on the error message, my guess is >that there's some redundancy between your list of probe sets to >remove and probes to remove. Probably you have all the probes of a >probe set in your probe list, and that probe set in your probe set >list as well. The code first removes all the probes, so when it goes >to remove that probe set, it won't find anything and could throw an >error. I don't know for sure that this is your problem, but I've >seen it before. > >Jenny > > >At 02:54 PM 9/1/2010, Jack Luo wrote: >>Hi Jenny, >> >>I run the code through 3 different datasets (each consists of 50-60 >>U133a cel files), the first 2 datasets run pretty smoothly, the >>last one has the error msg: >>Error in ans[[i]][, i.probes] : incorrect number of dimensions >> >>Not sure what's causing the problem, but the last one has the >>largest number of masked probes/probesets: ~179k masked probes (out >>of 247965 in total) and ~15k masked probe sets (out of 22283 in >>total). Does the code have a limit on the number of masked >>probe/probesets? or there are certain probes/probe sets that can >>not be filtered? >> >>Thanks a lot, >> >>-Jack >> >> >>On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich >><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>Hi Jack, >>It's best to also post your question to the BioC list. First of >>all, you link was to the September 2006 code, but there was an >>update to the code in September 2008 (I think it still should work): >> >><https: stat.ethz.ch="" pipermail="" bioconductor="" 2008-september="" 024296.h="" tml="">https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296. html >> >> >>Did you see this more recent post on the BioC list about removing >>individual probes? The individual probe names have to be in the >>correct format, and your's aren't: >><https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-january="" 031463.htm="" l="">https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html >> >> >>HTH, >>Jenny >> >>At 11:46 AM 8/31/2010, you wrote: >>>Hi Jenny, >>>I am a BioC user and find the following link: >>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2006-september="" 014242.="" html="">https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242 .html >>> >>>which is related to removing probes/probe sets from the cdf. It's >>>highly related to my work, which needs to perform normalization >>>after filtering out some probe pairs whose PM is not significantly >>>higher than MM. >>>I played around with the code in the above link by following the >>>instruction, it works perfectly when some given probe sets need to >>>be taken out, but have some problem when some probes (not whole >>>probe sets) need to be taken out. I am wondering do you have an >>>example for taking some probes as well as probe sets? I have not >>>figured out what's the correct input for probes. >>>I tried the following inputs and neither of them works. >>>maskedprobeSets = rownames(exprs(justRMA(filenames = >>>celfiles)))[1:100] # this format works >>>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work >>>maskedprobes = as.character(1:500) # doesn't work >>>RemoveProbes(listOutProbes=maskedprobes, >>>listOutProbeSets=maskedprobeSets, cleancdf) >>> >>>Thanks a lot! Your help will be highly appreciated. >>>-Jack [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Jenny, I played around with the new code and found that the only way to make it work (when there are probesets whose probes are all taken out) is to INCLUDE the probeset name in the listOutProbeSets arguments, not exclude them (the error msg is confusing), please see below. *> maskedprobes [1] "222152_at1" "222152_at2" "222152_at3" "222152_at4" "222152_at5" "222152_at6" [7] "222152_at7" "222152_at8" "222152_at9" "222152_at10" "222152_at11" "204659_s_at1" [13] "204659_s_at2" "204659_s_at3" "204659_s_at4" > maskedprobeSets [1] "210893_at" >RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, + cleancdf = "hgu133a") Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets = maskedprobeSets, : Probe set 222152_at has zero probes left. Remove this entire probe set in the listOutProbeSets argument * However, it the maskedprobeSets <- 222152_at, it runs through without problem. I'll run on my data and let you know later, since it runs quite a while. But if what I mentioned above is the key, it should work. Thanks, -Jack * * On Thu, Sep 2, 2010 at 10:09 AM, Jenny Drnevich <drnevich@illinois.edu>wrote: > Hi Jack, > > I found I had an even more updated version from Jan 2009. The code is > below. Try using this, and let me know if you still have the problem. Note > that I changed the arguments to RemoveProbes(); there is only listOutProbes, > listOutProbesSets and cleancdf. cleancdf is the name array, without "cdf" or > "probe" at the end. For example, if you're using the mouse 430 2.0 array, > you'd need cleancdf = "mouse4302". If you're still having problems, please > send the entire output of your code and sessionInfo(). > > Jenny > > ### The first part is just creating two ojects (ResetEnvir and > RemoveProbes) originally > ### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove > individual > ### probes and/or entire probesets. Just highlight everything from here > until > ### you see STOP and paste it to R all at once > > ### Updated Jan 2009 > > > ResetEnvir<-function(cleancdf){ > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > ll<-search() > cdfpackagepos <- grep(cdfpackagename,ll) > if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) > ll<-search() > probepackagepos <- grep(probepackagename,ll) > if(length(probepackagepos)>0) detach(pos=probepackagepos) > require(cdfpackagename,character.only=T) > require(probepackagename,character.only=T) > require(affy) > } > > > RemoveProbes<-function(listOutProbes=NULL, > listOutProbeSets=NULL, > > > cleancdf,destructive=TRUE){ > > > > #default probe dataset values > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > require(cdfpackagename,character.only = TRUE) > require(probepackagename,character.only = TRUE) > probe.env.orig <- get(probepackagename) > > > > if(!is.null(listOutProbes)){ > # taking probes out from CDF env > probes<- unlist(lapply(listOutProbes,function(x){ > a<-strsplit(x,"at") > aux1<-paste(a[[1]][1],"at",sep="") > aux2<-as.integer(a[[1]][2]) > c(aux1,aux2) > })) > n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) > n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) > probes<-data.frame(I(n1),n2) > probes[,1]<-as.character(probes[,1]) > probes[,2]<-as.integer(probes[,2]) > pset<-unique(probes[,1]) > if (!is.null(listOutProbeSets)) > pset <- setdiff(pset,listOutProbeSets) > for(i in seq(along=pset)){ > ii <-grep(pset[i],probes[,1]) > iout<-probes[ii,2] > a<-get(pset[i],env=get(cdfpackagename)) > a<-a[-iout,] > if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T) > if(nrow(a)==0) > stop("Probe set ",pset[i]," has zero probes left. Remove this entire > probe set > in the listOutProbeSets argument") > assign(pset[i],a,env=get(cdfpackagename)) > } > } > > > > # taking probesets out from CDF env > if(!is.null(listOutProbeSets)){ > rm(list=listOutProbeSets,envir=get(cdfpackagename)) > } > > > > # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) > tmp <- get("xy2indices",paste("package:",cdfpackagename,sep="")) > newAB <- new("AffyBatch",cdfName=cleancdf) > pmIndex <- unlist(indexProbes(newAB,"pm")) > subIndex<- > match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmIn dex) > rm(newAB) > iNA <- whichis.na(subIndex)) > > > > if(length(iNA)>0){ > ipos<-grep(probepackagename,search()) > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > } > } > > > At 08:41 AM 9/2/2010, Jack Luo wrote: > > Jenny, > > Thanks for your email. The reason you provided makes perfect sense. > However, I've looked into this and found that it may not be the root cause > of the error. I tried just masking 25 probes and 1 probe set (U133A array) > as below: > >maskedprobes > [1] "222152_at1" "222152_at2" "222152_at3" "222152_at4" > "222152_at5" "222152_at6" > [7] "222152_at7" "222152_at8" "222152_at9" "222152_at10" > "222152_at11" "204659_s_at1" > [13] "204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" > "204659_s_at6" "204659_s_at7" > [19] "204659_s_at8" "204659_s_at9" "204659_s_at10" "204659_s_at11" > "210893_at1" "210893_at2" > [25] "210893_at3" > > > maskedprobeSets > [1] "222152_at" > > Presumably it should run into error since all the probes within the probe > set have been taken out. > > However, the code still runs through and the dimension works out correctly. > > A <- ReadAffy() > B <- justMAS(A,tgt = 500,scale = TRUE) > > dim(pm(A)) > [1] 247940 54 # the dimension without masking is 247965, which is > equal to 247940 + 25 > > dim(B) > Features Samples > 22281 54 # the dimension without masking is 22283, which is > equal to 22281 + 2 (the codes knows that all the probes in 204659_s_at have > been taken out, so the total # of masked probe sets is 2). > > In addition, for one of the dataset which the code does not run into > problem, I found that there are ~ 4000 probe sets having this kind of > problem: all the probes within the probe sets have been taken out. Yet it > still runs through and the dimension works out correctly. > > Sorry for bothering you so much on this, I really appreciate it. > > Best, > -Jack > > > > > > > > > > On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich <drnevich@illinois.edu> > wrote: > Hi Jack, > > There shouldn't be a limit on the number of probes/probesets that you can > remove. However, based on the error message, my guess is that there's some > redundancy between your list of probe sets to remove and probes to remove. > Probably you have all the probes of a probe set in your probe list, and that > probe set in your probe set list as well. The code first removes all the > probes, so when it goes to remove that probe set, it won't find anything and > could throw an error. I don't know for sure that this is your problem, but > I've seen it before. > > Jenny > > > At 02:54 PM 9/1/2010, Jack Luo wrote: > > Hi Jenny, > > I run the code through 3 different datasets (each consists of 50-60 U133a > cel files), the first 2 datasets run pretty smoothly, the last one has the > error msg: > Error in ans[[i]][, i.probes] : incorrect number of dimensions > > Not sure what's causing the problem, but the last one has the largest > number of masked probes/probesets: ~179k masked probes (out of 247965 in > total) and ~15k masked probe sets (out of 22283 in total). Does the code > have a limit on the number of masked probe/probesets? or there are certain > probes/probe sets that can not be filtered? > > Thanks a lot, > > -Jack > > > On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich <drnevich@illinois.edu> > wrote: Hi Jack, > It's best to also post your question to the BioC list. First of all, you > link was to the September 2006 code, but there was an update to the code in > September 2008 (I think it still should work): > > https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.html > > Did you see this more recent post on the BioC list about removing > individual probes? The individual probe names have to be in the correct > format, and your's aren't: https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html > > HTH, Jenny > > At 11:46 AM 8/31/2010, you wrote: > > Hi Jenny, > I am a BioC user and find the following link: https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html > which is related to removing probes/probe sets from the cdf. It's highly > related to my work, which needs to perform normalization after filtering out > some probe pairs whose PM is not significantly higher than MM. > I played around with the code in the above link by following the > instruction, it works perfectly when some given probe sets need to be taken > out, but have some problem when some probes (not whole probe sets) need to > be taken out. I am wondering do you have an example for taking some probes > as well as probe sets? I have not figured out what's the correct input for > probes. > I tried the following inputs and neither of them works. > maskedprobeSets = rownames(exprs(justRMA(filenames = celfiles)))[1:100] # > this format works > maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work maskedprobes > = as.character(1:500) # doesn't work > RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, > cleancdf) > > Thanks a lot! Your help will be highly appreciated. > -Jack > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Jack Luo ▴ 440
@jack-luo-4241
Last seen 9.7 years ago
Jenny, Thanks for your email. The reason you provided makes perfect sense. However, I've looked into this and found that it may not be the root cause of the error. I tried just masking 25 probes and 1 probe set (U133A array) as below: >maskedprobes [1] "222152_at1" "222152_at2" "222152_at3" "222152_at4" "222152_at5" "222152_at6" [7] "222152_at7" "222152_at8" "222152_at9" "222152_at10" "222152_at11" "204659_s_at1" [13] "204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" "204659_s_at6" "204659_s_at7" [19] "204659_s_at8" "204659_s_at9" "204659_s_at10" "204659_s_at11" "210893_at1" "210893_at2" [25] "210893_at3" > maskedprobeSets [1] "222152_at" Presumably it should run into error since all the probes within the probe set have been taken out. However, the code still runs through and the dimension works out correctly. A <- ReadAffy() B <- justMAS(A,tgt = 500,scale = TRUE) > dim(pm(A)) [1] 247940 54 # the dimension without masking is 247965, which is equal to 247940 + 25 > dim(B) Features Samples 22281 54 # the dimension without masking is 22283, which is equal to 22281 + 2 (the codes knows that all the probes in 204659_s_at have been taken out, so the total # of masked probe sets is 2). In addition, for one of the dataset which the code does not run into problem, I found that there are ~ 4000 probe sets having this kind of problem: all the probes within the probe sets have been taken out. Yet it still runs through and the dimension works out correctly. Sorry for bothering you so much on this, I really appreciate it. Best, -Jack On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich <drnevich@illinois.edu>wrote: > Hi Jack, > > There shouldn't be a limit on the number of probes/probesets that you can > remove. However, based on the error message, my guess is that there's some > redundancy between your list of probe sets to remove and probes to remove. > Probably you have all the probes of a probe set in your probe list, and that > probe set in your probe set list as well. The code first removes all the > probes, so when it goes to remove that probe set, it won't find anything and > could throw an error. I don't know for sure that this is your problem, but > I've seen it before. > > Jenny > > > At 02:54 PM 9/1/2010, Jack Luo wrote: > > Hi Jenny, > > I run the code through 3 different datasets (each consists of 50-60 U133a > cel files), the first 2 datasets run pretty smoothly, the last one has the > error msg: > Error in ans[[i]][, i.probes] : incorrect number of dimensions > > Not sure what's causing the problem, but the last one has the largest > number of masked probes/probesets: ~179k masked probes (out of 247965 in > total) and ~15k masked probe sets (out of 22283 in total). Does the code > have a limit on the number of masked probe/probesets? or there are certain > probes/probe sets that can not be filtered? > > Thanks a lot, > > -Jack > > > On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich <drnevich@illinois.edu> > wrote: > Hi Jack, > > It's best to also post your question to the BioC list. First of all, you > link was to the September 2006 code, but there was an update to the code in > September 2008 (I think it still should work): > > https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.html > > > Did you see this more recent post on the BioC list about removing > individual probes? The individual probe names have to be in the correct > format, and your's aren't: > https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html > > > HTH, > Jenny > > > At 11:46 AM 8/31/2010, you wrote: > > Hi Jenny, > > I am a BioC user and find the following link: > https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html > > which is related to removing probes/probe sets from the cdf. It's highly > related to my work, which needs to perform normalization after filtering out > some probe pairs whose PM is not significantly higher than MM. > > I played around with the code in the above link by following the > instruction, it works perfectly when some given probe sets need to be taken > out, but have some problem when some probes (not whole probe sets) need to > be taken out. I am wondering do you have an example for taking some probes > as well as probe sets? I have not figured out what's the correct input for > probes. > > I tried the following inputs and neither of them works. > > maskedprobeSets = rownames(exprs(justRMA(filenames = celfiles)))[1:100] # > this format works > > maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work > maskedprobes = as.character(1:500) # doesn't work > > RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, > cleancdf) > > > Thanks a lot! Your help will be highly appreciated. > > -Jack > > > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 24 days ago
United States
Hi Jack, You're right, the error message can be interpreted exactly the opposite of what it's supposed to mean! Is this better? "Remove this entire probe set by adding it to the listOutProbeSets argument" Ideally, you would also remove the individual probes from the listOutProbes argument, but I'm not sure it's necessary. If you have a lot of these potentially in your list, it would be a pain to do. I don't have time, but you might try to write a function to examine listOutProbes, see which probe sets have all probes removed, and then output these probe sets so you can add them to your listOutProbeSets. All the necessary code to do this should be in RemoveProbes, if you feel comfortable doing the hacking. Please send it to me if you do. Jenny At 10:40 AM 9/2/2010, Jack Luo wrote: >Jenny, > >I played around with the new code and found that the only way to >make it work (when there are probesets whose probes are all taken >out) is to INCLUDE the probeset name in the listOutProbeSets >arguments, not exclude them (the error msg is confusing), please see below. > > > maskedprobes > [1] > "222152_at1" "222152_at2" "222152_at3" "222152_at4" > "222152_at5" "222152_at6" > [7] > "222152_at7" "222152_at8" "222152_at9" "222152_at10" > "222152_at11" "204659_s_at1" >[13] "204659_s_at2" "204659_s_at3" "204659_s_at4" > > maskedprobeSets >[1] "210893_at" > > >RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, >+ cleancdf = "hgu133a") >Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets >= maskedprobeSets, : > Probe set 222152_at has zero probes left. Remove this entire probe set > in the listOutProbeSets argument > >However, it the maskedprobeSets <- 222152_at, it runs through >without problem. I'll run on my data and let you know later, since >it runs quite a while. But if what I mentioned above is the key, it >should work. > >Thanks, > >-Jack > > > > > > > >On Thu, Sep 2, 2010 at 10:09 AM, Jenny Drnevich ><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >Hi Jack, > >I found I had an even more updated version from Jan 2009. The code >is below. Try using this, and let me know if you still have the >problem. Note that I changed the arguments to RemoveProbes(); there >is only listOutProbes, listOutProbesSets and cleancdf. cleancdf is >the name array, without "cdf" or "probe" at the end. For example, if >you're using the mouse 430 2.0 array, you'd need cleancdf = >"mouse4302". If you're still having problems, please send the entire >output of your code and sessionInfo(). > >Jenny > >### The first part is just creating two ojects (ResetEnvir and >RemoveProbes) originally >### written by Ariel Chernomoretz and modified by Jenny Drnevich to >remove individual >### probes and/or entire probesets. Just highlight everything from here until >### you see STOP and paste it to R all at once > >### Updated Jan 2009 > > >ResetEnvir<-function(cleancdf){ > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > ll<-search() > cdfpackagepos <- grep(cdfpackagename,ll) > if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) > ll<-search() > probepackagepos <- grep(probepackagename,ll) > if(length(probepackagepos)>0) detach(pos=probepackagepos) > require(cdfpackagename,character.only=T) > require(probepackagename,character.only=T) > require(affy) >} > > >RemoveProbes<-function(listOutProbes=NULL, > listOutProbeSets=NULL, > > >cleancdf,destructive=TRUE){ > > > > #default probe dataset values > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > require(cdfpackagename,character.only = TRUE) > require(probepackagename,character.only = TRUE) > probe.env.orig <- get(probepackagename) > > > > if(!is.null(listOutProbes)){ > # taking probes out from CDF env > probes<- unlist(lapply(listOutProbes,function(x){ > a<-strsplit(x,"at") > aux1<-paste(a[[1]][1],"at",sep="") > aux2<-as.integer(a[[1]][2]) > c(aux1,aux2) > })) > n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) > n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) > probes<-data.frame(I(n1),n2) > probes[,1]<-as.character(probes[,1]) > probes[,2]<-as.integer(probes[,2]) > pset<-unique(probes[,1]) > if (!is.null(listOutProbeSets)) > pset <- setdiff(pset,listOutProbeSets) > for(i in seq(along=pset)){ > ii <-grep(pset[i],probes[,1]) > iout<-probes[ii,2] > a<-get(pset[i],env=get(cdfpackagename)) > a<-a[-iout,] > if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T) > if(nrow(a)==0) > stop("Probe set ",pset[i]," has zero probes left. Remove this > entire probe set > in the listOutProbeSets argument") > assign(pset[i],a,env=get(cdfpackagename)) > } > } > > > > # taking probesets out from CDF env > if(!is.null(listOutProbeSets)){ > rm(list=listOutProbeSets,envir=get(cdfpackagename)) > } > > > > # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) > tmp <- get("xy2indices",paste("package:",cdfpackagename,sep="")) > newAB <- new("AffyBatch",cdfName=cleancdf) > pmIndex <- unlist(indexProbes(newAB,"pm")) > subIndex<- > match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmIn dex) > rm(newAB) > iNA <- which(<http: is.na="">is.na(subIndex)) > > > > if(length(iNA)>0){ > ipos<-grep(probepackagename,search()) > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > } >} > > >At 08:41 AM 9/2/2010, Jack Luo wrote: >>Jenny, >> >>Thanks for your email. The reason you provided makes perfect sense. >>However, I've looked into this and found that it may not be the >>root cause of the error. I tried just masking 25 probes and 1 probe >>set (U133A array) as below: >> >maskedprobes >> [1] >> "222152_at1" "222152_at2" "222152_at3" "222152_at4" >> "222152_at5" "222152_at6" >> [7] >> "222152_at7" "222152_at8" "222152_at9" "222152_at10" >> "222152_at11" "204659_s_at1" >>[13] >>"204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" >>"204659_s_at6" "204659_s_at7" >>[19] "204659_s_at8" "204659_s_at9" "204659_s_at10" >>"204659_s_at11" "210893_at1" "210893_at2" >>[25] "210893_at3" >> >> > maskedprobeSets >>[1] "222152_at" >> >>Presumably it should run into error since all the probes within the >>probe set have been taken out. >> >>However, the code still runs through and the dimension works out correctly. >>A <- ReadAffy() >>B <- justMAS(A,tgt = 500,scale = TRUE) >> > dim(pm(A)) >>[1] 247940 54 # the dimension without masking is 247965, >>which is equal to 247940 + 25 >> > dim(B) >>Features Samples >> 22281 54 # the dimension without masking is 22283, >> which is equal to 22281 + 2 (the codes knows that all the probes >> in 204659_s_at have been taken out, so the total # of masked probe sets is 2). >> >>In addition, for one of the dataset which the code does not run >>into problem, I found that there are ~ 4000 probe sets having this >>kind of problem: all the probes within the probe sets have been >>taken out. Yet it still runs through and the dimension works out correctly. >> >>Sorry for bothering you so much on this, I really appreciate it. >> >>Best, >>-Jack >> >> >> >> >> >> >> >> >> >>On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich >><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>Hi Jack, >>There shouldn't be a limit on the number of probes/probesets that >>you can remove. However, based on the error message, my guess is >>that there's some redundancy between your list of probe sets to >>remove and probes to remove. Probably you have all the probes of a >>probe set in your probe list, and that probe set in your probe set >>list as well. The code first removes all the probes, so when it >>goes to remove that probe set, it won't find anything and could >>throw an error. I don't know for sure that this is your problem, >>but I've seen it before. >>Jenny >> >>At 02:54 PM 9/1/2010, Jack Luo wrote: >>>Hi Jenny, >>>I run the code through 3 different datasets (each consists of >>>50-60 U133a cel files), the first 2 datasets run pretty smoothly, >>>the last one has the error msg: >>>Error in ans[[i]][, i.probes] : incorrect number of dimensions >>>Not sure what's causing the problem, but the last one has the >>>largest number of masked probes/probesets: ~179k masked probes >>>(out of 247965 in total) and ~15k masked probe sets (out of 22283 >>>in total). Does the code have a limit on the number of masked >>>probe/probesets? or there are certain probes/probe sets that can >>>not be filtered? >>>Thanks a lot, >>>-Jack >>> >>>On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich >>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>>Hi Jack, >>>It's best to also post your question to the BioC list. First of >>>all, you link was to the September 2006 code, but there was an >>>update to the code in September 2008 (I think it still should work): >>> >>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2008-september="" 024296.="" html="">https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296 .html >>> >>>Did you see this more recent post on the BioC list about removing >>>individual probes? The individual probe names have to be in the >>>correct format, and your's aren't: >>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-january="" 031463.ht="" ml="">https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.htm l >>> >>>HTH, >>>Jenny >>>At 11:46 AM 8/31/2010, you wrote: >>>>Hi Jenny, >>>>I am a BioC user and find the following link: >>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2006-september="" 014242="" .html="">https://stat.ethz.ch/pipermail/bioconductor/2006-September/01424 2.html >>>> >>>>which is related to removing probes/probe sets from the cdf. It's >>>>highly related to my work, which needs to perform normalization >>>>after filtering out some probe pairs whose PM is not >>>>significantly higher than MM. >>>>I played around with the code in the above link by following the >>>>instruction, it works perfectly when some given probe sets need >>>>to be taken out, but have some problem when some probes (not >>>>whole probe sets) need to be taken out. I am wondering do you >>>>have an example for taking some probes as well as probe sets? I >>>>have not figured out what's the correct input for probes. >>>>I tried the following inputs and neither of them works. >>>>maskedprobeSets = rownames(exprs(justRMA(filenames = >>>>celfiles)))[1:100] # this format works >>>>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work >>>>maskedprobes = as.character(1:500) # doesn't work >>>>RemoveProbes(listOutProbes=maskedprobes, >>>>listOutProbeSets=maskedprobeSets, cleancdf) >>>>Thanks a lot! Your help will be highly appreciated. >>>>-Jack > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Jenny, Thanks for your email. I'll look to see whether I can do this. But after running the code using my own data, I am puzzled whether my speculation in the previous email is correct or not, since I also run into problems even when not all the probes are taken out in the probe sets. One quick question is that does your code have any special treatment on those probe sets that have more than 11 probes? We know that in U133A, majority of the probe sets have 11 probe pairs and there are ~ 500 probe sets have more than 11 probe pairs. I run the new code against two of my datasets (one ran through perfectly using the 08 version of the code and the other one ran into problem using the 08 version of the code) and found that the new code run into errors for both the datasets. The error msg shows *Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets = maskedprobeSets, : Probe set 320_at has zero probes left. Remove this entire probe set in the listOutProbeSets argument* The other one shows *Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets = maskedprobeSets, : Probe set 632_at has zero probes left. Remove this entire probe set in the listOutProbeSets argument* The common thing about "320_at" and "632_at" are that they both have 16 probe pairs before masking. Actually the maskedprobes do not contain all the probe pairs for the above probe sets. For example, for the "320_at", there are still 5 probe pairs left. *> tmpS <- unlist(lapply(maskedprobes,function(x){ a<-strsplit(x,"at") aux1<-paste(a[[1]][1],"at",sep="") aux1 })) > head(maskedprobes) [1] "1007_s_at3" "1053_at1" "1053_at2" "1053_at4" "1053_at5" "1053_at6" > head(tmpS) [1] "1007_s_at" "1053_at" "1053_at" "1053_at" "1053_at" "1053_at" > sum(tmpS == "320_at") [1] 5 * Thanks again, -Jack On Thu, Sep 2, 2010 at 12:37 PM, Jenny Drnevich <drnevich@illinois.edu>wrote: > Hi Jack, > > You're right, the error message can be interpreted exactly the opposite of > what it's supposed to mean! Is this better? "*Remove this entire probe set > by adding it to the listOutProbeSets argument" *Ideally, you would also > remove the individual probes from the listOutProbes argument, but I'm not > sure it's necessary. If you have a lot of these potentially in your list, it > would be a pain to do. I don't have time, but you might try to write a > function to examine listOutProbes, see which probe sets have all probes > removed, and then output these probe sets so you can add them to your > listOutProbeSets. All the necessary code to do this should be in > RemoveProbes, if you feel comfortable doing the hacking. Please send it to > me if you do. > > Jenny > > > At 10:40 AM 9/2/2010, Jack Luo wrote: > > Jenny, > > I played around with the new code and found that the only way to make it > work (when there are probesets whose probes are all taken out) is to INCLUDE > the probeset name in the listOutProbeSets arguments, not exclude them (the > error msg is confusing), please see below. > > *> maskedprobes > [1] "222152_at1" "222152_at2" "222152_at3" "222152_at4" > "222152_at5" "222152_at6" > [7] "222152_at7" "222152_at8" "222152_at9" "222152_at10" > "222152_at11" "204659_s_at1" > [13] "204659_s_at2" "204659_s_at3" "204659_s_at4" > > maskedprobeSets > [1] "210893_at" > > >RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, > > + cleancdf = "hgu133a") > Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets = > maskedprobeSets, : > Probe set 222152_at has zero probes left. Remove this entire probe set > in the listOutProbeSets argument > * > However, it the maskedprobeSets <- 222152_at, it runs through without > problem. I'll run on my data and let you know later, since it runs quite a > while. But if what I mentioned above is the key, it should work. > > Thanks, > > -Jack > * > * > > > > > > On Thu, Sep 2, 2010 at 10:09 AM, Jenny Drnevich <drnevich@illinois.edu> > wrote: > Hi Jack, > > I found I had an even more updated version from Jan 2009. The code is > below. Try using this, and let me know if you still have the problem. Note > that I changed the arguments to RemoveProbes(); there is only listOutProbes, > listOutProbesSets and cleancdf. cleancdf is the name array, without "cdf" or > "probe" at the end. For example, if you're using the mouse 430 2.0 array, > you'd need cleancdf = "mouse4302". If you're still having problems, please > send the entire output of your code and sessionInfo(). > > Jenny > > ### The first part is just creating two ojects (ResetEnvir and > RemoveProbes) originally > ### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove > individual > ### probes and/or entire probesets. Just highlight everything from here > until > ### you see STOP and paste it to R all at once > > ### Updated Jan 2009 > > > ResetEnvir<-function(cleancdf){ > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > ll<-search() > cdfpackagepos <- grep(cdfpackagename,ll) > if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) > ll<-search() > probepackagepos <- grep(probepackagename,ll) > if(length(probepackagepos)>0) detach(pos=probepackagepos) > require(cdfpackagename,character.only=T) > require(probepackagename,character.only=T) > require(affy) > } > > > RemoveProbes<-function(listOutProbes=NULL, > listOutProbeSets=NULL, > > > cleancdf,destructive=TRUE){ > > > > #default probe dataset values > cdfpackagename <- paste(cleancdf,"cdf",sep="") > probepackagename <- paste(cleancdf,"probe",sep="") > require(cdfpackagename,character.only = TRUE) > require(probepackagename,character.only = TRUE) > probe.env.orig <- get(probepackagename) > > > > if(!is.null(listOutProbes)){ > # taking probes out from CDF env > probes<- unlist(lapply(listOutProbes,function(x){ > a<-strsplit(x,"at") > aux1<-paste(a[[1]][1],"at",sep="") > aux2<-as.integer(a[[1]][2]) > c(aux1,aux2) > })) > n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) > n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) > probes<-data.frame(I(n1),n2) > probes[,1]<-as.character(probes[,1]) > probes[,2]<-as.integer(probes[,2]) > pset<-unique(probes[,1]) > if (!is.null(listOutProbeSets)) > pset <- setdiff(pset,listOutProbeSets) > for(i in seq(along=pset)){ > ii <-grep(pset[i],probes[,1]) > iout<-probes[ii,2] > a<-get(pset[i],env=get(cdfpackagename)) > a<-a[-iout,] > if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T) > if(nrow(a)==0) > stop("Probe set ",pset[i]," has zero probes left. Remove this entire > probe set > in the listOutProbeSets argument") > assign(pset[i],a,env=get(cdfpackagename)) > } > } > > > > # taking probesets out from CDF env > if(!is.null(listOutProbeSets)){ > rm(list=listOutProbeSets,envir=get(cdfpackagename)) > } > > > > # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) > tmp <- get("xy2indices",paste("package:",cdfpackagename,sep="")) > newAB <- new("AffyBatch",cdfName=cleancdf) > pmIndex <- unlist(indexProbes(newAB,"pm")) > subIndex<- > match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmIn dex) > rm(newAB) > iNA <- whichis.na(subIndex)) > > > > if(length(iNA)>0){ > ipos<-grep(probepackagename,search()) > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > } > } > > > At 08:41 AM 9/2/2010, Jack Luo wrote: > > Jenny, > > Thanks for your email. The reason you provided makes perfect sense. > However, I've looked into this and found that it may not be the root cause > of the error. I tried just masking 25 probes and 1 probe set (U133A array) > as below: > >maskedprobes > [1] "222152_at1" "222152_at2" "222152_at3" "222152_at4" > "222152_at5" "222152_at6" > [7] "222152_at7" "222152_at8" "222152_at9" "222152_at10" > "222152_at11" "204659_s_at1" > [13] "204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" > "204659_s_at6" "204659_s_at7" > [19] "204659_s_at8" "204659_s_at9" "204659_s_at10" "204659_s_at11" > "210893_at1" "210893_at2" > [25] "210893_at3" > > > maskedprobeSets > [1] "222152_at" > > Presumably it should run into error since all the probes within the probe > set have been taken out. > > However, the code still runs through and the dimension works out correctly. > > A <- ReadAffy() > B <- justMAS(A,tgt = 500,scale = TRUE) > > dim(pm(A)) > [1] 247940 54 # the dimension without masking is 247965, which is > equal to 247940 + 25 > > dim(B) > Features Samples > 22281 54 # the dimension without masking is 22283, which is > equal to 22281 + 2 (the codes knows that all the probes in 204659_s_at have > been taken out, so the total # of masked probe sets is 2). > > In addition, for one of the dataset which the code does not run into > problem, I found that there are ~ 4000 probe sets having this kind of > problem: all the probes within the probe sets have been taken out. Yet it > still runs through and the dimension works out correctly. > > Sorry for bothering you so much on this, I really appreciate it. > > Best, > -Jack > > > > > > > > > > On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich <drnevich@illinois.edu> > wrote: Hi Jack, > There shouldn't be a limit on the number of probes/probesets that you can > remove. However, based on the error message, my guess is that there's some > redundancy between your list of probe sets to remove and probes to remove. > Probably you have all the probes of a probe set in your probe list, and that > probe set in your probe set list as well. The code first removes all the > probes, so when it goes to remove that probe set, it won't find anything and > could throw an error. I don't know for sure that this is your problem, but > I've seen it before. Jenny > > At 02:54 PM 9/1/2010, Jack Luo wrote: > > Hi Jenny, > I run the code through 3 different datasets (each consists of 50-60 U133a > cel files), the first 2 datasets run pretty smoothly, the last one has the > error msg: Error in ans[[i]][, i.probes] : incorrect number of dimensions > Not sure what's causing the problem, but the last one has the largest > number of masked probes/probesets: ~179k masked probes (out of 247965 in > total) and ~15k masked probe sets (out of 22283 in total). Does the code > have a limit on the number of masked probe/probesets? or there are certain > probes/probe sets that can not be filtered? > Thanks a lot, > -Jack > > On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich <drnevich@illinois.edu> > wrote: Hi Jack, It's best to also post your question to the BioC list. > First of all, you link was to the September 2006 code, but there was an > update to the code in September 2008 (I think it still should work): > > https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.html > Did you see this more recent post on the BioC list about removing > individual probes? The individual probe names have to be in the correct > format, and your's aren't: https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html > HTH, Jenny > At 11:46 AM 8/31/2010, you wrote: > > Hi Jenny, I am a BioC user and find the following link: https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html which > is related to removing probes/probe sets from the cdf. It's highly related > to my work, which needs to perform normalization after filtering out some > probe pairs whose PM is not significantly higher than MM. I played around > with the code in the above link by following the instruction, it works > perfectly when some given probe sets need to be taken out, but have some > problem when some probes (not whole probe sets) need to be taken out. I am > wondering do you have an example for taking some probes as well as probe > sets? I have not figured out what's the correct input for probes. I tried > the following inputs and neither of them works. maskedprobeSets = > rownames(exprs(justRMA(filenames = celfiles)))[1:100] # this format works maskedprobes > = rownames(pm(AffyBatch))[1:500] # doesn't work maskedprobes = > as.character(1:500) # doesn't work RemoveProbes(listOutProbes=maskedprobes, > listOutProbeSets=maskedprobeSets, cleancdf) > Thanks a lot! Your help will be highly appreciated. -Jack > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 24 days ago
United States
Hi Jack, There's nothing in the code that's specific for 11 probes in the set. I'm not sure what the problem could be - I don't know the code intimately, I only hacked it, and I barely ever use it. Have you tried doing debug(RemoveProbes)? If you're not familiar with it, it will send your next call to RemoveProbes to a debugger where you can step through each line of code to see what it's doing and where the problem is. I use it all the time for hacking. If you can't figure it out, send me (off list) your list of probes and probe sets, and your ENTIRE code that you've been trying (not just error messages!) and I might have time this weekend to look into it. Jenny At 02:03 PM 9/2/2010, Jack Luo wrote: >Jenny, > >Thanks for your email. I'll look to see whether I can do this. But >after running the code using my own data, I am puzzled whether my >speculation in the previous email is correct or not, since I also >run into problems even when not all the probes are taken out in the >probe sets. One quick question is that does your code have any >special treatment on those probe sets that have more than 11 probes? >We know that in U133A, majority of the probe sets have 11 probe >pairs and there are ~ 500 probe sets have more than 11 probe pairs. >I run the new code against two of my datasets (one ran through >perfectly using the 08 version of the code and the other one ran >into problem using the 08 version of the code) and found that the >new code run into errors for both the datasets. The error msg shows > >Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets >= maskedprobeSets, : > Probe set 320_at has zero probes left. Remove this entire probe set > in the listOutProbeSets argument > >The other one shows >Error in RemoveProbes(listOutProbes = maskedprobes, listOutProbeSets >= maskedprobeSets, : > Probe set 632_at has zero probes left. Remove this entire probe set > in the listOutProbeSets argument > >The common thing about "320_at" and "632_at" are that they both have >16 probe pairs before masking. Actually the maskedprobes do not >contain all the probe pairs for the above probe sets. >For example, for the "320_at", there are still 5 probe pairs left. > > > tmpS <- unlist(lapply(maskedprobes,function(x){ > a<-strsplit(x,"at") > aux1<-paste(a[[1]][1],"at",sep="") > > aux1 > })) > > > head(maskedprobes) >[1] "1007_s_at3" >"1053_at1" "1053_at2" "1053_at4" "1053_at5" "1053_at6" > > head(tmpS) >[1] "1007_s_at" "1053_at" "1053_at" "1053_at" "1053_at" "1053_at" > > > sum(tmpS == "320_at") >[1] 5 > >Thanks again, > >-Jack > > >On Thu, Sep 2, 2010 at 12:37 PM, Jenny Drnevich ><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >Hi Jack, > >You're right, the error message can be interpreted exactly the >opposite of what it's supposed to mean! Is this better? "Remove this >entire probe set by adding it to the listOutProbeSets argument" >Ideally, you would also remove the individual probes from the >listOutProbes argument, but I'm not sure it's necessary. If you have >a lot of these potentially in your list, it would be a pain to do. I >don't have time, but you might try to write a function to examine >listOutProbes, see which probe sets have all probes removed, and >then output these probe sets so you can add them to your >listOutProbeSets. All the necessary code to do this should be in >RemoveProbes, if you feel comfortable doing the hacking. Please send >it to me if you do. > >Jenny > > >At 10:40 AM 9/2/2010, Jack Luo wrote: >>Jenny, >> >>I played around with the new code and found that the only way to >>make it work (when there are probesets whose probes are all taken >>out) is to INCLUDE the probeset name in the listOutProbeSets >>arguments, not exclude them (the error msg is confusing), please see below. >> >> > maskedprobes >> [1] >> "222152_at1" "222152_at2" "222152_at3" "222152_at4" >> "222152_at5" "222152_at6" >> [7] >> "222152_at7" "222152_at8" "222152_at9" "222152_at10" >> "222152_at11" "204659_s_at1" >>[13] "204659_s_at2" "204659_s_at3" "204659_s_at4" >> > maskedprobeSets >>[1] "210893_at" >> >> >RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, >>+ cleancdf = "hgu133a") >>Error in RemoveProbes(listOutProbes = maskedprobes, >>listOutProbeSets = maskedprobeSets, : >> Probe set 222152_at has zero probes left. Remove this entire probe set >> in the listOutProbeSets argument >> >>However, it the maskedprobeSets <- 222152_at, it runs through >>without problem. I'll run on my data and let you know later, since >>it runs quite a while. But if what I mentioned above is the key, it >>should work. >> >>Thanks, >> >>-Jack >> >> >> >> >> >> >> >>On Thu, Sep 2, 2010 at 10:09 AM, Jenny Drnevich >><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>Hi Jack, >>I found I had an even more updated version from Jan 2009. The code >>is below. Try using this, and let me know if you still have the >>problem. Note that I changed the arguments to RemoveProbes(); there >>is only listOutProbes, listOutProbesSets and cleancdf. cleancdf is >>the name array, without "cdf" or "probe" at the end. For example, >>if you're using the mouse 430 2.0 array, you'd need cleancdf = >>"mouse4302". If you're still having problems, please send the >>entire output of your code and sessionInfo(). >>Jenny >>### The first part is just creating two ojects (ResetEnvir and >>RemoveProbes) originally >>### written by Ariel Chernomoretz and modified by Jenny Drnevich to >>remove individual >>### probes and/or entire probesets. Just highlight everything from >>here until >>### you see STOP and paste it to R all at once >>### Updated Jan 2009 >> >>ResetEnvir<-function(cleancdf){ >> cdfpackagename <- paste(cleancdf,"cdf",sep="") >> probepackagename <- paste(cleancdf,"probe",sep="") >> ll<-search() >> cdfpackagepos <- grep(cdfpackagename,ll) >> if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos) >> ll<-search() >> probepackagepos <- grep(probepackagename,ll) >> if(length(probepackagepos)>0) detach(pos=probepackagepos) >> require(cdfpackagename,character.only=T) >> require(probepackagename,character.only=T) >> require(affy) >>} >> >>RemoveProbes<-function(listOutProbes=NULL, >> listOutProbeSets=NULL, >> >>cleancdf,destructive=TRUE){ >> >> >> #default probe dataset values >> cdfpackagename <- paste(cleancdf,"cdf",sep="") >> probepackagename <- paste(cleancdf,"probe",sep="") >> require(cdfpackagename,character.only = TRUE) >> require(probepackagename,character.only = TRUE) >> probe.env.orig <- get(probepackagename) >> >> >> if(!is.null(listOutProbes)){ >> # taking probes out from CDF env >> probes<- unlist(lapply(listOutProbes,function(x){ >> a<-strsplit(x,"at") >> aux1<-paste(a[[1]][1],"at",sep="") >> aux2<-as.integer(a[[1]][2]) >> c(aux1,aux2) >> })) >> n1<-as.character(probes[seq(1,(length(probes)/2))*2-1]) >> n2<-as.integer(probes[seq(1,(length(probes)/2))*2]) >> probes<-data.frame(I(n1),n2) >> probes[,1]<-as.character(probes[,1]) >> probes[,2]<-as.integer(probes[,2]) >> pset<-unique(probes[,1]) >> if (!is.null(listOutProbeSets)) >> pset <- setdiff(pset,listOutProbeSets) >> for(i in seq(along=pset)){ >> ii <-grep(pset[i],probes[,1]) >> iout<-probes[ii,2] >> a<-get(pset[i],env=get(cdfpackagename)) >> a<-a[-iout,] >> if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T) >> if(nrow(a)==0) >> stop("Probe set ",pset[i]," has zero probes left. Remove this >> entire probe set >> in the listOutProbeSets argument") >> assign(pset[i],a,env=get(cdfpackagename)) >> } >> } >> >> >> # taking probesets out from CDF env >> if(!is.null(listOutProbeSets)){ >> rm(list=listOutProbeSets,envir=get(cdfpackagename)) >> } >> >> >> # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) >> tmp <- get("xy2indices",paste("package:",cdfpackagename,sep="")) >> newAB <- new("AffyBatch",cdfName=cleancdf) >> pmIndex <- unlist(indexProbes(newAB,"pm")) >> subIndex<- >> match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmI ndex) >> rm(newAB) >> iNA <- which(<http: is.na="">is.na(subIndex)) >> >> >> if(length(iNA)>0){ >> ipos<-grep(probepackagename,search()) >> assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) >> } >>} >> >>At 08:41 AM 9/2/2010, Jack Luo wrote: >>>Jenny, >>>Thanks for your email. The reason you provided makes perfect >>>sense. However, I've looked into this and found that it may not be >>>the root cause of the error. I tried just masking 25 probes and 1 >>>probe set (U133A array) as below: >>> >maskedprobes >>> [1] >>> "222152_at1" "222152_at2" "222152_at3" "222152_at4" >>> "222152_at5" "222152_at6" >>> [7] >>> "222152_at7" "222152_at8" "222152_at9" "222152_at10" >>> "222152_at11" "204659_s_at1" >>>[13] >>>"204659_s_at2" "204659_s_at3" "204659_s_at4" "204659_s_at5" >>>"204659_s_at6" "204659_s_at7" >>>[19] "204659_s_at8" "204659_s_at9" "204659_s_at10" >>>"204659_s_at11" "210893_at1" "210893_at2" >>>[25] "210893_at3" >>> > maskedprobeSets >>>[1] "222152_at" >>>Presumably it should run into error since all the probes within >>>the probe set have been taken out. >>>However, the code still runs through and the dimension works out correctly. >>>A <- ReadAffy() >>>B <- justMAS(A,tgt = 500,scale = TRUE) >>> > dim(pm(A)) >>>[1] 247940 54 # the dimension without masking is 247965, >>>which is equal to 247940 + 25 >>> > dim(B) >>>Features Samples >>> 22281 54 # the dimension without masking is 22283, >>> which is equal to 22281 + 2 (the codes knows that all the probes >>> in 204659_s_at have been taken out, so the total # of masked probe sets is 2). >>>In addition, for one of the dataset which the code does not run >>>into problem, I found that there are ~ 4000 probe sets having this >>>kind of problem: all the probes within the probe sets have been >>>taken out. Yet it still runs through and the dimension works out correctly. >>>Sorry for bothering you so much on this, I really appreciate it. >>>Best, >>>-Jack >>> >>> >>> >>> >>> >>> >>> >>> >>>On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich >>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>>Hi Jack, >>>There shouldn't be a limit on the number of probes/probesets that >>>you can remove. However, based on the error message, my guess is >>>that there's some redundancy between your list of probe sets to >>>remove and probes to remove. Probably you have all the probes of a >>>probe set in your probe list, and that probe set in your probe set >>>list as well. The code first removes all the probes, so when it >>>goes to remove that probe set, it won't find anything and could >>>throw an error. I don't know for sure that this is your problem, >>>but I've seen it before. >>>Jenny >>>At 02:54 PM 9/1/2010, Jack Luo wrote: >>>>Hi Jenny, >>>>I run the code through 3 different datasets (each consists of >>>>50-60 U133a cel files), the first 2 datasets run pretty smoothly, >>>>the last one has the error msg: >>>>Error in ans[[i]][, i.probes] : incorrect number of dimensions >>>>Not sure what's causing the problem, but the last one has the >>>>largest number of masked probes/probesets: ~179k masked probes >>>>(out of 247965 in total) and ~15k masked probe sets (out of 22283 >>>>in total). Does the code have a limit on the number of masked >>>>probe/probesets? or there are certain probes/probe sets that can >>>>not be filtered? >>>>Thanks a lot, >>>>-Jack >>>>On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich >>>><<mailto:drnevich@illinois.edu>drnevich@illinois.edu> wrote: >>>>Hi Jack, >>>>It's best to also post your question to the BioC list. First of >>>>all, you link was to the September 2006 code, but there was an >>>>update to the code in September 2008 (I think it still should work): >>>> >>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2008-september="" 024296="" .html="">https://stat.ethz.ch/pipermail/bioconductor/2008-September/02429 6.html >>>> >>>>Did you see this more recent post on the BioC list about removing >>>>individual probes? The individual probe names have to be in the >>>>correct format, and your's aren't: >>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-january="" 031463.h="" tml="">https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.ht ml >>>> >>>>HTH, >>>>Jenny >>>>At 11:46 AM 8/31/2010, you wrote: >>>>>Hi Jenny, >>>>>I am a BioC user and find the following link: >>>>><https: stat.ethz.ch="" pipermail="" bioconductor="" 2006-september="" 01424="" 2.html="">https://stat.ethz.ch/pipermail/bioconductor/2006-September/0142 42.html >>>>> >>>>>which is related to removing probes/probe sets from the cdf. >>>>>It's highly related to my work, which needs to perform >>>>>normalization after filtering out some probe pairs whose PM is >>>>>not significantly higher than MM. >>>>>I played around with the code in the above link by following the >>>>>instruction, it works perfectly when some given probe sets need >>>>>to be taken out, but have some problem when some probes (not >>>>>whole probe sets) need to be taken out. I am wondering do you >>>>>have an example for taking some probes as well as probe sets? I >>>>>have not figured out what's the correct input for probes. >>>>>I tried the following inputs and neither of them works. >>>>>maskedprobeSets = rownames(exprs(justRMA(filenames = >>>>>celfiles)))[1:100] # this format works >>>>>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work >>>>>maskedprobes = as.character(1:500) # doesn't work >>>>>RemoveProbes(listOutProbes=maskedprobes, >>>>>listOutProbeSets=maskedprobeSets, cleancdf) >>>>>Thanks a lot! Your help will be highly appreciated. >>>>>-Jack [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 505 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6