Question: altcdfenvs
15.0 years ago, Hee Siew Wan wrote:
Dear All

I was trying to use a trial data set (Dilution) to create a new cdf using "altcdfenvs". Instead of using "matchprobes", I created the "m" myself:

    ind <- c(seq(1,199084,by=11), seq(1,199084,by=10), seq(1,199084,by=9),
             seq(1,199084,by=8), seq(1,199084,by=7), seq(1,199084,by=6))

    m.dil <- new.env()
    m.dil$match <- list(ind[1])
    m.dil$match <- c(m.dil$match, ind[2:length(ind)])
    m.dil <- as.list(m.dil)
    length(m.dil$match)   # [1] 146637

    id.dil <- hgu95av2probe$Probe.Set.Name[ind]

    dil.cdf <- buildCdfEnv.matchprobes(m.dil, id.dil, nrow.chip=640, ncol.chip=640,
                                       chiptype="HG-U95Av2", probes.pack="hgu95av2probe")

    new.dil <- Dilution[,1:2]
    validAffyBatch(new.dil, dil.cdf)   # [1] TRUE
    new.dil.cdfenv <- dil.cdf@envir
    new.dil@cdfName <- "new.dil.cdfenv"

    > new.dil
    AffyBatch object
    size of arrays=640x640 features (6405 kb)
    cdf=new.dil.cdfenv (12453 affyids)
    number of samples=2
    number of genes=12453
    annotation=hgu95av2

    > length(pm(new.dil[,1]))
    [1] 12453

As noted above, I have 12453 probe sets with my new cdf, but I also have 12453 probe pairs when in fact I want 146637 probe pairs. The new cdf only returns 1 probe pair per set. Is there a way to get the 146637 probe pairs?

I tried doing the same thing for the ath1121501 array. For this case, I created a data.frame from "ath1121501probe" with the following columns:

    > names(newath)
    [1] "sequence" "probe" "X" "Y" "position"

However, when I run

    m <- matchprobes(newath$sequence, ath1121501probe$sequence)

I found out that for some sequences, I have more than 1 match. For example,

    > ath1121501probe$sequence[16023]
    [1] "GAGTATGCAGTCGAGTGGTGTGATG"
    > ath1121501probe$sequence[16012]
    [1] "GAGTATGCAGTCGAGTGGTGTGATG"

Hence, the probe that I'm interested in may not be matched to the correct one.

The versions I'm using:

    R: 1.9.0
    altcdfenvs: 1.0.0
    affy: 1.4.31
    ath1121501probe: 1.01

on Windows XP Professional Version 2002.

Did I do something wrong along the way for both methods?
I'd appreciate any help or advice regarding how to get the selected probe pairs for analysis. Also, how do I cite the package "altcdfenvs"? Thank you.

Regards
Hee, Siew Wan

Answer: altcdfenvs
15.0 years ago, lgautier@altern.org wrote:

Hee Siew Wan wrote:
> Dear All
>
> I was trying to use a trial data (Dilution) to create a new cdf using "altcdfenvs". Instead of using "matchprobes", I created the "m":

...let's see how 'the "m"' was made then...

>     ind <- c(seq(1,199084,by=11), seq(1,199084,by=10), seq(1,199084,by=9),
>              seq(1,199084,by=8), seq(1,199084,by=7), seq(1,199084,by=6))
>
>     m.dil <- new.env()
>     m.dil$match <- list(ind[1])
>     m.dil$match <- c(m.dil$match, ind[2:length(ind)])
>     m.dil <- as.list(m.dil)
>     length(m.dil$match)   # [1] 146637
>
>     id.dil <- hgu95av2probe$Probe.Set.Name[ind]
>
>     dil.cdf <- buildCdfEnv.matchprobes(m.dil, id.dil, nrow.chip=640, ncol.chip=640,
>                                        chiptype="HG-U95Av2", probes.pack="hgu95av2probe")
>
>     new.dil <- Dilution[,1:2]
>     validAffyBatch(new.dil, dil.cdf)   # [1] TRUE
>     new.dil.cdfenv <- dil.cdf@envir
>     new.dil@cdfName <- "new.dil.cdfenv"
>
>     > new.dil
>     AffyBatch object
>     size of arrays=640x640 features (6405 kb)
>     cdf=new.dil.cdfenv (12453 affyids)
>     number of samples=2
>     number of genes=12453
>     annotation=hgu95av2
>
>     > length(pm(new.dil[,1]))
>     [1] 12453
>
> As noted above, I have 12453 probe sets with my new cdf but I also have 12453 probe pairs when in fact I want 146637 probe pairs. The new cdf only returns 1 probe pair per set. Is there a way where I can have the 146637 probe pairs?

...then you may want to actually provide enough _identifiers_ (i.e., unique strings) to achieve this.
On my side, having made the variable 'id.dil' the way you did, I have:

    > length(unique(id.dil))
    [1] 12453

(I did not anticipate this could be a 'gotcha'; a warning will be added to 'buildCdfEnv.matchprobes'.)

> I tried doing the same thing for ath1121501 array. For this case, I created a data.frame from "ath1121501probe" with the following columns:
>
>     > names(newath)
>     [1] "sequence" "probe" "X" "Y" "position"
>
> However, when I run
>
>     m <- matchprobes(newath$sequence, ath1121501probe$sequence)
>
> I found out that for some sequences, I have more than 1 match. For example,
>
>     > ath1121501probe$sequence[16023]
>     [1] "GAGTATGCAGTCGAGTGGTGTGATG"
>     > ath1121501probe$sequence[16012]
>     [1] "GAGTATGCAGTCGAGTGGTGTGATG"
>
> Hence, the probe that I'm interested in may not be matched to the correct one.

...I am not certain I completely follow what you mean...

> The versions I'm using:
> R: 1.9.0
> altcdfenvs: 1.0.0
> affy: 1.4.31
> ath1121501probe: 1.01

You may want to upgrade to a more recent version of R and of the packages.

Hoping it helps,

L.
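The multiple-match observation from the question can be reproduced without matchprobes: when two rows of a probe table carry the same 25-mer, matching on sequence alone cannot tell them apart. The following is a minimal base-R sketch on a made-up table. The column names `probe` and `sequence` follow the question, but the rows are invented apart from the one duplicated sequence quoted there.

```r
# Toy probe table; hypothetical data for illustration only.
probes <- data.frame(
  probe    = c("p1", "p2", "p3", "p4"),
  sequence = c("GAGTATGCAGTCGAGTGGTGTGATG",
               "ACGTACGTACGTACGTACGTACGTA",
               "GAGTATGCAGTCGAGTGGTGTGATG",
               "TTTTTGGGGGCCCCCAAAAATTTTT"),
  stringsAsFactors = FALSE
)

# Flag every row whose sequence occurs more than once in the table
# (both the first occurrence and the later ones).
is_dup <- duplicated(probes$sequence) |
  duplicated(probes$sequence, fromLast = TRUE)

# Row numbers that share a sequence, grouped by that sequence.
dup_rows <- split(which(is_dup), probes$sequence[is_dup])
print(dup_rows)
```

If such groups exist in the real probe table, selecting probes by row index or by their X/Y coordinates, rather than by sequence, may be a way to avoid the ambiguity.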
15.0 years ago, Hee Siew Wan wrote:
Thanks Laurent for the tip, but I ran into other problems when I create enough identifiers. When I created one unique identifier for each probe pair I want inside the new cdf, I got 99112 probe *sets*, because

    > length(unique(ind))
    [1] 99112

This isn't exactly what I have in mind. Say I want 4 probe pairs (nearest to the 5-prime end) from each set: how can I proceed to create this new cdf?

What I realised from my earlier attempt is that I get the one probe pair that is furthest from the 5-prime end for each set, because the furthest pair is at the *bottom* of the probe set. The probe table is arranged in increasing order, so it seems to me that it updates itself and does not keep the earlier ones.

Please advise, and thanks for the help.

Cheers
sw

written 15.0 years ago by Hee Siew Wan

Hee Siew Wan wrote:
> Thanks Laurent for the tip but I encountered other problems if I
> create enough identifiers. When I created one unique identifier for
> each probe pair I want to be inside the new cdf, then I would get
> 99112 probe *sets* because

You need one identifier per probe set, not one for each probe pair! A probe set is made of probe pairs, and a probe pair is made of two probes (the PM and its corresponding MM).

Did you actually follow the main vignette for the package? The example uses objects of small size to let one experiment and understand.
If you run it and take the objects 'm' and 'ids':

    > length(m$match)
    [1] 3
    > length(ids)
    [1] 3
    > m$match
    [[1]]
     [1] 77288 77289 77290 77291 77292 77293 77294 77295 77296 77297 77298 77299
    [13] 77300 77301 77302 77303 77304 77305 77306 77307 77308 77309

    [[2]]
     [1] 100047 100048 100049 100050 100051 100052 100053 100054 100055 100056
    [11] 100057 100058 100059 100060 100061 100062 100063 100064 100065 100066
    [21] 100067 100068 122174

    [[3]]
     [1]  34732  34733  34734  34735  34736  34737  34738  34739  34740  34741
    [11]  34742 122168 122169 122170 122171 122172 122174 122175 122176 122177
    [21] 122178

    > alt.cdf <- buildCdfEnv.matchprobes(m, ids, nrow.chip = 712, ncol.chip = 712,
    +     chiptype = "HG-U133A", probes.pack = "hgu133aprobe")
    > alt.cdf
    Instance of class CdfEnvAffy:
     name     : HG-U133A
     chip-type: HG-U133A
     size     : 712 x 712
     3 probe set(s) defined.

>     > length(unique(ind))
>     [1] 99112
>
> This isn't exactly what I have in mind. If say, I want to have 4
> probe pairs (nearest to the 5-prime end) from each set, how can I
> proceed to create this new cdf?

To be 'near the 5-prime end', you will need to a) have reference sequences and match the probes against them, or b) get that information from somewhere (NetAffx, maybe). If I were you, I would not take risks with the ordering in the probe sequence table.

> What I realised from what I've done below is that I will get one
> probe pair that's furthest from 5-prime end for each set because the
> furthest pair is at the *bottom* of the probe set. The probe table is
> arranged in increasing order and so it seems to me that it updates
> itself and did not keep the earlier ones.

I am sorry, but I am really confused... You may also have to explain more about what you have in mind (off-list, if you wish) if you want someone to help.

L.
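The structure shown in the vignette output (one list element per probe set, each holding the row indices of that set's probes, plus one identifier per element) can be assembled with base R. What follows is a sketch only: the table is invented, and `position` is a hypothetical column assumed to increase from the 5-prime end. As cautioned in the reply, that assumption would need to be verified against reference sequences before being relied upon.

```r
# Toy probe table; set names and positions are made up for illustration.
probe_tab <- data.frame(
  Probe.Set.Name = rep(c("setA", "setB"), times = c(6, 3)),
  position       = c(50, 10, 30, 70, 20, 60, 15, 5, 25),
  stringsAsFactors = FALSE
)

n_keep <- 4  # number of probes to keep per probe set

# Row indices of the table, grouped by probe set name.
rows_by_set <- split(seq_len(nrow(probe_tab)), probe_tab$Probe.Set.Name)

# Within each set, keep the n_keep rows with the smallest position
# (or all rows if the set has fewer than n_keep probes).
match_list <- lapply(rows_by_set, function(rows) {
  ord <- order(probe_tab$position[rows])
  rows[head(ord, n_keep)]
})

m   <- list(match = unname(match_list))  # list of index vectors, one per probe set
ids <- names(match_list)                 # one identifier per probe set

print(m$match)
print(ids)
```

Here `length(m$match)` equals `length(ids)`, which is the relationship the vignette objects show. Whether buildCdfEnv.matchprobes accepts a hand-built list like this without further components has not been checked here.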
written 15.0 years ago by lgautier@altern.org

Answer: altcdfenvs
15.0 years ago, Holger Schwender wrote:

Hi,

I have written some functions for making your own cdf environment for one of my colleagues who is interested in this. In one function, you have to input a list. Each element of this list has to be a gene, and each of these elements must consist of a vector of the perfect match IDs you are interested in. So this list should look something like this:

    $Gene1
    [1] 12123 1412414 12231 4421233

    $Gene2
    [1] 342352 12312 1234112 412211

and so on, where 12123, 1412414, ... are the PM IDs. Using this list as your argument, you will get a cdf environment that contains only the probe sets specified in this list, and only the probe pairs of those probe sets that correspond to the PM IDs (you get both the PM and the MM corresponding to each PM ID).

Would this function solve your problem?

Best,
Holger
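A list of the shape described in this answer can be assembled and sanity-checked in base R before it is handed to any cdf-building function. The sketch below reuses the example gene names and PM IDs from the answer. The helper function itself is not shown in the thread, so only the input structure is illustrated, and the duplicate check is an added assumption about what one might want to verify.

```r
# PM IDs per gene, in the shape described in the answer.
pm_list <- list(
  Gene1 = c(12123, 1412414, 12231, 4421233),
  Gene2 = c(342352, 12312, 1234112, 412211)
)

# Number of selected probe pairs per gene.
n_per_gene <- lengths(pm_list)
print(n_per_gene)

# Long format: one row per (gene, PM ID) pair. This makes it easy to
# see whether any PM ID was assigned to more than one gene.
long   <- stack(pm_list)   # columns: values (PM ID), ind (gene name)
shared <- long$values[duplicated(long$values)]
print(shared)
```

An empty `shared` vector means every PM ID belongs to exactly one gene in the list.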