Question: cdf vs probe package & Linux vs PC
0
gravatar for Justin Borevitz
15.4 years ago by
Justin Borevitz90 wrote:
I've noticed that the order of probes in the 2 packages does not agree. At least for barley1 and ath1121501. Also the way probes are ordered in Linux and Rgui (PC) does not agree. It could be something with the alphabetizing of probsets names in the 2 versions. Its possible this is true for the probe package coming from Affymetrix as well, which doesn't match either Linux or PC ordering. Lesson never assume ordering... Maybe everyone knows this already and that is the purpose of matchprobes?? Any help with simple calls to avoid this problem are appreciated. # in Linux barley.object <- read.affybatch(filenames = list.celfiles()[2]) Warning message: Incompatible phenoData object. Created a new one. in: read.affybatch(filenames = list.celfiles()[2]) pnL <- rownames(pm(barley.object)) save(pnL,file = "pnL.RData",compress=T) ## then download from linux to PC # On PC barley.object <- read.affybatch(filenames = list.celfiles()[2]) Warning message: Incompatible phenoData object. Created a new one. in: read.affybatch(filenames = list.celfiles()[2]) pnPC <- rownames(pm(barley.object)) load("D:/barley/pnL.RData") table(pnPC == pnL) FALSE TRUE 172752 78685 #Observation and rough fix for probe package ordering to PC ordering setwd("d:/barley") library(affy) barley.object <- read.affybatch(filenames = list.celfiles()) probesets <- rownames(pm(barley.object)) length(probesets) library(barley1probe) length(barley1probe$Probe.Set.Names) psn <- gsub("_at[0-9]","_at",probesets) psn <- gsub("_at[0-9]","_at",psn) table(psn == barley1probe$Probe.Set.Name) # FALSE TRUE #249955 1482 setwd("d:/ath1") ath1.obj <- read.affybatch(filenames = list.celfiles()[1]) aprobesets <- rownames(pm(ath1.obj)) apsn <- gsub("_at[0-9]","_at",aprobesets) apsn <- gsub("_at[0-9]","_at",apsn) library(ath1121501probe) table(apsn == ath1121501probe$Probe.Set.Name) # FALSE TRUE # 439 250639 Using the x and y coords I'm reordered the probe package as follows... setwd("d:/barley") library(affy) barley.object <- read.affybatch(filenames = list.celfiles()) pm.i <- indexProbes(barley.object, which="pm") # all genes pm1 <- unlist(pm.i) pm.i.xy <- matrix(indices2xy(pm1, abatch = barley.object),nc = 2) length(pm1) dim(pm.i.xy) pm.i.xy <- pm.i.xy - 1 # for affy units starting at 0. probesets <- rownames(pm(barley.object)) length(probesets) # now match with xy in barley1probe.. cdfxy <- paste(pm.i.xy[,1],pm.i.xy[,2]) library(barley1probe) names(barley1probe) probexy <- paste(barley1probe$x,barley1probe$y) ordcdf <- match(cdfxy,probexy) psn <- gsub("_at[0-9]","_at",probesets) psn <- gsub("_at[0-9]","_at",psn) table(psn == barley1probe$Probe.Set.Name) # FALSE TRUE #250100 1337 table(psn == barley1probe$Probe.Set.Name[ordcdf]) # TRUE #251437 barley1probe <- barley1probe[ordcdf, ] save(barley1probe,file = "barley1probe.RData", compress=T) setwd("d:/ath1") ath1.obj <- read.affybatch(filenames = list.celfiles()[1]) aprobesets <- rownames(pm(ath1.obj)) apsn <- gsub("_at[0-9]","_at",aprobesets) apsn <- gsub("_at[0-9]","_at",apsn) apsn <- gsub("_at[0-9]","_at",apsn) pm.i <- indexProbes(ath1.obj, which="pm") # all genes pm1 <- unlist(pm.i) pm.i.xy <- matrix(indices2xy(pm1, abatch = ath1.obj),nc = 2) length(pm1) dim(pm.i.xy) pm.i.xy <- pm.i.xy - 1 # for affy units starting at 0. # now match with xy in ath1121501probe.. cdfxy <- paste(pm.i.xy[,1],pm.i.xy[,2]) library(ath1121501probe) probexy <- paste(ath1121501probe$x,ath1121501probe$y) ordcdf <- match(cdfxy,probexy) table(apsn == ath1121501probe$Probe.Set.Name) # FALSE TRUE # 439 250639 table(apsn == ath1121501probe$Probe.Set.Name[ordcdf]) # TRUE #251078 ath1121501probe <- ath1121501probe[ordcdf, ] save(ath1121501probe,file = "ath1121501probe.RData", compress=T) --- Justin Borevitz Plant Biology Salk Institute 10010 N. Torrey Pines Rd. La Jolla CA, 92037 USA ph. 858 453-4100X1796 fax 858 452-4315 mailto:borevitz@salk.edu http://naturalvariation.org
ath1121501 cdf probe affy • 464 views
ADD COMMENTlink modified 15.4 years ago by Wolfgang Huber13k • written 15.4 years ago by Justin Borevitz90
Answer: cdf vs probe package & Linux vs PC
0
gravatar for Wolfgang Huber
15.4 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:
Hi Justin, an AffyBatch contains data on all those probes that were found in the CEL file. The probe package contains data on all probes for which we could get sequence information from Affymetrix. The two don't necessarily overlap and also nowhere in the code it is assumed that they are in the same, or in any particular order. What I usually to is locate the row(s) that correspond to the probe(s) that I want to look at in the probe package, then use probepackage::xy2i to find them in the AffyBatch. Please let me/us know if you have more specific questions or suggestions on how to improve the user interface for accessing the data from individual probes. (If you know enough R, feel to look at the methods directly > showMethods('pm', classes='AffyBatch', inc=TRUE) > showMethods('probes', classes='AffyBatch', inc=TRUE) to see 'below the hood'. But of course that should not be necessary for the normal user.) Best wishes Wolfgang ------------------------------------- Wolfgang Huber Division of Molecular Genome Analysis German Cancer Research Center Heidelberg, Germany Phone: +49 6221 424709 Fax: +49 6221 42524709 Http: www.dkfz.de/abt0840/whuber ------------------------------------- On Mon, 10 May 2004, Justin Borevitz wrote: > I've noticed that the order of probes in the 2 packages does not agree. At > least for barley1 and ath1121501. Also the way probes are ordered in Linux > and Rgui (PC) does not agree. It could be something with the alphabetizing > of probsets names in the 2 versions. Its possible this is true for the > probe package coming from Affymetrix as well, which doesn't match either > Linux or PC ordering. Lesson never assume ordering... > > Maybe everyone knows this already and that is the purpose of matchprobes?? > Any help with simple calls to avoid this problem are appreciated. > > # in Linux > barley.object <- read.affybatch(filenames = list.celfiles()[2]) > Warning message: > Incompatible phenoData object. Created a new one. > in: read.affybatch(filenames = list.celfiles()[2]) > pnL <- rownames(pm(barley.object)) > save(pnL,file = "pnL.RData",compress=T) > ## then download from linux to PC > > # On PC > barley.object <- read.affybatch(filenames = list.celfiles()[2]) > Warning message: > Incompatible phenoData object. Created a new one. > in: read.affybatch(filenames = list.celfiles()[2]) > pnPC <- rownames(pm(barley.object)) > > load("D:/barley/pnL.RData") > > table(pnPC == pnL) > FALSE TRUE > 172752 78685 > > > > #Observation and rough fix for probe package ordering to PC ordering > > setwd("d:/barley") > library(affy) > barley.object <- read.affybatch(filenames = list.celfiles()) > probesets <- rownames(pm(barley.object)) > length(probesets) > > library(barley1probe) > length(barley1probe$Probe.Set.Names) > > psn <- gsub("_at[0-9]","_at",probesets) > psn <- gsub("_at[0-9]","_at",psn) > table(psn == barley1probe$Probe.Set.Name) > # FALSE TRUE > #249955 1482 > > setwd("d:/ath1") > ath1.obj <- read.affybatch(filenames = list.celfiles()[1]) > aprobesets <- rownames(pm(ath1.obj)) > apsn <- gsub("_at[0-9]","_at",aprobesets) > apsn <- gsub("_at[0-9]","_at",apsn) > library(ath1121501probe) > table(apsn == ath1121501probe$Probe.Set.Name) > # FALSE TRUE > # 439 250639 > > Using the x and y coords I'm reordered the probe package as follows... > > setwd("d:/barley") > library(affy) > barley.object <- read.affybatch(filenames = list.celfiles()) > pm.i <- indexProbes(barley.object, which="pm") # all genes > pm1 <- unlist(pm.i) > pm.i.xy <- matrix(indices2xy(pm1, abatch = barley.object),nc = 2) > length(pm1) > dim(pm.i.xy) > pm.i.xy <- pm.i.xy - 1 # for affy units starting at 0. > probesets <- rownames(pm(barley.object)) > length(probesets) > # now match with xy in barley1probe.. > cdfxy <- paste(pm.i.xy[,1],pm.i.xy[,2]) > > library(barley1probe) > names(barley1probe) > probexy <- paste(barley1probe$x,barley1probe$y) > ordcdf <- match(cdfxy,probexy) > psn <- gsub("_at[0-9]","_at",probesets) > psn <- gsub("_at[0-9]","_at",psn) > table(psn == barley1probe$Probe.Set.Name) > # FALSE TRUE > #250100 1337 > table(psn == barley1probe$Probe.Set.Name[ordcdf]) > # TRUE > #251437 > > barley1probe <- barley1probe[ordcdf, ] > save(barley1probe,file = "barley1probe.RData", compress=T) > > > > > setwd("d:/ath1") > ath1.obj <- read.affybatch(filenames = list.celfiles()[1]) > aprobesets <- rownames(pm(ath1.obj)) > apsn <- gsub("_at[0-9]","_at",aprobesets) > apsn <- gsub("_at[0-9]","_at",apsn) > apsn <- gsub("_at[0-9]","_at",apsn) > pm.i <- indexProbes(ath1.obj, which="pm") # all genes > pm1 <- unlist(pm.i) > pm.i.xy <- matrix(indices2xy(pm1, abatch = ath1.obj),nc = 2) > length(pm1) > dim(pm.i.xy) > pm.i.xy <- pm.i.xy - 1 # for affy units starting at 0. > # now match with xy in ath1121501probe.. > cdfxy <- paste(pm.i.xy[,1],pm.i.xy[,2]) > > library(ath1121501probe) > probexy <- paste(ath1121501probe$x,ath1121501probe$y) > ordcdf <- match(cdfxy,probexy) > > table(apsn == ath1121501probe$Probe.Set.Name) > # FALSE TRUE > # 439 250639 > table(apsn == ath1121501probe$Probe.Set.Name[ordcdf]) > # TRUE > #251078 > ath1121501probe <- ath1121501probe[ordcdf, ] > save(ath1121501probe,file = "ath1121501probe.RData", compress=T) > > > > > --- > Justin Borevitz > > Plant Biology > Salk Institute > 10010 N. Torrey Pines Rd. > La Jolla CA, 92037 > USA > ph. 858 453-4100X1796 > fax 858 452-4315 > mailto:borevitz@salk.edu > http://naturalvariation.org > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor >
ADD COMMENTlink written 15.4 years ago by Wolfgang Huber13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 376 users visited in the last hour