cdf vs probe package & Linux vs PC
@justin-borevitz-327
I've noticed that the order of probes in the two packages (cdf and probe) does not agree, at least for barley1 and ath1121501. The way probes are ordered on Linux and in Rgui (PC) does not agree either. It could be something to do with how probeset names are alphabetized in the two versions. It's possible this is also true for the probe package coming from Affymetrix, which doesn't match either the Linux or the PC ordering. Lesson: never assume ordering...

Maybe everyone knows this already and that is the purpose of matchprobes? Any help with simple calls to avoid this problem is appreciated.

# in Linux
barley.object <- read.affybatch(filenames = list.celfiles()[2])
# Warning message:
# Incompatible phenoData object. Created a new one.
#  in: read.affybatch(filenames = list.celfiles()[2])
pnL <- rownames(pm(barley.object))
save(pnL, file = "pnL.RData", compress = TRUE)
## then download from linux to PC

# On PC
barley.object <- read.affybatch(filenames = list.celfiles()[2])
# Warning message:
# Incompatible phenoData object. Created a new one.
#  in: read.affybatch(filenames = list.celfiles()[2])
pnPC <- rownames(pm(barley.object))

load("D:/barley/pnL.RData")

table(pnPC == pnL)
#  FALSE   TRUE
# 172752  78685

# Observation and rough fix for probe package ordering to PC ordering

setwd("d:/barley")
library(affy)
barley.object <- read.affybatch(filenames = list.celfiles())
probesets <- rownames(pm(barley.object))
length(probesets)

library(barley1probe)
length(barley1probe$Probe.Set.Name)

# strip the per-probe index that pm() appends to each probeset name;
# each call removes one digit after "_at", so repeat for two-digit indices
psn <- gsub("_at[0-9]", "_at", probesets)
psn <- gsub("_at[0-9]", "_at", psn)
table(psn == barley1probe$Probe.Set.Name)
#  FALSE   TRUE
# 249955   1482

setwd("d:/ath1")
ath1.obj <- read.affybatch(filenames = list.celfiles()[1])
aprobesets <- rownames(pm(ath1.obj))
apsn <- gsub("_at[0-9]", "_at", aprobesets)
apsn <- gsub("_at[0-9]", "_at", apsn)
library(ath1121501probe)
table(apsn == ath1121501probe$Probe.Set.Name)
#  FALSE   TRUE
#    439 250639

Using the x and y coords I reordered the probe package as follows...

setwd("d:/barley")
library(affy)
barley.object <- read.affybatch(filenames = list.celfiles())
pm.i <- indexProbes(barley.object, which = "pm")  # all genes
pm1 <- unlist(pm.i)
pm.i.xy <- matrix(indices2xy(pm1, abatch = barley.object), ncol = 2)
length(pm1)
dim(pm.i.xy)
pm.i.xy <- pm.i.xy - 1  # for affy units starting at 0.
probesets <- rownames(pm(barley.object))
length(probesets)
# now match with xy in barley1probe..
cdfxy <- paste(pm.i.xy[, 1], pm.i.xy[, 2])

library(barley1probe)
names(barley1probe)
probexy <- paste(barley1probe$x, barley1probe$y)
ordcdf <- match(cdfxy, probexy)
psn <- gsub("_at[0-9]", "_at", probesets)
psn <- gsub("_at[0-9]", "_at", psn)
table(psn == barley1probe$Probe.Set.Name)
#  FALSE   TRUE
# 250100   1337
table(psn == barley1probe$Probe.Set.Name[ordcdf])
#   TRUE
# 251437

barley1probe <- barley1probe[ordcdf, ]
save(barley1probe, file = "barley1probe.RData", compress = TRUE)

setwd("d:/ath1")
ath1.obj <- read.affybatch(filenames = list.celfiles()[1])
aprobesets <- rownames(pm(ath1.obj))
apsn <- gsub("_at[0-9]", "_at", aprobesets)
apsn <- gsub("_at[0-9]", "_at", apsn)
apsn <- gsub("_at[0-9]", "_at", apsn)
pm.i <- indexProbes(ath1.obj, which = "pm")  # all genes
pm1 <- unlist(pm.i)
pm.i.xy <- matrix(indices2xy(pm1, abatch = ath1.obj), ncol = 2)
length(pm1)
dim(pm.i.xy)
pm.i.xy <- pm.i.xy - 1  # for affy units starting at 0.
# now match with xy in ath1121501probe..
cdfxy <- paste(pm.i.xy[, 1], pm.i.xy[, 2])

library(ath1121501probe)
probexy <- paste(ath1121501probe$x, ath1121501probe$y)
ordcdf <- match(cdfxy, probexy)

table(apsn == ath1121501probe$Probe.Set.Name)
#  FALSE   TRUE
#    439 250639
table(apsn == ath1121501probe$Probe.Set.Name[ordcdf])
#   TRUE
# 251078
ath1121501probe <- ath1121501probe[ordcdf, ]
save(ath1121501probe, file = "ath1121501probe.RData", compress = TRUE)

---
Justin Borevitz
Plant Biology
Salk Institute
10010 N. Torrey Pines Rd.
La Jolla CA, 92037
USA
ph. 858 453-4100X1796
fax 858 452-4315
mailto:borevitz@salk.edu
http://naturalvariation.org
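The reordering in the post relies on match() over pasted x/y keys. As a minimal sketch on toy data (the data frames cdf_xy and probe_tab and all their values are made up for illustration, not taken from any real chip), it may be worth checking for duplicated or unmatched keys before subsetting, since match() returns NA for any probe that is absent from the probe table:

```r
# Toy stand-in for (x, y) coordinates derived from an AffyBatch (hypothetical values)
cdf_xy <- data.frame(x = c(2, 0, 1, 5), y = c(1, 0, 3, 5))

# Toy stand-in for a probe package table (hypothetical values, different row order)
probe_tab <- data.frame(
  x = c(0, 1, 2),
  y = c(0, 3, 1),
  Probe.Set.Name = c("setA_at", "setB_at", "setC_at"),
  stringsAsFactors = FALSE
)

# Build one key per probe from its coordinates
cdf_key   <- paste(cdf_xy$x, cdf_xy$y)
probe_key <- paste(probe_tab$x, probe_tab$y)

# Keys must be unique, otherwise match() silently returns only the first hit
stopifnot(!anyDuplicated(probe_key))

# Position of each AffyBatch probe in the probe table; NA means no matching row
ord <- match(cdf_key, probe_key)
n_missing <- sum(is.na(ord))

# Reorder the probe table to follow the AffyBatch order;
# probes without a match become all-NA rows rather than being dropped
probe_reordered <- probe_tab[ord, ]
rownames(probe_reordered) <- NULL
```

In this toy case the fourth probe has no row in probe_tab, so n_missing is 1 and the last row of probe_reordered is NA. With real data, a nonzero n_missing would suggest the two sources do not fully overlap.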
ath1121501 cdf probe affy • 910 views
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Hi Justin,

an AffyBatch contains data on all those probes that were found in the CEL file. The probe package contains data on all probes for which we could get sequence information from Affymetrix. The two don't necessarily overlap, and nowhere in the code is it assumed that they are in the same order, or in any particular order.

What I usually do is locate the row(s) in the probe package that correspond to the probe(s) I want to look at, then use probepackage::xy2i to find them in the AffyBatch.

Please let me/us know if you have more specific questions or suggestions on how to improve the user interface for accessing the data from individual probes.

(If you know enough R, feel free to look at the methods directly

> showMethods('pm', classes='AffyBatch', inc=TRUE)
> showMethods('probes', classes='AffyBatch', inc=TRUE)

to see 'under the hood'. But of course that should not be necessary for the normal user.)

Best wishes
Wolfgang

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/abt0840/whuber
-------------------------------------

On Mon, 10 May 2004, Justin Borevitz wrote:

> I've noticed that the order of probes in the 2 packages does not agree. At
> least for barley1 and ath1121501. Also the way probes are ordered in Linux
> and Rgui (PC) does not agree. It could be something with the alphabetizing
> of probsets names in the 2 versions. Its possible this is true for the
> probe package coming from Affymetrix as well, which doesn't match either
> Linux or PC ordering. Lesson never assume ordering...
>
> Maybe everyone knows this already and that is the purpose of matchprobes??
> Any help with simple calls to avoid this problem are appreciated.
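The lookup direction described in the reply (find the row in the probe table first, then its position in the AffyBatch) can be sketched in base R on toy data. This is an illustration only: xy_to_index is a hypothetical helper, the tables and values are made up, and the assumed convention (zero-based x and y, cells numbered row by row, index = y * n_col + x + 1) should be checked against the conversion function shipped with the actual package before relying on it.

```r
# Hypothetical helper: convert zero-based (x, y) chip coordinates to a
# one-based linear index, assuming cells are numbered row by row.
xy_to_index <- function(x, y, n_col) {
  y * n_col + x + 1
}

# Toy probe table (made-up values), standing in for a probe package
probe_tab <- data.frame(
  x = c(0, 3, 1),
  y = c(0, 0, 2),
  Probe.Set.Name = c("setA_at", "setA_at", "setB_at"),
  stringsAsFactors = FALSE
)

# Toy intensity matrix: one row per cell of a 4 x 4 chip, one column per array
chip_ncol <- 4
intens <- matrix(seq_len(16) * 10, ncol = 1, dimnames = list(NULL, "array1"))

# 1. Locate the rows of interest in the probe table
rows <- which(probe_tab$Probe.Set.Name == "setA_at")

# 2. Convert their coordinates to row indices of the intensity matrix
idx <- xy_to_index(probe_tab$x[rows], probe_tab$y[rows], n_col = chip_ncol)

# 3. Pull out the matching intensities without relying on any row order
setA_intens <- intens[idx, "array1"]
```

Because every probe is addressed through its coordinates, the order of rows in either table does not matter, which is the point made in the reply.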