Difference between number of probes and number of data rows using 'oligo' on Affy miRNA v3.0 arrays
Vicky Fan ▴ 10
@vicky-fan-5789
Dear all,

I am using the 'oligo' package to process data from Affymetrix miRNA v3.0 arrays. When I extract the probe names as follows, I get 243982 probes:

> library(oligo)
> celFiles <- list.celfiles()
> rawData <- read.celfiles(celFiles)
> pNames <- probeNames(rawData)
> exprs.rawData <- exprs(rawData)

However, extracting the data itself gives me a different number of rows:

> length(pNames)
[1] 243982
> dim(exprs.rawData)
[1] 292681 6

I've verified that this result occurs using the sample CEL files from the Affymetrix website here (although there is a login required):

http://www.affymetrix.com/Auth/support/downloads/demo_data/mirna_3_sample_data.zip

Shouldn't the number of probes in the CEL file be the same as the number of rows in the dataset? I'm aware that the exprs function is for objects of type eSet and that read.celfiles returns an ExpressionFeatureSet object, not an eSet object, so maybe this has something to do with the non-matching numbers.

Regards,
Vicky

--
Vicky Fan
Research Programmer
Bioinformatics Institute
School of Biological Sciences
University of Auckland
Ph: 09 373 7599 x 83777
@james-w-macdonald-5106
Hi Vicky,

On 2/24/2013 8:03 PM, Vicky Fan wrote:
> Shouldn't the number of probes in the CEL file be the same as the number of rows in the dataset? I'm aware that the exprs function is for objects of type eSet and that read.celfiles returns an ExpressionFeatureSet object, not an eSet object, so maybe this has something to do with the non-matching numbers.

There are a large number of probes around the perimeter of the array (as well as some blocks of probes in the middle) that are primarily used for aligning the scanner to the array. Since these probes don't measure anything of interest (they are oligo-dT), they are not used in any further calculations.

The difference arises because every probe on the array is scanned and those data are stored in the CEL file, so the dimensions of the raw data reflect the existence of these extra probes. The probe names, on the other hand, cover only the probes that are intended to measure transcripts, so probeNames() returns fewer entries than exprs() has rows.
Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
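
As an illustration of the explanation above (not part of the original reply), here is a minimal sketch of how the two counts could be reconciled. The arithmetic uses only the numbers reported in the question. The helper check_pm_rows() is a hypothetical name introduced here; it assumes that library(oligo) has been attached, that rawData comes from read.celfiles(), and that pm() and pmindex() follow the same probe order as probeNames(), which is worth checking against your installed version.

```r
# Counts reported in the question
n_features <- 292681L  # rows of exprs(rawData): every scanned feature
n_named    <- 243982L  # length of probeNames(rawData)

# Features that have no probe name attached
n_unnamed <- n_features - n_named

# The total happens to be a perfect square. Reading this as a square
# feature grid is an assumption, not something stated in the thread.
grid_side <- sqrt(n_features)

# Hypothetical helper: compares the full intensity matrix with the
# subset of rows that probeNames() describes.
# Assumes library(oligo) is attached and rawData is an ExpressionFeatureSet.
check_pm_rows <- function(rawData) {
  pNames <- probeNames(rawData)  # one name per probe assigned to a probeset
  pmIdx  <- pmindex(rawData)     # row indices of those probes in exprs()
  pmMat  <- pm(rawData)          # intensities for those probes only

  c(all_features = nrow(exprs(rawData)),
    named_probes = length(pNames),
    pm_index     = length(pmIdx),
    pm_rows      = nrow(pmMat))
}
```

If those assumptions hold, check_pm_rows(rawData) should report the same value for named_probes, pm_index and pm_rows, and a larger value for all_features. For probe-level work, pm(rawData) is then the matrix whose rows line up with pNames, rather than exprs(rawData).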