ReadAffy question
1
0
Entering edit mode
@kimpel-mark-w-727
I work with CEL files whose names are often assigned randomly with respect to phenotype. I create my pdata files by modifying spreadsheets that already have file and phenotype information in the appropriate columns. I had been assuming that the order of the filenames in the first column of the pdata file did not matter: that after being read in, each CEL file would be matched to the appropriate row in pdata and would thus have the correct phenotype assigned.

Some recent work has indicated to me that this is probably NOT the case. Instead, it appears that the files are read in by filename in alphanumeric order, and the phenotype and sample name are assigned by row order of the pdata file. This, of course, will often result in incorrect sample names and phenotypes being assigned to files.

I have searched the documentation and help files for an answer to this question to no avail.

How is this supposed to work?

sessionInfo()

    Version 2.3.0 Under development (unstable) (2006-01-01 r36947)
    i386-pc-mingw32

    attached base packages:
     [1] "tcltk"     "splines"   "tools"     "methods"   "stats"     "graphics"
     [7] "grDevices" "utils"     "datasets"  "base"

    other attached packages:
        tkWidgets        DynDoc    reposTools   widgetTools    rat2302cdf
          "1.9.0"       "1.9.0"       "1.9.1"       "1.7.0"       "1.5.1"
    affycoretools       GOstats      multtest    genefilter      survival
          "1.3.1"       "1.5.4"       "1.8.0"       "1.9.2"        "2.20"
           xtable          RBGL      annotate            GO         graph
          "1.3-0"       "1.7.6"       "1.8.0"       "1.6.5"       "1.9.4"
            Ruuid       cluster         limma          affy       Biobase
          "1.9.0"      "1.10.2"       "2.4.4"       "1.9.6"       "1.9.2"
          RWinEdt
          "1.7-3"

Mark W. Kimpel
I.U. School of Medicine
@james-w-macdonald-5106
Kimpel, Mark William wrote:

> I work with CEL files that frequently have names assigned randomly in
> respect to phenotype. I create my pdata files by modifying spreadsheets
> with file and phenotype information already in appropriate columns. I
> had been assuming that it did not matter what order the filenames were
> in in the first column of the pdata file, that after being read in the
> CEL files would be matched to the appropriate row in pdata and would
> thus have the correct phenotype assigned.
>
> Some recent work has indicated to me that this is probably NOT the case,
> instead, it appears that the files are read in by filename alphanumeric
> order and the phenotype and sample is assigned by row order of the pdata
> file. This, of course, will often result in incorrect sample names and
> phenotypes being assigned to files.
>
> I have searched the documentation and help files for an answer to this
> question to no avail.
>
> How is this supposed to work?

Two ways: you can either input your data using the widget-based interface, or keep doing things the way you are now, except with the rows of the phenoData object in alphanumeric order. The function read.affybatch() does not match rows to files; it simply takes the phenoData object as is and assumes you have ordered things correctly. The relevant line in read.affybatch() is this:

    samplenames <- rownames(pdata)

A call to list.celfiles() returns the filenames in the order they will be read, so you can use it to set the row order of your phenoData object and ensure things line up correctly.
Best,

Jim

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
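The reordering suggested in the answer above could look roughly like the sketch below. It uses only base R so that it runs without the affy package: the file names, the phenotype values, and the variable names (pdata, cel_files, pdata_ordered) are all made up for illustration, and the sorted vector stands in for what list.celfiles() would return in a real session. It also assumes the row names of the phenotype table are the CEL file names.

```r
# Hypothetical phenotype table: row names are CEL file names, deliberately
# listed in a different order from the one the files will be read in.
pdata <- data.frame(
  phenotype = c("treated", "control", "treated", "control"),
  row.names = c("c.CEL", "a.CEL", "d.CEL", "b.CEL"),
  stringsAsFactors = FALSE
)

# Stand-in for the file listing. With affy loaded this would be something
# like cel_files <- list.celfiles(); here it is simply the sorted names.
cel_files <- sort(rownames(pdata))

# Look up the pdata row for each file, and fail early if one is missing.
idx <- match(cel_files, rownames(pdata))
if (anyNA(idx)) {
  stop("No pdata row for: ", paste(cel_files[is.na(idx)], collapse = ", "))
}

# Reorder the rows so that row i describes file i.
pdata_ordered <- pdata[idx, , drop = FALSE]

# Confirm the alignment before building the phenoData object from it.
stopifnot(identical(rownames(pdata_ordered), cel_files))
print(pdata_ordered)
```

Matching by name with match(), rather than sorting both sides independently, keeps the lookup explicit and flags any CEL file that has no phenotype row. If the file names sit in the first column of the spreadsheet instead of the row names, they would need to be moved into the row names first.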