I am trying to sub-select a bunch of CEL files while creating a manifest file (which also serves as the phenotype data).
The GEO download gives about 536 files, but I wish to read only, say, 119 of them, as listed in "sub-set.tsv". This is how I am going about it:
library(GEOquery)
gds <- getGEO('GSE86952', destdir=".")
library(Biobase)
library(simpleaffy)
tab <- read.delim("sub-set.tsv", check.names = FALSE, as.is = TRUE)
rownames(tab) <- tab$filenames
tab
fns <- list.celfiles()
fns
fns %in% tab[, 1]  # check which CEL files on disk are listed in the manifest
rawdata <- ReadAffy(phenoData = tab)
I am unable to select just the data I need. Any suggestions are much appreciated.
For example, say I want to use only the following CEL files rather than all 536. I tried saving the list below as files.tsv (instead of sub-set.tsv) and loading that, but it didn't work.
GSM158712.CEL
GSM158716.CEL
GSM158717.CEL
GSM158719.CEL
GSM158721.CEL
GSM158738.CEL
GSM158741.CEL
GSM158742.CEL
GSM158743.CEL
GSM158745.CEL
GSM158747.CEL
GSM158749.CEL
GSM158752.CEL
GSM158753.CEL
GSM158755.CEL
The error is:
> rawdata<- ReadAffy(phenoData = tab)
Error in `sampleNames<-`(`*tmp*`, value = c("1", "2", "3", "4", "5", "6", :
number of new names (118) should equal number of rows in AnnotatedDataFrame (536)
In addition: Warning messages:
1: Mismatched phenoData and celfile names!
Please note that the row.names of your phenoData object should be identical to what you get from list.celfiles()!
Otherwise you are responsible for ensuring that the ordering of your phenoData object conforms to the ordering of the celfiles as they are read into the AffyBatch!
If not, errors may result from using the phenoData for subsetting or creating linear models, etc.
2: In read.affybatch(filenames = l$filenames, phenoData = l$phenoData, :
Incompatible phenoData object. Created a new one.
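Reading the error, it looks as if ReadAffy() picks up all 536 CEL files in the directory while the phenotype table has only 118 rows. One possible workaround, given here as an untested sketch rather than a confirmed fix, is to pass only the wanted files through the filenames argument of ReadAffy() and to reorder the table to match. The helper select_cels() and the "group" column are hypothetical names made up for illustration, as is the toy file list; the "filenames" column is the one from sub-set.tsv.

```r
# Hypothetical helper: keep only the CEL files named in the manifest and
# reorder the manifest rows so they line up with the kept files.
select_cels <- function(available, manifest, col = "filenames") {
  keep <- available[basename(available) %in% manifest[[col]]]
  pheno <- manifest[match(basename(keep), manifest[[col]]), , drop = FALSE]
  rownames(pheno) <- basename(keep)
  list(files = keep, pheno = pheno)
}

# Toy illustration with made-up data (no CEL files are read here).
available <- c("GSM158712.CEL", "GSM158713.CEL", "GSM158716.CEL")
manifest <- data.frame(filenames = c("GSM158716.CEL", "GSM158712.CEL"),
                       group = c("A", "B"),
                       stringsAsFactors = FALSE)
sel <- select_cels(available, manifest)
print(sel$files)
print(sel$pheno)

# With the real data, something along these lines (not run here):
# library(affy)
# tab <- read.delim("sub-set.tsv", check.names = FALSE, as.is = TRUE)
# sel <- select_cels(list.celfiles(), tab)
# rawdata <- ReadAffy(filenames = sel$files,
#                     phenoData = AnnotatedDataFrame(sel$pheno))
```

The idea is that the row names of the phenotype table then match the files actually read, in the same order, which is what the warning asks for.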