Question: Iteration through a list failing
maxglycine0 wrote (2.4 years ago):

All:

This is probably a rookie question, but I am trying to iterate through a list (matrix?) and it fails with:

"Error in gsedf[i, 1] : object of type 'S4' is not subsettable"

This is strange because I have another script that works perfectly with almost the same code. I am using GEOquery to extract the list of GSM files that make up a GSE set. gsedf is a list of GSE accessions to download and process. This works in my other script but fails with the error message above in the script below. I don't know whether the problem is really the iteration through the list or the processing of each GSM file from the GSE set to extract the data table. The script does process the first GSE accession in the list; it apparently fails when it starts on the second one. I have re-ordered the list, and it still processes the first GSE in the list but fails with the same error before moving on to the second. Thanks for any help.

Code follows:

#!/usr/bin/Rscript
library("GEOquery")
library("Biobase")
#options(warn = 1)
# get the filename of GSE numbers as argument
args = commandArgs(trailingOnly = TRUE)
# assign filename to f
f = args[1]
f
# read in the table of filenames
gselist <- list(read.table(f, header=FALSE, sep="", quote=""));
gselist
# make a dataframe from gselist
gsedf <- data.frame(gselist)
gsedf
y=nrow(gsedf)
print(paste("GSElist Length", y))

# this works perfectly to see if it was an iteration problem
for(i in 1:y){
    gsename <- gsedf[i,]
    print(paste("gsename=",gsename,sep=""))
}

# This is the part that fails
# iterate through each GSE
for (i in 1:y){
    gsename <- gsedf[i,1]
    print(paste("GSEno=",i,"Name=",gsename))
    # get the GSE object
    gsedf<-getGEO(gsename, GSEMatrix=FALSE, destdir=".")
    # get the list of sample names and make a data frame
    gsmnamesdf<-data.frame(names(GSMList(gsedf)))
    # make the output file name for the sample list
    gsefile=paste(gsename, "-samples.txt", sep="")
    print("Writing Samplename file")
    write.table(gsmnamesdf, file=gsefile, sep="\t") # write list of samples
    z=nrow(gsmnamesdf) # get number of sample names
    print(paste("Number of Samples=",z))
    for (j in 1:z){ # iterate through each sample name
        print(paste("SampleNum=",j,sep=""))
        gsm = gsmnamesdf[j,] # sample name of j sample
        print(paste("Sample Name=",gsm))
        gsmdf <- getGEO(gsm, destdir=".") # get sample
        outfile=paste(gsm,"table.txt",sep="-") # make sample output file name
        tabledf <- Table(gsmdf) # extract data table from the GSM accession
        print(paste("Printing",outfile))
        write.table(tabledf, file=outfile, sep="\t") # write the table to file
        # clean up
        softfile=paste(gsm,".soft", sep="") # the softfile name
        gzfile=paste(gsename, ".soft.gz", sep="") # the zipped softfile name
        print(paste("Deleting",softfile, sep=" "))
        file.remove(softfile) # delete the soft file
        file.remove(gzfile) # delete the zipped file
    }

}
print("Ended processing")
q()

modified 2.4 years ago by James W. MacDonald • written 2.4 years ago by maxglycine0

James W. MacDonald wrote (2.4 years ago):

The obvious answer is that you are ending up with an S4 object that doesn't have a '[' method defined, when in fact you are expecting something else. This is really a programming problem, not a Bioconductor support site issue. In other words, it's your script that is failing, and it's not because GEOquery has a bug; your script has the bug.
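To illustrate what "an S4 object that doesn't have a '[' method" means, here is a minimal sketch using two hypothetical classes (PlainIds and IndexableIds are made up for this example and are not part of GEOquery). Subsetting an S4 object of a class with no '[' method falls through to R's default subsetting code, which raises the same error as in the question; defining a method with setMethod() makes the object subsettable.

```r
library(methods)

# Hypothetical S4 class with no '[' method defined
setClass("PlainIds", slots = c(ids = "character"))

# Hypothetical S4 class that does define a '[' method
setClass("IndexableIds", slots = c(ids = "character"))
setMethod("[", "IndexableIds", function(x, i, j, ..., drop = TRUE) {
  x@ids[i]  # only the row index is used; j is ignored in this sketch
})

plain <- new("PlainIds", ids = c("GSE_A", "GSE_B"))
indexable <- new("IndexableIds", ids = c("GSE_A", "GSE_B"))

# Subsetting the class without a '[' method raises the error from the question
plain_msg <- tryCatch(plain[1, 1], error = function(e) conditionMessage(e))
print(plain_msg)

# The class with a '[' method can be subset
print(indexable[2])
```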

But there are some weird things here. For instance:

gselist <- list(read.table(f, header=FALSE, sep="", quote=""));
gselist
# make a dataframe from gselist
gsedf <- data.frame(gselist)

You read in a file (into a data.frame), convert it immediately to a list, and then convert back into a data.frame. All this coercion may well be doing something you aren't expecting, and appears unnecessary.
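A minimal sketch of skipping the coercion, assuming the input file has one accession per line and no header (the file and the GSE_A/GSE_B/GSE_C values below are hypothetical placeholders, written to a temporary file so the example runs on its own). read.table() already returns a data.frame, and taking its first column gives a plain character vector that is simple to loop over.

```r
# Hypothetical input: one accession per line, no header, standing in for
# the file passed on the command line in the question.
accession_file <- tempfile(fileext = ".txt")
writeLines(c("GSE_A", "GSE_B", "GSE_C"), accession_file)

# read.table() already returns a data.frame, so there is no need to wrap it
# in list() and convert it back with data.frame().
gsedf <- read.table(accession_file, header = FALSE, sep = "", quote = "",
                    stringsAsFactors = FALSE)
str(gsedf)

# The first column as a character vector of accessions
gse_ids <- gsedf[[1]]
for (gsename in gse_ids) {
  print(paste("gsename =", gsename))
}
```

Looping over a character vector means the loop variable is the accession itself, so there is no table to index (or accidentally overwrite) inside the loop.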

Then you 'test' your loop by doing

#this works perfectly to see if it was an iteration problem
for(i in 1:y){
gsename <- gsedf[i,]
print(paste("gsename=",gsename,sep=""))
}

Which is close, but not exactly the same as

for (i in 1:y){
gsename <- gsedf[i,1]

The error occurs when you do gsedf[i,1], but not when you do gsedf[i,], so your test loop doesn't actually test the code that fails. The easy play is to just make the gsedf, look at what you have in each row, and check whether those things are unsubsettable S4 objects.
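A minimal sketch of that kind of check, runnable without GEOquery or network access: a hypothetical FakeGSE class and fetch_gse() helper stand in for the S4 object returned by getGEO(). The first loop prints the class of gsedf on each pass and, like the loop in the question, assigns the fetched object back to gsedf; on the second pass gsedf is an S4 object and gsedf[i, 1] fails with the same error. The second loop keeps the fetched object under its own name (gse, an illustrative choice), so the table of accessions is never replaced.

```r
library(methods)

# Hypothetical stand-in for the S4 object that getGEO() returns
setClass("FakeGSE", slots = c(accession = "character"))
fetch_gse <- function(acc) new("FakeGSE", accession = acc)  # hypothetical helper

accessions <- data.frame(V1 = c("GSE_A", "GSE_B"), stringsAsFactors = FALSE)

# Pattern from the question: the loop reuses gsedf for the fetched object
gsedf <- accessions
failure <- NULL
for (i in 1:nrow(gsedf)) {
  cat("pass", i, "- class(gsedf):", class(gsedf), "\n")
  gsename <- tryCatch(gsedf[i, 1], error = function(e) e)
  if (inherits(gsename, "error")) {
    failure <- conditionMessage(gsename)
    cat("error:", failure, "\n")
    break
  }
  gsedf <- fetch_gse(gsename)  # overwrites the table of accessions
}

# Same loop, but the fetched object gets its own name
gsedf <- accessions
processed <- character(0)
for (i in 1:nrow(gsedf)) {
  gsename <- gsedf[i, 1]
  gse <- fetch_gse(gsename)  # gsedf is left untouched
  processed <- c(processed, gse@accession)
}
print(processed)
```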

Anyway, given that this is your code, you shouldn't expect any support here. If you are going to write scripts, you have to learn to debug them yourself.