Iteration through a list failing
@maxglycine-10855

All:

This is probably a newbie question, but I am trying to iterate through a list (matrix?) and it fails with:

"Error in gsedf[i, 1] : object of type 'S4' is not subsettable"

This is strange because I have another script that works perfectly with almost the same code. I am using GEOquery to extract the list of GSM files that make up a GSE set. gsedf is a list of GSE accessions to download and process. I don't know whether the failure is really due to iterating through the list or to processing each GSM file from the GSE set to extract the data table. The script does process the first GSE accession in the list; it apparently fails when it starts on the second. I have re-ordered the list, and it still processes whichever GSE comes first and then fails with the same error before going on to the second. Thanks for any help.

Code follows:

#!/usr/bin/Rscript
library("GEOquery")
library("Biobase")
#options(warn = 1)
# get the filename of GSE numbers as argument
args = commandArgs(trailingOnly = TRUE)
# assign filename to f
f = args[1]
f
# read in the table of filenames
gselist <- list(read.table(f, header=FALSE, sep="", quote=""));
gselist
# make a dataframe from gselist
gsedf <- data.frame(gselist)
gsedf
y=nrow(gsedf)
print(paste("GSElist Length", y))

#this works perfectly to see if it was an iteration problem
for(i in 1:y){
  gsename <- gsedf[i,]
  print(paste("gsename=",gsename,sep=""))
}

# This is the part that fails
# iterate through each GSE
for (i in 1:y){
        gsename <- gsedf[i,1]
        print(paste("GSEno=",i,"Name=",gsename))
        # get the gds object
        gsedf<-getGEO(gsename, GSEMatrix=FALSE, destdir=".")
        gsmnamesdf<-data.frame(names(GSMList(gsedf))) #get list of sample names
                                                      #and make a data frame
        gsefile=paste(gsename, "-samples.txt", sep="")#make output file name for sample
                                                      #list
        print("Writing Samplename file")
        write.table(gsmnamesdf, file=gsefile, sep="\t") #write list of samples
         z=nrow(gsmnamesdf) #get number of sample names
         print(paste("Number of Samples=",z))
        for (j in 1:z){ #iterate through each sample name
          print(paste("SampleNum=",j,sep=""))
          gsm = gsmnamesdf[j,] #sample name of j sample
          print(paste("Sample Name=",gsm))
          gsmdf <- getGEO(gsm, destdir=".") #get sample
          outfile=paste(gsm,"table.txt",sep="-")#make sample output file name
          tabledf <- Table(gsmdf)#extract data table from the GSM accession
          print(paste("Printing",outfile))
          write.table(tabledf, file=outfile, sep="\t")#write the table to file
          # clean up
          softfile=paste(gsm,".soft", sep="")# the softfile name
          gzfile=paste(gsename, ".soft.gz", sep=" ")# the zipped softfile name
          print(paste("Deleting",softfile, sep=" "))
          file.remove(softfile) #delete the soft file
          file.remove(gzfile) #delete the zipped file
        }

}
print("Ended processing")
q()

 

geoquery • 760 views
@james-w-macdonald-5106

The obvious answer is that you are ending up with an S4 object that doesn't have a '[' method defined, when in fact you are expecting something else. This is really a programming problem, not a Bioconductor support site issue. In other words, it's your script that is failing, and it's not because GEOquery has a bug - your script has the bug.
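To make that concrete, here is a minimal sketch of how this error message arises. The class ToyS4 is a hypothetical stand-in, not anything from GEOquery; it is only assumed to resemble "an S4 object with no '[' method".

```r
library(methods)

# Hypothetical S4 class standing in for whatever object ended up in 'gsedf';
# it deliberately defines no '[' method.
setClass("ToyS4", representation(id = "character"))
obj <- new("ToyS4", id = "GSE0000")

# Subsetting it like a data.frame fails; capture the message instead of stopping.
msg <- tryCatch(obj[1, 1], error = function(e) conditionMessage(e))
print(msg)

# isS4() and class() report what the object actually is.
print(isS4(obj))
print(class(obj))
```

Calling isS4() or class() on the variable just before the failing line is usually the quickest way to confirm this kind of mix-up.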

But there are some weird things here. For instance

gselist <- list(read.table(f, header=FALSE, sep="", quote=""));
gselist
# make a dataframe from gselist
gsedf <- data.frame(gselist)

You read in a file (into a data.frame), convert it immediately to a list, and then convert back into a data.frame. All this coercion may well be doing something you aren't expecting, and appears unnecessary.
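As a rough sketch of the simpler route, read.table() can be used on its own, since it already returns a data.frame. The temporary file and the accession IDs below are placeholders made up for illustration, and stringsAsFactors = FALSE is an assumption added so the accessions come back as plain character strings.

```r
# Write a small stand-in for the accession file (the IDs are placeholders).
f <- tempfile(fileext = ".txt")
writeLines(c("GSE0001", "GSE0002"), f)

# read.table() already returns a data.frame, so no list()/data.frame() round trip.
gsedf <- read.table(f, header = FALSE, sep = "", quote = "",
                    stringsAsFactors = FALSE)

# Inspect the structure, then pull the first column out as a character vector.
str(gsedf)
gsenames <- gsedf[[1]]
print(gsenames)
```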

Then you 'test' your loop doing

#this works perfectly to see if it was an iteration problem
for(i in 1:y){
  gsename <- gsedf[i,]
  print(paste("gsename=",gsename,sep=""))
}

Which is close, but not exactly the same as

for (i in 1:y){
        gsename <- gsedf[i,1]

and the error you get is when you do gsedf[i,1], but not when you do gsedf[i,]. So the test loop doesn't actually exercise the expression that fails. But it seems like the easy play is to just make the gsedf, see what you have in each row, and check whether those things are unsubsettable S4 objects.
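One way to do that check is to print the class of the table variable at the top of every pass through the real loop. The sketch below uses hypothetical stand-ins (ToyGSE and fake_fetch() are not part of GEOquery) so it runs without downloading anything. Note that the posted loop assigns the result of getGEO() to gsedf, so a check like this would be expected to show the class changing after the first pass; the sketch keeps the fetched object under a separate name.

```r
library(methods)

# Hypothetical stand-ins: ToyGSE mimics an S4 result, fake_fetch() mimics a
# download call. Neither is part of GEOquery.
setClass("ToyGSE", representation(id = "character"))
fake_fetch <- function(id) new("ToyGSE", id = id)

accessions <- data.frame(V1 = c("GSE0001", "GSE0002"),
                         stringsAsFactors = FALSE)

seen <- character(0)
for (i in seq_len(nrow(accessions))) {
  # Report what the table variable is before subsetting it.
  cat("iteration", i, "class:", class(accessions)[1],
      "isS4:", isS4(accessions), "\n")
  stopifnot(is.data.frame(accessions))

  gsename <- accessions[i, 1]

  # Store the fetched object under a different name so the table is not replaced.
  gse <- fake_fetch(gsename)
  seen <- c(seen, gse@id)
}
print(seen)
```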

Anyway, given that this is your code, you shouldn't expect any support here. If you are going to write scripts, you have to learn to debug them yourself.
