problem reading genepix files using both marray and limma functions
Bela Tiwari ▴ 60
@bela-tiwari-339
Hello,

Last week I was sent GenePix data files by an associate. As far as I'm aware, these files were not edited in any way before being sent to me. My aim was to load them and run some marray and/or limma functions on the data.

First I tried to load the files (16 of them) using read.GenePix(), but this failed with an error:

    Error in "colnames<-"( *tmp*, value = fnames) : length of dimnames [2] not equal to array extent

Then I tried loading a single file using read.GenePix() and that worked fine; subsets of files, however, did not.

I then read through the relevant Bioconductor mailing list posts that I could find, and decided to try the read.maimages() function as an alternative. This gave errors such as:

    line 35162 did not have 43 elements

So I tried loading the files individually using read.maimages(), to see if I could track down the "problem" files and then look at them to see if there was an issue with certain lines within those files. I found that 5 of my 16 files would not load using read.maimages() and gave errors like the one directly above. One file gave a different error:

    number of items read is not a multiple of the number of columns

giving me a total of 6 out of 16 files that won't load using read.maimages().

Tackling the latter error first: I looked at the file and saw an incomplete line at the bottom. I removed it and tried to load the file using read.GenePix(). I still received a warning that the number of items read is not a multiple of the number of columns. I cannot spot the problem in the edited version of the file. The edited file does, however, now read in without error using read.maimages().

I then tried loading the files that "failed" with the first error message above individually with read.GenePix(), and this works. I did look at some of the files to try to see what the problem was (i.e. whether there was anything obviously strange at the lines indicated as problems by the read.maimages() error message), but I can't see anything.

I then took the "successful" subset of my files (those I could read in as individual files using read.maimages()) and tried to read those in as a group. This didn't work either, but the error I got was:

    Error in "[.data.frame"(obj, , columns$Rf) : undefined columns selected

So I specified the columns explicitly in the read.maimages() call, but I still got the same error. Thankfully, a recent posting to the mailing list (http://files.protsuggest.org/biocond/html/3512.html) mentioned issues related to this, and Dave Nelson gave a solution that could be implemented. I did this, and my "successful" files then read in just fine using this hacked version of read.maimages().

I also tried using read.GenePix() to read in just the group of "successful" files, and that gives the error:

    Error in "colnames<-"( *tmp*, value = fnames) : length of dimnames [2] not equal to array extent

So, overall, my questions are:

1. Is there anyone out there who would be willing to scan over one of my "successful" files and one of my "failed" files and see if they can spot the problem? The errors suggest that the problem should be easy to spot, but I can't see it. Even with all the gymnastics related above, I have still only managed to load about half of the files I have.

2. Has anyone else had this experience of groups of GenePix files being so seemingly inconsistent in whether they can be read using Bioconductor functions? And if so, do you have any advice on how to make life as easy as possible?

3. Does anyone have any other comments about the internal workings and assumptions of functions such as read.maimages() in comparison to, say, read.GenePix(), and which may be more forgiving, have known issues, etc.?

Sorry this is such a long mail!
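The line-by-line checking described above can be sketched with base R alone. This is a minimal sketch, not something taken from marray or limma: count.fields() reports how many tab-separated fields each line has, so lines that differ from the most common count stand out. The helper name find_odd_lines and the small demo file are hypothetical. Real GPR files begin with a header block whose lines legitimately have fewer fields, so those lines would be flagged as well unless they are skipped.

```r
# Hypothetical helper (not part of marray or limma): report lines whose
# tab-separated field count differs from the most common count in the file.
find_odd_lines <- function(path, skip = 0) {
  # quote = "" and comment.char = "" so that quotes and '#' are treated as data
  n <- count.fields(path, sep = "\t", quote = "", skip = skip,
                    blank.lines.skip = FALSE, comment.char = "")
  expected <- as.integer(names(which.max(table(n))))
  data.frame(line = which(n != expected) + skip,
             fields = n[n != expected])
}

# Small made-up file: line 3 has an extra field and the last line is truncated.
demo <- tempfile(fileext = ".txt")
writeLines(c("Block\tColumn\tRow\tF635 Median",
             "1\t1\t1\t1200",
             "1\t2\t1\t980\textra",
             "1\t3\t1\t1010",
             "1\t4"), demo)
odd <- find_odd_lines(demo)
print(odd)
```

Running the helper over each file in turn gives a list of candidate lines to inspect by hand; it says nothing about why a line is malformed.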
best wishes,
Bela Tiwari

*************************
Dr. Bela Tiwari
Lead Bioinformatician
CEH Oxford
Mansfield Road
Oxford, OX1 3SR
01865 281975

@gordon-smyth
WEHI, Melbourne, Australia

The first things to try are:

1. Upgrade to the latest version of limma.
2. Check with your associate that the GenePix gpr files really have not been edited. Emphasise that you really need the files as they come straight out of GenePix.

You should also tell us which versions of R and the packages you are using. Type

    version
    packageDescription("limma")

and show us the output.

If you take both steps above and still have a problem, then you can send me a couple of your gpr files. The gpr files should be in a zip or similar archive to prevent any further conversions by the mailer.

Gordon

Dear Group,

I am analyzing an Affy dataset from the Harvard Brain CEL database. I am working on the data from an experiment where 50 samples were analyzed. I exported the expressions to a matrix.

    > gliexp <- exprs(justRMA())
    > dim(gliexp)
    [1] 12625    49

I sorted the samples by pathological state (4 categories). Now I want to do SAM and a t-test on any 2 of these groups.

I coded these 4 categories as 0, 1, 2 and 3 in a class file: Brain.cl

I wrote a function that uses the samples labelled 0 and 1 to give me the means for 0 and 1.

My function:

    > BrainFc <- function(exp,cl){
    + X0 = X[,cl==0]
    + X1 = X[,cl==1]
    + return(apply(X0,1,mean) - apply(X1,1,mean))}
    > myBrainFcs <- BrainFc(gliexp,Brain.cl)
    Error in X[, CL == 0] : incorrect number of dimensions

I do not understand why I am getting this error.

I checked the dim of my expression matrix:

    > dim(gliexp)
    [1] 12625    49

I have 50 samples, but here I see 49. Is this the source of the error? If so, what should I do?

I also did the following after reading previous messages from the BioC mailing list:

    > gliexp_genes <- gliexp[1:12625,]
    > myBrainFcs <- BrainFc(gliexp_genes,Brain.cl)
    Error in BrainFc(gliexp_genes, Brain.cl) : (subscript) logical subscript too long

After this I have no option but to write to BioC for some help.

Please help me see where I am going wrong.

Thank you
PS

First things first. Are you sure justRMA is returning expression values for 49 arrays when you have 50 files in the working directory?

    getwd()
    library(affy)
    length(list.celfiles())

If you accidentally deleted one CEL file, this could be the reason later calls also fail.

On Tue, 2004-08-17 at 17:44, S Peri wrote:
> Dear Group,
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

Hello Dr. Ramasamy,

Thank you for your mail. I loaded only 49 files instead of 50.

I am writing a function for the t-test and the following error always creeps in. Could you please let me know whether the problem is in the function definition or somewhere else?

Thank you.
PS

    > T-test <- function(X,CL){
    + ttest <- function(Xrow,CL){
    + return(t.test(Xrow[CL==0],Xrow[CL==1])$p.value)
    + }
    + return(apply(X,1,ttest,CL=CL))
    + }
    Error: couldn't find function "-<-"

--- Adaikalavan Ramasamy <ramasamy@cancer.org.uk> wrote:
> First thing first.
> [...]
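A possible reading of the two BrainFc errors earlier in the thread, offered as an interpretation of the posted code rather than a confirmed diagnosis: the function's arguments are named exp and cl, but its body refers to X, so R has to look for some other object called X outside the function; and a class vector with 50 labels cannot be used to index a matrix with 49 columns. A minimal sketch with consistent argument names and an explicit length check might look like this. The name brain_fc and the simulated data are illustrative only.

```r
# Sketch: difference of row means between class 0 and class 1.
# 'brain_fc' and the simulated data are illustrative, not from the thread.
brain_fc <- function(X, cl) {
  X <- as.matrix(X)
  # One class label per column (array); fail early if the counts disagree.
  if (length(cl) != ncol(X)) {
    stop("length(cl) = ", length(cl), " but ncol(X) = ", ncol(X))
  }
  # drop = FALSE keeps a matrix even if a class has a single sample.
  X0 <- X[, cl == 0, drop = FALSE]
  X1 <- X[, cl == 1, drop = FALSE]
  rowMeans(X0) - rowMeans(X1)
}

set.seed(1)
expr <- matrix(rnorm(5 * 8), nrow = 5)   # 5 genes, 8 arrays
cls  <- c(0, 0, 1, 1, 2, 2, 3, 3)        # four classes, two arrays each
fc <- brain_fc(expr, cls)

# A class vector with one label too many is caught before any subsetting.
too_long <- tryCatch(brain_fc(expr, c(cls, 3)),
                     error = function(e) conditionMessage(e))
```

With the check in place, a mismatch between the number of CEL files read and the number of class labels is reported directly instead of surfacing as a subscript error.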
The problem lies in the name of the function, "T-test". In R, you cannot use a hyphen in a function name because it also represents the minus operator. Try using an underscore or a dot instead.

Here is a function that I use which generalises yours. It allows missing values in the class variable, returns NA when one of the groups contains fewer than two valid observations, and does some basic error checking.

    row.t.test <- function(mat, cl){

      # one class label per column of mat
      stopifnot(length(cl)==ncol(mat))
      if(nlevels(as.factor(cl)) !=2 ) stop("Only two levels in cl allowed")

      # column indices of each group; which() drops NA labels
      g1 <- which( cl == levels(as.factor(cl))[1] )
      g2 <- which( cl == levels(as.factor(cl))[2] )

      # number of non-missing values in x
      length.na <- function(x) length( x[ !is.na(x) ] )

      results <- apply(mat, 1, function(x){

        # need at least two valid observations in each group
        if( length.na(x[g1]) < 2 || length.na(x[g2]) < 2 ){
          return(NA)
        } else {
          return( t.test( x[g1], x[g2] )$p.value )
        }
      })
      return(results)
    }

    # USAGE
    mat <- matrix( rnorm(120), ncol=12 )
    mat[ lower.tri(mat) ] <- NA
    cl <- rep(1:2, each=6)
    cl[3] <- NA
    row.t.test(mat, cl)

On Tue, 2004-08-17 at 23:09, S Peri wrote:
> Hello Dr. Ramasamy,
> [...]
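To illustrate the naming point: R parses `T-test <- ...` as an assignment whose target is the expression T - test, which is why it goes looking for a replacement function called "-<-". The sketch below uses only base R. The name t_test_rows is made up for the example, and the data are simulated.

```r
# The left-hand side of the assignment is a call to `-`, not a single name.
lhs <- quote(T-test <- 1)[[2]]
class(lhs)                  # class of the assignment target
as.character(lhs[[1]])      # the function being called on the left-hand side

# make.names() turns an arbitrary string into a syntactically valid name.
valid <- make.names("T-test")

# A syntactic name works directly; this name is made up for illustration.
t_test_rows <- function(X, CL) {
  apply(X, 1, function(row) t.test(row[CL == 0], row[CL == 1])$p.value)
}

# Backticks allow a non-syntactic name, but it must be quoted at every use.
`T-test` <- t_test_rows

set.seed(42)
X  <- matrix(rnorm(4 * 10), nrow = 4)   # 4 genes, 10 arrays
CL <- rep(0:1, each = 5)
p1 <- t_test_rows(X, CL)
p2 <- `T-test`(X, CL)
```

A plain name with a dot or underscore is usually the less error-prone choice, since the backtick form has to be quoted wherever the function is called.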