Error in lm.fit(design, t(M)) from GEO dataset
2
0
Entering edit mode
@jeremiah-h-savage-4102
Last seen 9.6 years ago
Hello, I am using limma 3.4.0, and received an error with a GEO series ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that doesn't occur with other series data: Error in lm.fit(design, t(M)) : NA/NaN/Inf in foreign function call (arg 4) If I do a summary on the eset, I get some -Inf values. Does this mean there is some invalid data in the series that lmFit doesn't handle? > summary(exprs(eset12113)) GSM305488 GSM305489 GSM305490 GSM305491 Min. :-2.322 Min. :-3.322 Min. :-3.322 Min. : -Inf 1st Qu.: 3.263 1st Qu.: 2.982 1st Qu.: 4.585 1st Qu.: 3.868 Median : 5.202 Median : 4.916 Median : 6.200 Median : 5.638 Mean : 5.140 Mean : 4.934 Mean : 6.054 Mean : -Inf 3rd Qu.: 6.785 3rd Qu.: 6.637 3rd Qu.: 7.485 3rd Qu.: 7.323 Max. :15.620 Max. :15.180 Max. :14.749 Max. :14.290 The code I'm using is: > source("sampleEset.R") > fit <- getLMFit() sampleEset.R : -------------------8<----------------------------- getLMFit <- function() { eset12113 <- getEset("12113") design12113 <- model.matrix(~ 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10))) colnames(design12113) <- c("untreated1","treated1","untreated2","treated2","untreated3","treate d3","untreated4","treated4","untreated5","treated5") contrast.matrix <- makeContrasts(untreated1-treated1,levels=design12113) fit <- lmFit(eset12113,design12113) } getEset <- function(value) { DATADIR = "/home/jeremiah/R/i486-pc-linux-gnu- library/2.11/GEOquery/extdata" data_file = paste("GSE",value,".soft",sep="") data_dir_file= paste(DATADIR,data_file,sep="/") library(GEOquery) library(Biobase) library(limma) if (file.exists(data_dir_file)) { GSE_parsed <- getGEO(filename=data_dir_file) } else { GSE_value <- paste("GSE",value,sep="") GSE_file <- getGEOfile(GSE_value,destdir=DATADIR) GSE_parsed <- getGEO(filename=data_dir_file) } GSM_platform <- lapply(GSMList(GSE_parsed), function(x) { Meta(x)$platform }) GSM_platform <- GSM_platform[[1]] probe_set <- Table(GPLList(GSE_parsed)[[1]])$ID data.matrix <- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) { tab <- Table(x) mymatch <- match(probe_set, tab$ID_REF) return(tab$VALUE[mymatch]) })) data.matrix <- apply(data.matrix, 2, function(x) { as.numeric(as.character(x)) }) dataLog2.matrix <- log2(data.matrix) rownames(dataLog2.matrix) <- probe_set colnames(dataLog2.matrix) <- names(GSMList(GSE_parsed)) p_data <- data.frame(samples = names(GSMList(GSE_parsed))) rownames(p_data) <- names(GSMList(GSE_parsed)) pheno <- as(p_data, "AnnotatedDataFrame") eset <- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno) return(eset) } -------------------8<----------------------------- Thanks, Jeremiah
limma limma • 1.3k views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 12 weeks ago
United States
On Thu, May 27, 2010 at 9:39 PM, Jeremiah H. Savage < jeremiahsavage@gmail.com> wrote: > Hello, > > I am using limma 3.4.0, and received an error with a GEO series ( > http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that > doesn't occur with other series data: > > Error in lm.fit(design, t(M)) : > NA/NaN/Inf in foreign function call (arg 4) > > If I do a summary on the eset, I get some -Inf values. Does this mean > there is some invalid data in the series that lmFit doesn't handle? > > summary(exprs(eset12113)) > GSM305488 GSM305489 GSM305490 GSM305491 > Min. :-2.322 Min. :-3.322 Min. :-3.322 Min. : -Inf > 1st Qu.: 3.263 1st Qu.: 2.982 1st Qu.: 4.585 1st Qu.: 3.868 > Median : 5.202 Median : 4.916 Median : 6.200 Median : 5.638 > Mean : 5.140 Mean : 4.934 Mean : 6.054 Mean : -Inf > 3rd Qu.: 6.785 3rd Qu.: 6.637 3rd Qu.: 7.485 3rd Qu.: 7.323 > Max. :15.620 Max. :15.180 Max. :14.749 Max. :14.290 > > > The code I'm using is: > > source("sampleEset.R") > > fit <- getLMFit() > > > sampleEset.R : > -------------------8<----------------------------- > > getLMFit <- function() > { > eset12113 <- getEset("12113") > > design12113 <- model.matrix(~ > 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10))) > colnames(design12113) <- > > c("untreated1","treated1","untreated2","treated2","untreated3","trea ted3","untreated4","treated4","untreated5","treated5") > contrast.matrix <- makeContrasts(untreated1-treated1,levels=design12113) > fit <- lmFit(eset12113,design12113) > } > > getEset <- function(value) > { > DATADIR = > "/home/jeremiah/R/i486-pc-linux-gnu-library/2.11/GEOquery/extdata" > data_file = paste("GSE",value,".soft",sep="") > data_dir_file= paste(DATADIR,data_file,sep="/") > > library(GEOquery) > library(Biobase) > library(limma) > > if (file.exists(data_dir_file)) > { > GSE_parsed <- getGEO(filename=data_dir_file) > } > else > { > GSE_value <- paste("GSE",value,sep="") > GSE_file <- getGEOfile(GSE_value,destdir=DATADIR) > GSE_parsed <- getGEO(filename=data_dir_file) > } > GSM_platform <- lapply(GSMList(GSE_parsed), function(x) { > Meta(x)$platform > }) > GSM_platform <- GSM_platform[[1]] > > probe_set <- Table(GPLList(GSE_parsed)[[1]])$ID > > data.matrix <- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) > { > tab <- Table(x) > mymatch <- match(probe_set, tab$ID_REF) > return(tab$VALUE[mymatch]) > })) > > data.matrix <- apply(data.matrix, 2, function(x) { > as.numeric(as.character(x)) > }) > > dataLog2.matrix <- log2(data.matrix) > > rownames(dataLog2.matrix) <- probe_set > colnames(dataLog2.matrix) <- names(GSMList(GSE_parsed)) > p_data <- data.frame(samples = names(GSMList(GSE_parsed))) > rownames(p_data) <- names(GSMList(GSE_parsed)) > pheno <- as(p_data, "AnnotatedDataFrame") > eset <- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno) > return(eset) > } > > Just an aside, Jeremiah, but getGEO("GSE...") will, by default, return a list of ExpressionSets, so you could use that in place of getEset. As a bonus, you'll get the platform inserted into the featureData slot. Sean > -------------------8<----------------------------- > > Thanks, > Jeremiah > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
James F. Reid ▴ 610
@james-f-reid-3148
Last seen 9.6 years ago
Hi Jeremiah, my guess is that you are taking logs of negative values at some stage in your pre-processing. James. On 05/28/2010 03:39 AM, Jeremiah H. Savage wrote: > Hello, > > I am using limma 3.4.0, and received an error with a GEO series ( > http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that > doesn't occur with other series data: > > Error in lm.fit(design, t(M)) : > NA/NaN/Inf in foreign function call (arg 4) > > If I do a summary on the eset, I get some -Inf values. Does this mean > there is some invalid data in the series that lmFit doesn't handle? >> summary(exprs(eset12113)) > GSM305488 GSM305489 GSM305490 GSM305491 > Min. :-2.322 Min. :-3.322 Min. :-3.322 Min. : -Inf > 1st Qu.: 3.263 1st Qu.: 2.982 1st Qu.: 4.585 1st Qu.: 3.868 > Median : 5.202 Median : 4.916 Median : 6.200 Median : 5.638 > Mean : 5.140 Mean : 4.934 Mean : 6.054 Mean : -Inf > 3rd Qu.: 6.785 3rd Qu.: 6.637 3rd Qu.: 7.485 3rd Qu.: 7.323 > Max. :15.620 Max. :15.180 Max. :14.749 Max. :14.290 > > > The code I'm using is: >> source("sampleEset.R") >> fit<- getLMFit() > > > sampleEset.R : > -------------------8<----------------------------- > > getLMFit<- function() > { > eset12113<- getEset("12113") > > design12113<- model.matrix(~ 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10))) > colnames(design12113)<- > c("untreated1","treated1","untreated2","treated2","untreated3","trea ted3","untreated4","treated4","untreated5","treated5") > contrast.matrix<- makeContrasts(untreated1-treated1,levels=design12113) > fit<- lmFit(eset12113,design12113) > } > > getEset<- function(value) > { > DATADIR = "/home/jeremiah/R/i486-pc-linux-gnu- library/2.11/GEOquery/extdata" > data_file = paste("GSE",value,".soft",sep="") > data_dir_file= paste(DATADIR,data_file,sep="/") > > library(GEOquery) > library(Biobase) > library(limma) > > if (file.exists(data_dir_file)) > { > GSE_parsed<- getGEO(filename=data_dir_file) > } > else > { > GSE_value<- paste("GSE",value,sep="") > GSE_file<- getGEOfile(GSE_value,destdir=DATADIR) > GSE_parsed<- getGEO(filename=data_dir_file) > } > GSM_platform<- lapply(GSMList(GSE_parsed), function(x) { > Meta(x)$platform > }) > GSM_platform<- GSM_platform[[1]] > > probe_set<- Table(GPLList(GSE_parsed)[[1]])$ID > > data.matrix<- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) { > tab<- Table(x) > mymatch<- match(probe_set, tab$ID_REF) > return(tab$VALUE[mymatch]) > })) > > data.matrix<- apply(data.matrix, 2, function(x) { > as.numeric(as.character(x)) > }) > > dataLog2.matrix<- log2(data.matrix) > > rownames(dataLog2.matrix)<- probe_set > colnames(dataLog2.matrix)<- names(GSMList(GSE_parsed)) > p_data<- data.frame(samples = names(GSMList(GSE_parsed))) > rownames(p_data)<- names(GSMList(GSE_parsed)) > pheno<- as(p_data, "AnnotatedDataFrame") > eset<- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno) > return(eset) > } > > -------------------8<----------------------------- > > Thanks, > Jeremiah > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT

Login before adding your answer.

Traffic: 769 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6