Combining expressionSets from GEO
Francois Pepin ★ 1.3k
@francois-pepin-1012
Last seen 10.4 years ago
Hi everyone,

I'm getting an error message when trying to combine two parts of a GSE object:

> tmp<-getGEO('GSE3526',GSEMatrix=T)
> tmp2<-combine(tmp[[1]],tmp[[2]])
Error in alleq(levels(x[[nm]]), levels(y[[nm]])) && alleq(x[sharedRows,  :
  invalid 'x' type in 'x && y'

Checking to make sure that I should be able to combine them (from the eSet documentation):

#eSets must have identical numbers of 'featureNames'
> all(featureNames(tmp[[2]])==featureNames(tmp[[2]]))
[1] TRUE

#must have distinct 'sampleNames'
> any(sampleNames(tmp[[1]])%in%sampleNames(tmp[[2]]))
[1] FALSE

#and must have identical 'annotation'.
> annotation(tmp[[2]])==annotation(tmp[[2]])
[1] TRUE

> sessionInfo()
R version 2.6.0 (2007-10-03)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.2.0 RCurl_0.8-1    Biobase_1.16.0

loaded via a namespace (and not attached):
[1] rcompgen_0.1-15

Does anyone know why that is happening and if there would be any way around it?

Francois
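Note that the first and third checks in the session above compare tmp[[2]] with itself, so they cannot fail. A minimal sketch of the same three checks applied across two different objects is given here, assuming Biobase is installed; the helper name check_combinable is hypothetical and not part of Biobase, and it tests for identical featureNames, which is stricter than comparing their number.

```r
library(Biobase)

# Hypothetical helper (not part of Biobase): applies the three checks
# quoted from the eSet documentation to two *different* objects.
check_combinable <- function(x, y) {
    c(same_features    = identical(featureNames(x), featureNames(y)),
      distinct_samples = !any(sampleNames(x) %in% sampleNames(y)),
      same_annotation  = identical(annotation(x), annotation(y)))
}

# Intended usage with the objects from this thread:
# check_combinable(tmp[[1]], tmp[[2]])
```

Passing these checks says nothing about the phenoData, which combine() also has to merge.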
@martin-morgan-1513
Last seen 5 months ago
United States
Hi Francois,

This might be related to a bug in Biobase that has been fixed. Can you try updating your Biobase, either with biocLite('Biobase') or by following the directions at http://bioconductor.org/download ? If that does not help, can you provide the output of traceback() after the error occurs?

Thanks,

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
Hi Martin,

I think it is related, as I now have a different error message along with a series of warnings. 255 and 98 refer to the number of samples in each ExpressionSet. 66 and 21 refer to the number of unique elements in source_name_ch1 in the phenodata.

> tmp2<-combine(tmp[[1]],tmp[[2]])
Error in .local(x, y, ...) :
  data.frames contain conflicting data:
        non-conforming colname(s): title, geo_accession, source_name_ch1, description, supplementary_file
In addition: Warning messages:
1: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
  Lengths (255, 98) differ (string compare on first 98)98 string mismatches
2: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'title' levels not all.equal
3: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
  Lengths (255, 98) differ (string compare on first 98)98 string mismatches
4: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'geo_accession' levels not all.equal
5: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
  Lengths (66, 21) differ (string compare on first 21)21 string mismatches
6: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'source_name_ch1' levels not all.equal
7: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
  Lengths (255, 98) differ (string compare on first 98)98 string mismatches
8: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'description' levels not all.equal
9: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
  Lengths (255, 98) differ (string compare on first 98)98 string mismatches
10: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'supplementary_file' levels not all.equal

> traceback()
9: stop("data.frames contain conflicting data:", "\n\tnon-conforming colname(s): ",
       paste(sharedCols[!ok], collapse = ", "))
8: .local(x, y, ...)
7: combine(pDataX, pDataY)
6: combine(pDataX, pDataY)
5: .local(x, y, ...)
4: combine(phenoData(x), phenoData(y))
3: combine(phenoData(x), phenoData(y))
2: combine(tmp[[1]], tmp[[2]])
1: combine(tmp[[1]], tmp[[2]])

> sessionInfo()
R version 2.6.0 (2007-10-03)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.2.0 RCurl_0.8-1    Biobase_1.16.2

loaded via a namespace (and not attached):
[1] rcompgen_0.1-15

Francois

On Wed, 2008-01-30 at 10:03 -0800, Martin Morgan wrote:
> Hi Francois -- this might be related to a bug in Biobase that has been
> fixed. Can you try to update your Biobase, either biocLite('Biobase')
> or following the directions at http://bioconductor.org/download ? If
> not, can you provide the output of traceback() after the error occurs?
> [...]
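The warnings above all point at factor columns whose levels differ between the two phenoData tables. A minimal base R sketch for listing such columns before calling combine() follows; the helper name mismatched_levels is hypothetical, and the two small data frames stand in for the pData of the two ExpressionSets (stringsAsFactors = TRUE is set explicitly so the columns are factors).

```r
# Hypothetical helper: names of shared columns that are factors in both
# data frames but do not carry identical levels.
mismatched_levels <- function(x, y) {
    shared <- intersect(names(x), names(y))
    is_bad <- vapply(shared, function(nm) {
        is.factor(x[[nm]]) && is.factor(y[[nm]]) &&
            !identical(levels(x[[nm]]), levels(y[[nm]]))
    }, logical(1))
    shared[is_bad]
}

# Stand-ins for the two pData tables; column 'y' is a factor in both.
x <- data.frame(x = 1:5, y = letters[1:5], row.names = letters[1:5],
                stringsAsFactors = TRUE)
y <- data.frame(z = 3:7, y = letters[3:7], row.names = letters[3:7],
                stringsAsFactors = TRUE)

mismatched_levels(x, y)
```

With the objects from this thread, the call would presumably be mismatched_levels(pData(tmp[[1]]), pData(tmp[[2]])).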
@martin-morgan-1513
Last seen 5 months ago
United States
So part of the bug fix was an attempt to make the error message more informative, and it's not really clear that I've done that!

The traceback makes it clear that the problem is with the pData (and not, for instance, varMetadata or featureData) of the two arrays.

Some hints are provided by the warnings, by the ?combine help page,

     'combine(data.frame, data.frame)' Combines two 'data.frame'
          objects so that the resulting 'data.frame' contains all rows
          and columns of the original objects. Rows and columns in the
          returned value are unique, that is, a row or column
          represented in both arguments is represented only once in the
          result. To perform this operation, 'combine' makes sure that
          data in shared rows and columns is identical in the two
          data.frames. Data differences in shared rows and columns cause
          an error. 'combine' issues a warning when a column is a
          'factor' and the levels of the factor in the two
          'data.frame's are different; the returned value may be
          recoded.

and by the results of

> example(combine)

particularly the last lines, which are trying to illustrate your problem:

combin> # y is converted to 'factor' with different levels
combin> x <- data.frame(x=1:5,y=letters[1:5], row.names=letters[1:5])

combin> y <- data.frame(z=3:7,y=letters[3:7], row.names=letters[3:7])

combin> try(combine(x,y))
Error in combine(x, y) : data.frames contain conflicting data:
        non-conforming colname(s): y
In addition: Warning messages:
1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 5 string mismatches
2: In switch(class(x[[nm]])[[1]], factor = { :
  data frame column 'y' levels not all.equal

The data.frame column 'y' is a 'factor' (rather than a character vector) in both data frames, and combine doesn't know how to resolve a column that has 'c' encoded as level 3 of a factor with one that has 'c' encoded as level 1.

One solution is to ensure that columns that are really character vectors are stored as such

> x <- data.frame(x=1:5,y=I(letters[1:5]), row.names=letters[1:5])
> y <- data.frame(z=3:7,y=I(letters[3:7]), row.names=letters[3:7])
> combine(x,y)
   x y  z
a  1 a NA
b  2 b NA
c  3 c  3
d  4 d  4
e  5 e  5
f NA f  6
g NA g  7

or that factors have the same levels

> y1 <- factor(letters[1:5], levels=letters[1:7])
> y2 <- factor(letters[3:7], levels=letters[1:7])
> x <- data.frame(x=1:5, y=y1, row.names=letters[1:5])
> y <- data.frame(z=3:7, y=y2, row.names=letters[3:7])
> combine(x,y)
   x y  z
a  1 a NA
b  2 b NA
c  3 c  3
d  4 d  4
e  5 e  5
f NA f  6
g NA g  7

Martin
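The first suggestion (store columns that are really character vectors as such) can also be applied to a data frame that already exists, such as a pData table. A minimal base R sketch, assuming a hypothetical helper named factors_to_character that is not part of Biobase or GEOquery:

```r
# Hypothetical helper: convert every factor column of a data frame to a
# plain character vector, leaving other columns and row names untouched.
factors_to_character <- function(df) {
    for (nm in names(df)[vapply(df, is.factor, logical(1))]) {
        df[[nm]] <- as.character(df[[nm]])
    }
    df
}

x <- data.frame(x = 1:5, y = letters[1:5], row.names = letters[1:5],
                stringsAsFactors = TRUE)
x2 <- factors_to_character(x)
class(x$y)    # factor column before conversion
class(x2$y)   # character column after conversion

# Assumed usage on an ExpressionSet 'eset' (requires Biobase):
# pData(eset) <- factors_to_character(pData(eset))
```

Converting to character drops the level ordering, so this is a reasonable choice only for columns that are labels rather than true categorical variables.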
Hi Martin,

Thanks for the help. I managed to fix the issue by resetting all of the levels on both sides (having everything as characters should work too):

for (i in seq_along(pData(tmp[[1]]))) {
    if (!is.factor(pData(tmp[[1]])[[i]])) next
    a <- as.character(pData(tmp[[1]])[[i]])
    b <- as.character(pData(tmp[[2]])[[i]])
    lv <- unique(c(a, b))
    # rebuild with factor(): assigning via levels<- would relabel by position
    pData(tmp[[1]])[[i]] <- factor(a, levels = lv)
    pData(tmp[[2]])[[i]] <- factor(b, levels = lv)
}

The next question would be to see where it would best be taken care of. I really don't see why this should not be taken care of behind the scenes.

The two main options I see would be that getGEO() returns the phenoData columns as characters instead of factors, or that combine() knows how to deal with factors properly for ExpressionSet.

If the former is chosen, I think it would probably be worth adjusting the documentation about combine to mention this issue. As an unrelated note, the ExpressionSet documentation refers to eSets. Since eSet is going away at some point, that might be worth changing.

Francois

On Wed, 2008-01-30 at 10:54 -0800, Martin Morgan wrote:
> So part of the bug fix was an attempt to make the error message more
> informative, and it's not really clear that I've done that!
> [...]
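One thing to watch when resetting levels: in base R, assigning to levels(f) renames the existing levels by position, whereas factor(as.character(f), levels = ...) keeps each value and only changes the level set. A small sketch of the difference, using a made-up factor:

```r
f <- factor(c("b", "a", "c"))                  # levels are sorted: a, b, c
new_levels <- unique(c(as.character(f), "d"))  # b, a, c, d (order of appearance)

# Positional relabelling: old level 1 ("a") becomes "b", old level 2 ("b")
# becomes "a", so the stored values silently change.
relabelled <- f
levels(relabelled) <- new_levels

# Rebuilding from the character values keeps every value as it was.
rebuilt <- factor(as.character(f), levels = new_levels)

as.character(relabelled)  # "a" "b" "c"
as.character(rebuilt)     # "b" "a" "c"
```

Rebuilding is therefore the safer way to give two columns a common set of levels before combining them.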
@martin-morgan-1513
Last seen 5 months ago
United States
Francois Pepin <fpepin at="" cs.mcgill.ca=""> writes: > Hi Martin, > > Thanks for the help. I managed to fix the issue by resetting all of the > levels on both side (having everything as characters should work too): > > for (i in 1:length(pData(phenoData(tmp[[1]])))) > levels(pData(phenoData(tmp[[1]]))[,i])<-levels(pData(phenoData(tmp > [[2]]))[,i]) <- c(unique(as.character(pData(phenoData(tmp > [[1]]))[,i])),unique(as.character(pData(phenoData(tmp[[2]]))[,i]))) > > The next question would be to see where it would best be taken care of. > I really don't see why this should not be taken care of behind the > scene. > > The two main options I see would be that getGEO() returns characters of > phenoData instead of factors or having combine() know to deal with > factors properly for expressionSet. combine does know how to deal with factors properly -- the levels are different, so the columns (usually) can't be combined. But I appreciate the sentiment, and the issue has come up on the mailing list three times since 2.1, so is a common occurrence. I've tried some more at making the documentation better, and will work on a better set of warnings for the next release of Bioconductor. > If the former is chosen, I think it would probably be worth adjusting > the documentation about combine to mention this issue. As an unrelated > note, the ExpressionSet documentation refers to the eSet's. Since eSet > is going away at some point, that might be worth changing. Actually, 'eSet' is a class that 'ExpressionSet' extends; 'eSet' is not going to away, and many of the data slots and methods on ExpressionSet are inherited from eSet so it's appropriate to reference the eSet documentation for these. The 'exprSet' class is no longer supported. Thanks for your input, Martin > Francois > > On Wed, 2008-01-30 at 10:54 -0800, Martin Morgan wrote: >> So part of the bug fix was an attempt to make the error message more >> informative, and it's not really clear that I've done that! 
>> >> The traceback makes it's clear that the problem is with the pData (and >> not, for instance varMetadata or featureData) of the two arrays. >> >> Some hints are provided by the warnings, by the ?combine help page, >> >> 'combine(data.frame, data.frame)' Combines two 'data.frame' >> objects so that the resulting 'data.frame' contains all rows >> and columns of the original objects. Rows and columns in the >> returned value are unique, that is, a row or column >> represented in both arguments is represented only once in the >> result. To perform this operation, 'combine' makes sure that >> data in shared rows and columns is identical in the two >> data.frames. Data diffrences in shared rows and columns cause >> an error. 'combine' issues a warning when a column is a >> 'factor' and the levels of the factor in the two >> 'data.frame's are different; the returned value may be >> recoded. >> >> and by the results of >> >> > example(combine) >> >> particularly the last lines which are trying to illustrate your >> problem: >> >> combin> # y is converted to 'factor' with different levels >> combin> x <- data.frame(x=1:5,y=letters[1:5], row.names=letters[1:5]) >> >> combin> y <- data.frame(z=3:7,y=letters[3:7], row.names=letters[3:7]) >> >> combin> try(combine(x,y)) >> Error in combine(x, y) : data.frames contain conflicting data: >> non-conforming colname(s): y >> In addition: Warning messages: >> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 5 string mismatches >> 2: In switch(class(x[[nm]])[[1]], factor = { : >> data frame column 'y' levels not all.equal >> >> The data.frame column 'y' is a 'factor' (rather than character >> vectors) and combine doesn't know how to resolve a column that has 'c' >> encoded as level 3 of a factor with one that has 'c' encoded as level >> 1. 
>> >> One solution is to enusre that columns that are really character >> vectors are stored as such >> >> > x <- data.frame(x=1:5,y=I(letters[1:5]), row.names=letters[1:5]) >> > y <- data.frame(z=3:7,y=I(letters[3:7]), row.names=letters[3:7]) >> > combine(x,y) >> x y z >> a 1 a NA >> b 2 b NA >> c 3 c 3 >> d 4 d 4 >> e 5 e 5 >> f NA f 6 >> g NA g 7 >> >> or that factors have the same levels >> >> > y1 <- factor(letters[1:5], levels=letters[1:7]) >> > y2 <- factor(letters[3:7], levels=letters[1:7]) >> > x <- data.frame(x=1:5, y=y1, row.names=letters[1:5]) >> > y <- data.frame(z=3:7, y=y2, row.names=letters[3:7]) >> > combine(x,y) >> x y z >> a 1 a NA >> b 2 b NA >> c 3 c 3 >> d 4 d 4 >> e 5 e 5 >> f NA f 6 >> g NA g 7 >> >> Martin >> >> Francois Pepin <fpepin at="" cs.mcgill.ca=""> writes: >> >> > Hi Martin, >> > >> > I think it is related, as I now have a different error message along >> > with a series of warnings. 255 and 98 refer to the number of samples in >> > each ExpressionSet. 66 and 21 refer to the number of unique elements in >> > source_name_ch1 in the phenodata. >> > >> >> tmp2<-combine(tmp[[1]],tmp[[2]]) >> > Error in .local(x, y, ...) 
:
>> >   data.frames contain conflicting data:
>> >   non-conforming colname(s): title, geo_accession,
>> >   source_name_ch1, description, supplementary_file
>> > In addition: Warning messages:
>> > 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> >   Lengths (255, 98) differ (string compare on first 98)98 string mismatches
>> > 2: In switch(class(x[[nm]])[[1]], factor = { :
>> >   data frame column 'title' levels not all.equal
>> > 3: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> >   Lengths (255, 98) differ (string compare on first 98)98 string mismatches
>> > 4: In switch(class(x[[nm]])[[1]], factor = { :
>> >   data frame column 'geo_accession' levels not all.equal
>> > 5: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> >   Lengths (66, 21) differ (string compare on first 21)21 string mismatches
>> > 6: In switch(class(x[[nm]])[[1]], factor = { :
>> >   data frame column 'source_name_ch1' levels not all.equal
>> > 7: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> >   Lengths (255, 98) differ (string compare on first 98)98 string mismatches
>> > 8: In switch(class(x[[nm]])[[1]], factor = { :
>> >   data frame column 'description' levels not all.equal
>> > 9: In alleq(levels(x[[nm]]), levels(y[[nm]])) :
>> >   Lengths (255, 98) differ (string compare on first 98)98 string mismatches
>> > 10: In switch(class(x[[nm]])[[1]], factor = { :
>> >   data frame column 'supplementary_file' levels not all.equal
>> >
>> >> traceback()
>> > 9: stop("data.frames contain conflicting data:", "\n\tnon-conforming colname(s): ",
>> >        paste(sharedCols[!ok], collapse = ", "))
>> > 8: .local(x, y, ...)
>> > 7: combine(pDataX, pDataY)
>> > 6: combine(pDataX, pDataY)
>> > 5: .local(x, y, ...)
>> > 4: combine(phenoData(x), phenoData(y))
>> > 3: combine(phenoData(x), phenoData(y))
>> > 2: combine(tmp[[1]], tmp[[2]])
>> > 1: combine(tmp[[1]], tmp[[2]])
>> >
>> >> sessionInfo()
>> > R version 2.6.0 (2007-10-03)
>> > x86_64-unknown-linux-gnu
>> >
>> > locale:
>> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> >
>> > attached base packages:
>> > [1] tools stats graphics grDevices utils datasets methods
>> > [8] base
>> >
>> > other attached packages:
>> > [1] GEOquery_2.2.0 RCurl_0.8-1 Biobase_1.16.2
>> >
>> > loaded via a namespace (and not attached):
>> > [1] rcompgen_0.1-15
>> >
>> > Francois
>> >
>> > On Wed, 2008-01-30 at 10:03 -0800, Martin Morgan wrote:
>> >> Hi Francois -- this might be related to a bug in Biobase that has been
>> >> fixed. Can you try to update your Biobase, either biocLite('Biobase')
>> >> or following the directions at http://bioconductor.org/download ? If
>> >> not, can you provide the output of traceback() after the error occurs?
>> >>
>> >> Thanks,
>> >>
>> >> Martin
>> >>
>> >> Francois Pepin <fpepin at cs.mcgill.ca> writes:
>> >>
>> >> > Hi everyone,
>> >> >
>> >> > I'm getting an error message when trying to combine two parts of a GSE
>> >> > object:
>> >> >
>> >> >>tmp<-getGEO('GSE3526',GSEMatrix=T)
>> >> >> tmp2<-combine(tmp[[1]],tmp[[2]])
>> >> > Error in alleq(levels(x[[nm]]), levels(y[[nm]])) && alleq(x[sharedRows, :
>> >> >   invalid 'x' type in 'x && y'
>> >> >
>> >> > Checking to make sure that I should be able to combine them (from the
>> >> > eSet documentation):
>> >> >
>> >> > #eSets must have identical numbers of 'featureNames'
>> >> >> all(featureNames(tmp[[2]])==featureNames(tmp[[2]]))
>> >> > [1] TRUE
>> >> >
>> >> > #must have distinct 'sampleNames'
>> >> >> any(sampleNames(tmp[[1]])%in%sampleNames(tmp[[2]]))
>> >> > [1] FALSE
>> >> >
>> >> > #and must have identical 'annotation'.
>> >> >> annotation(tmp[[2]])==annotation(tmp[[2]])
>> >> > [1] TRUE
>> >> >
>> >> >> sessionInfo()
>> >> > R version 2.6.0 (2007-10-03)
>> >> > x86_64-unknown-linux-gnu
>> >> >
>> >> > locale:
>> >> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> >> >
>> >> > attached base packages:
>> >> > [1] tools stats graphics grDevices utils datasets methods
>> >> > [8] base
>> >> >
>> >> > other attached packages:
>> >> > [1] GEOquery_2.2.0 RCurl_0.8-1 Biobase_1.16.0
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> > [1] rcompgen_0.1-15
>> >> >
>> >> > Does anyone know why that is happening and if there would be any way
>> >> > around it?
>> >> >
>> >> > Francois
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at stat.math.ethz.ch
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>
>> >
>> >

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
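For anyone hitting the same warnings, a quick way to see which shared phenoData columns are affected is to compare factor levels column by column. This is only a sketch in base R: `p1` and `p2` are made-up stand-ins for `pData(tmp[[1]])` and `pData(tmp[[2]])`, and `mismatchedLevels()` is a hypothetical helper, not a Biobase function.

```r
# Toy stand-ins for pData(tmp[[1]]) and pData(tmp[[2]]); all values are invented.
p1 <- data.frame(title = factor(c("s1", "s2", "s3")),
                 organism = factor(rep("Homo sapiens", 3)),
                 row.names = c("GSM1", "GSM2", "GSM3"))
p2 <- data.frame(title = factor(c("s4", "s5")),
                 organism = factor(rep("Homo sapiens", 2)),
                 row.names = c("GSM4", "GSM5"))

# Hypothetical helper: names of shared factor columns whose levels differ.
mismatchedLevels <- function(x, y) {
    shared <- intersect(names(x), names(y))
    differs <- vapply(shared, function(nm) {
        is.factor(x[[nm]]) && is.factor(y[[nm]]) &&
            !identical(levels(x[[nm]]), levels(y[[nm]]))
    }, logical(1))
    shared[differs]
}

bad <- mismatchedLevels(p1, p2)
print(bad)
```

In this toy case only `title` is reported, because `organism` has the same single level in both data frames.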
On Jan 30, 2008 2:44 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> Francois Pepin <fpepin at cs.mcgill.ca> writes:
>
> > Hi Martin,
> >
> > Thanks for the help. I managed to fix the issue by resetting all of the
> > levels on both sides (having everything as characters should work too):
> >
> > for (i in 1:length(pData(phenoData(tmp[[1]]))))
> >   levels(pData(phenoData(tmp[[1]]))[,i]) <-
> >     levels(pData(phenoData(tmp[[2]]))[,i]) <-
> >       c(unique(as.character(pData(phenoData(tmp[[1]]))[,i])),
> >         unique(as.character(pData(phenoData(tmp[[2]]))[,i])))
> >
> > The next question would be to see where it would best be taken care of.
> > I really don't see why this should not be taken care of behind the
> > scenes.
> >
> > The two main options I see would be that getGEO() returns characters of
> > phenoData instead of factors or having combine() know to deal with
> > factors properly for expressionSet.

I have thought about doing just this. However, downstream analyses based on ExpressionSets will probably rely on having factors for grouping, so I haven't done so. This could also be done at the level of the ExpressionSet by enforcing that all factors are converted to character on creation of new ExpressionSets. Again, I don't think this is an optimal solution.

> combine does know how to deal with factors properly -- the levels are
> different, so the columns (usually) can't be combined. But I
> appreciate the sentiment, and the issue has come up on the mailing
> list three times since 2.1, so is a common occurrence. I've tried some
> more at making the documentation better, and will work on a better set
> of warnings for the next release of Bioconductor.

It seems like a compromise solution might be to generate warnings on differing factor levels, but to go ahead and rectify the differences within combine() by creating a new factor based on the combined character representations of the offending columns. Are there any intrinsic problems with doing that? This would maintain factor columns as factors, but allow combine to "do the right thing" with regard to those columns. And, in case "do the right thing" isn't the "right thing", warnings will be generated, alerting the user of the issue. The alternative is to ask the user to do all this manually.

> > If the former is chosen, I think it would probably be worth adjusting
> > the documentation about combine to mention this issue. As an unrelated
> > note, the ExpressionSet documentation refers to the eSet's. Since eSet
> > is going away at some point, that might be worth changing.
>
> Actually, 'eSet' is a class that 'ExpressionSet' extends; 'eSet' is
> not going away, and many of the data slots and methods on
> ExpressionSet are inherited from eSet so it's appropriate to
> reference the eSet documentation for these. The 'exprSet' class is no
> longer supported.
>
> Thanks for your input,
>
> Martin
>
> > Francois
> >
> > On Wed, 2008-01-30 at 10:54 -0800, Martin Morgan wrote:
> >> So part of the bug fix was an attempt to make the error message more
> >> informative, and it's not really clear that I've done that!
> >>
> >> The traceback makes it clear that the problem is with the pData (and
> >> not, for instance, varMetadata or featureData) of the two arrays.
> >>
> >> Some hints are provided by the warnings, by the ?combine help page,
> >>
> >>   'combine(data.frame, data.frame)' Combines two 'data.frame'
> >>   objects so that the resulting 'data.frame' contains all rows
> >>   and columns of the original objects. Rows and columns in the
> >>   returned value are unique, that is, a row or column
> >>   represented in both arguments is represented only once in the
> >>   result. To perform this operation, 'combine' makes sure that
> >>   data in shared rows and columns is identical in the two
> >>   data.frames. Data differences in shared rows and columns cause
> >>   an error. 'combine' issues a warning when a column is a
> >>   'factor' and the levels of the factor in the two
> >>   'data.frame's are different; the returned value may be
> >>   recoded.
> >>
> >> and by the results of
> >>
> >> > example(combine)
> >>
> >> particularly the last lines which are trying to illustrate your
> >> problem:
> >>
> >> combin> # y is converted to 'factor' with different levels
> >> combin> x <- data.frame(x=1:5,y=letters[1:5], row.names=letters[1:5])
> >>
> >> combin> y <- data.frame(z=3:7,y=letters[3:7], row.names=letters[3:7])
> >>
> >> combin> try(combine(x,y))
> >> Error in combine(x, y) : data.frames contain conflicting data:
> >>         non-conforming colname(s): y
> >> In addition: Warning messages:
> >> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 5 string mismatches
> >> 2: In switch(class(x[[nm]])[[1]], factor = { :
> >>   data frame column 'y' levels not all.equal
> >>
> >> The data.frame column 'y' is a 'factor' (rather than character
> >> vectors) and combine doesn't know how to resolve a column that has 'c'
> >> encoded as level 3 of a factor with one that has 'c' encoded as level
> >> 1.
> >>
> >> One solution is to ensure that columns that are really character
> >> vectors are stored as such
> >>
> >> > x <- data.frame(x=1:5,y=I(letters[1:5]), row.names=letters[1:5])
> >> > y <- data.frame(z=3:7,y=I(letters[3:7]), row.names=letters[3:7])
> >> > combine(x,y)
> >>    x y  z
> >> a  1 a NA
> >> b  2 b NA
> >> c  3 c  3
> >> d  4 d  4
> >> e  5 e  5
> >> f NA f  6
> >> g NA g  7
> >>
> >> or that factors have the same levels
> >>
> >> > y1 <- factor(letters[1:5], levels=letters[1:7])
> >> > y2 <- factor(letters[3:7], levels=letters[1:7])
> >> > x <- data.frame(x=1:5, y=y1, row.names=letters[1:5])
> >> > y <- data.frame(z=3:7, y=y2, row.names=letters[3:7])
> >> > combine(x,y)
> >>    x y  z
> >> a  1 a NA
> >> b  2 b NA
> >> c  3 c  3
> >> d  4 d  4
> >> e  5 e  5
> >> f NA f  6
> >> g NA g  7
> >>
> >> Martin
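To make the compromise proposed above concrete, here is a rough base R sketch of what "create a new factor from the combined character representations, and warn" could look like for a single column. `combineFactors()` is a hypothetical helper written for illustration; it is not how Biobase's combine() is implemented. It simply concatenates the two factors, which matches the ExpressionSet case where the samples are distinct.

```r
# Hypothetical helper, not part of Biobase: merge two factors that may have
# different levels by going through their character representations.
combineFactors <- function(x, y) {
    stopifnot(is.factor(x), is.factor(y))
    if (!identical(levels(x), levels(y)))
        warning("factor levels differ; recoding to the union of levels")
    # union() keeps the levels of x first, followed by levels only found in y
    lv <- union(levels(x), levels(y))
    factor(c(as.character(x), as.character(y)), levels = lv)
}

x <- factor(letters[1:5])
y <- factor(letters[3:7])
z <- suppressWarnings(combineFactors(x, y))
print(levels(z))
print(as.character(z))
```

The result is still a factor, so downstream grouping keeps working, and the warning tells the user that the level ordering was chosen for them.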
> combine does know how to deal with factors properly -- the levels are
> different, so the columns (usually) can't be combined. But I
> appreciate the sentiment, and the issue has come up on the mailing
> list three times since 2.1, so is a common occurrence. I've tried some
> more at making the documentation better, and will work on a better set
> of warnings for the next release of Bioconductor.

I understand why it wouldn't really work in the general case of combining data.frames. But I do think that columns can be combined in this specific case, especially considering the requirement that the sample names be different. The ordering of the levels would have to be somewhat arbitrary though, so I do not know if a warning would be warranted in this case. It just seems strange that a perfectly legal eSet cannot use one of the basic functions described for that class.

> Actually, 'eSet' is a class that 'ExpressionSet' extends; 'eSet' is
> not going away, and many of the data slots and methods on
> ExpressionSet are inherited from eSet so it's appropriate to
> reference the eSet documentation for these. The 'exprSet' class is no
> longer supported.

Yes, of course. I keep on mixing them up.

Francois
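A note for readers trying the manual workaround: assigning to `levels()` relabels the existing levels by position, so it is worth checking that a loop like the one quoted earlier in the thread leaves the values themselves unchanged. Converting factor columns to character before combining avoids the question. The sketch below uses base R only; `factorsToCharacter()` is a hypothetical helper, and the Biobase calls in the trailing comment are the intended usage, not something that was run here.

```r
# 'levels<-' relabels by position: the values change if the order differs.
f <- factor(c("b", "a"))      # levels are c("a", "b")
levels(f) <- c("b", "a")      # level 1 is now called "b", level 2 "a"
print(as.character(f))        # the two values have swapped

# Hypothetical helper: turn every factor column of a data.frame into character.
factorsToCharacter <- function(df) {
    isFac <- vapply(df, is.factor, logical(1))
    if (any(isFac))
        df[isFac] <- lapply(df[isFac], as.character)
    df
}

df <- data.frame(a = factor(c("x", "y")), b = 1:2)
out <- factorsToCharacter(df)
print(vapply(out, class, character(1)))

# Intended use with Biobase loaded and 'tmp' as in the original post (untested):
#   pData(tmp[[1]]) <- factorsToCharacter(pData(tmp[[1]]))
#   pData(tmp[[2]]) <- factorsToCharacter(pData(tmp[[2]]))
#   tmp2 <- combine(tmp[[1]], tmp[[2]])
```

Columns needed for grouping can be turned back into factors after combining, with whatever level order the analysis calls for.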
@martin-morgan-1513
Last seen 5 months ago
United States
Thanks to both for your feedback. I'm not sure whether the current state represents an over-reaction to an earlier problem, or a more reasoned position. I'll look through my notes and, if appropriate, make the change to the way combine works. This will be in the development version of Biobase, and will not be implemented for a couple of weeks.

In some ways the current behavior of data.frame (automatically making factors out of character strings) is the root of the problem, and to me argues against being too clever in trying to guess the user's intentions.

Martin

Francois Pepin <fpepin at cs.mcgill.ca> writes:

>> combine does know how to deal with factors properly -- the levels are
>> different, so the columns (usually) can't be combined. But I
>> appreciate the sentiment, and the issue has come up on the mailing
>> list three times since 2.1, so is a common occurrence. I've tried some
>> more at making the documentation better, and will work on a better set
>> of warnings for the next release of Bioconductor.
>
> I understand why it wouldn't really work in the general case of
> combining data.frames. But I do think that columns can be combined in
> this specific case, especially considering the requirement that the
> sample names be different. The ordering of the levels would have to be
> somewhat arbitrary though, so I do not know if a warning would be
> warranted in this case. It just seems strange that a perfectly legal
> eSet cannot use one of the basic functionality described in that class.
>
>> Actually, 'eSet' is a class that 'ExpressionSet' extends; 'eSet' is
>> not going to away, and many of the data slots and methods on
>> ExpressionSet are inherited from eSet so it's appropriate to
>> reference the eSet documentation for these. The 'exprSet' class is no
>> longer supported.
>
> Yes, of course. I keep on mixing them.
>
> Francois
>

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
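The data.frame behavior mentioned above is easy to see in isolation. In the R versions of this thread's era, data.frame() converted character vectors to factors by default; since R 4.0.0 the default is `stringsAsFactors = FALSE`. The sketch below sets the argument explicitly so that it behaves the same on either side of that change; the object names are just for illustration.

```r
# Same input, three ways of building the column.
asFactor <- data.frame(y = letters[1:3], stringsAsFactors = TRUE)   # old default
asChar   <- data.frame(y = letters[1:3], stringsAsFactors = FALSE)  # stays character
asIs     <- data.frame(y = I(letters[1:3]))                         # protected by I()

print(class(asFactor$y))
print(class(asChar$y))
print(class(asIs$y))
```

Only the first form produces a factor, which is the case where differing levels between two data frames become an issue for combine().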