mget() error with NA values
@christian-kohler-2698
DeaR bioconductors,

we run an internal microarray analysis pipeline and switched today from R/BioC (2.8.1/2.3) to 2.9/2.4. After running some test code, I came across the following error:

testCode:
> x<-rep(NA,10)
> unique(unlist(mget(x, env=hgu133plus2ENTREZID,ifnotfound=NA)))

When I run this code snippet with 2.8.1/2.3, the corresponding return value is
> [1] NA

but with 2.9/2.4 I got the following error:
> Error during wrapup: keys must be supplied in a character vector with no NAs

This causes our pipeline to break there and stop the analysis, while in the previous case the analysis still continued with NA values.

Please do not think that I am a picky person, but was there any urgent need to change the behaviour of mget()? Is it possible to somehow bypass this?

Thanks a lot for any help.

Christian

--
Christian Kohler
Institute of Functional Genomics
Computational Diagnostics
University of Regensburg (BioPark I)
D-93147 Regensburg (Germany)
Tel. +49 941 943 5055
Fax +49 941 943 5020
christian.kohler at klinik.uni-regensburg.de
Microarray Microarray • 1.9k views
@james-w-macdonald-5106
Hi Christian,

Christian Kohler wrote:
> when I run this code snippet with 2.8.1/2.3 the corresponding return
> value is
>> [1] NA

Really?

> x <- rep(NA, 10)
> mget(x, hgu95av2ENTREZID)
Error in .checkKeysAreWellFormed(keys) :
  keys must be supplied in a character vector with no NAs
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools stats graphics grDevices datasets utils methods
[8] base

other attached packages:
[1] hgu95av2.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4
[4] AnnotationDbi_1.4.3 Biobase_2.2.2

> Please do not think that I am a picky person, but was there any urgent
> need to change the behaviour of mget()?
> Is it possible to somehow bypass this?

The easiest way is to strip the NA values, using the canonical

x <- x[!is.na(x)]

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
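For a pipeline that should keep going when keys are missing, Jim's filter can be wrapped in a small helper. What follows is only a sketch under stated assumptions: `safe_lookup` is a made-up name, and a plain R environment (`toy_map`, filled with invented keys and values) stands in for an annotation map such as hgu133plus2ENTREZID so that the example runs without any Bioconductor packages. Whether a real map can be passed in unchanged should be checked against the installed AnnotationDbi version.

```r
# Toy stand-in for an annotation map; keys and values are invented.
toy_map <- new.env()
assign("probe_1", "1001", envir = toy_map)
assign("probe_2", "1002", envir = toy_map)

# Hypothetical helper: look up keys and return NA for NA or unknown keys
# instead of stopping with an error.
safe_lookup <- function(keys, map) {
  keys <- as.character(keys)              # rep(NA, 10) is logical, so coerce first
  out <- rep(NA_character_, length(keys))
  ok <- !is.na(keys)                      # Jim's filter: drop NA keys before mget()
  if (any(ok)) {
    hits <- mget(keys[ok], envir = map, ifnotfound = NA)
    # Keep the first value of each hit as a character string.
    out[ok] <- vapply(hits, function(v) as.character(v)[1], character(1))
  }
  out
}

all_na <- safe_lookup(rep(NA, 10), toy_map)
mixed  <- safe_lookup(c("probe_1", NA, "no_such_probe", "probe_2"), toy_map)
print(unique(all_na))
print(mixed)
```

The result has the same length and order as the input, so downstream code that expects one value per probe does not need to change.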
I merged probe ids from affy hgu133a and b chips, then looked them up using

mget(probelist, hgu133aSYMBOL)

Then I tried the same lookup with hgu133bSYMBOL.

I expected a difference, since the chips contain fairly unique symbols.

Are symbols unique to A or B known to both?

Thanks.

Tom
Hi Thomas,

Gene symbols cannot be relied upon to be unique in any case. They are frequently "assigned" to multiple different genes. I might be better able to help you if you were a little bit more specific about what you are seeing. But what you should see is that each of these two platforms has mappings only for the subset of genes that it represents.

So for example hgu133b has a mapping for probeset 229819_at to symbol A1BG. But the hgu133a chip does not have a probe that maps to this gene symbol. So that would be one example (at least) of a difference, and there are many more. There may be some overlap in symbols, caused in part by the fact that some probeset IDs will measure the same gene and also because gene symbols are horrible as identifiers, but for the most part you should see different symbols on these platforms.

Marc
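One way to see the per-platform difference is to look up the merged probe list in each map with ifnotfound=NA and compare the symbols that come back. The sketch below uses two plain environments as stand-ins for hgu133aSYMBOL and hgu133bSYMBOL; only the 229819_at / A1BG pair is taken from the reply above, and `probe_a1` / `SYMBOL_X` are invented placeholders.

```r
# Toy stand-ins for hgu133aSYMBOL and hgu133bSYMBOL.
map_a <- new.env()
map_b <- new.env()
assign("probe_a1", "SYMBOL_X", envir = map_a)   # invented probe and symbol
assign("229819_at", "A1BG", envir = map_b)      # the example from Marc's reply

probelist <- c("probe_a1", "229819_at")         # merged probe IDs from both chips

# Probes that are not on a chip come back as NA for that chip.
sym_a <- unlist(mget(probelist, envir = map_a, ifnotfound = NA))
sym_b <- unlist(mget(probelist, envir = map_b, ifnotfound = NA))

# Drop the NAs and the probe names before comparing the symbol sets.
found_a <- unname(sym_a[!is.na(sym_a)])
found_b <- unname(sym_b[!is.na(sym_b)])

only_a <- setdiff(found_a, found_b)     # symbols seen only via chip A
only_b <- setdiff(found_b, found_a)     # symbols seen only via chip B
shared <- intersect(found_a, found_b)   # symbols seen on both
```

With real annotation maps the shared set would not necessarily be empty, for the reasons Marc gives.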
Marc,

Thanks for your reply.

For a unique gene identifier, do you recommend ENTREZID over SYMBOL?

I am comparing three experiments on three platforms:

hgu95av2.db
hgu133a
hgu133a + b

So what I am after is a nice common identifier for these chips.

Thanks

Tom
Hi Thomas,

Entrez Gene IDs are a great alternative to symbols. They are not recycled. So if you meet the same Entrez ID in another setting you can be assured that it refers to the same thing that it did before. In contrast, with gene symbols you have cases like the "VH" gene, which is presently assigned to 36 different genes in humans! So if someone tells you that they work on the VH gene you have to ask them which one??? That sort of nonsense is just not helpful when doing informatics work. So yes, you should probably use Entrez Gene IDs.

Marc
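Once each platform's probes have been mapped to Entrez Gene IDs, the identifiers common to all three experiments can be found with a set intersection. This is a minimal sketch with invented ID values; in practice each vector would come from a lookup against the matching annotation package, which is not shown here.

```r
# Invented Entrez-style IDs per platform (placeholders, not real annotations).
ids_hgu95av2 <- c("1001", "1002", "1003", NA)
ids_hgu133a  <- c("1002", "1003", "1004")
ids_hgu133ab <- c("1002", "1003", "1004", "1005")

# Intersect across all platforms, then drop any NA left over from failed lookups.
common <- Reduce(intersect, list(ids_hgu95av2, ids_hgu133a, ids_hgu133ab))
common <- common[!is.na(common)]
print(common)
```

Keeping the IDs as character strings avoids accidental numeric comparisons when the vectors are later used as lookup keys.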
James W. MacDonald wrote:
> The easiest way is to strip the NA values, using the canonical
>
> x <- x[!is.na(x)]

Hi Jim,

thanks so much for your quick reply, but to be honest I still do not understand why my function call ( unique(unlist(mget(x, env=hgu133plus2ENTREZID,ifnotfound=NA))) ) produces 'NA' instead of the error message above.

The interesting thing is that if I analyse exactly the same data with 2.3 as well as with 2.4, the analysis does not break with 2.3 but does with 2.4 !?

Well, I guess the solution is somewhat simple :-)

All the best,
Christian

--
Christian Kohler
Institute of Functional Genomics
Computational Diagnostics
University of Regensburg (BioPark I)
D-93147 Regensburg (Germany)
Tel. +49 941 943 5055
Fax +49 941 943 5020
christian.kohler at klinik.uni-regensburg.de
Hi Christian,

Christian Kohler wrote:
[...]
> thanks so much for your quick reply, but to be honest I still do not
> understand, why my function-call ( unique(unlist(mget(x,
> env=hgu133plus2ENTREZID,ifnotfound=NA))) ) produces 'NA' instead of the
> error-message above.

Without having your sessionInfo(), we won't be able to tell either...

> The interesting thing is, that if I analyse exactly the same data with
> 2.3 as well as with 2.4, the analysis does not break with 2.3 but with
> 2.4 !?.
>
> Well, I guess the solution is somewhat simple :-)

If that means you are going to stick with 2.3 then yes, it's a simple solution, but please note that 2.3 is not supported anymore and that the annotations in 2.4 are much more recent and supposedly more accurate. The small code modification suggested by Jim is really straightforward and a better way to go IMO. And as an extra benefit, other people will be able to run your pipeline and reproduce your results (the current code is expected to break for anybody with a standard installation).

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319