Problems selecting rows from dataframe (exprs) of GNF Atlas data....
Bas Jansen ▴ 150
@bas-jansen-2966
Last seen 9.6 years ago
Hi Axel, hi Sebastian:

Thanks for the cookie, Axel. Anyway, I have done the following:

> exprs <- as.data.frame(exprs(eset))
> rownames(exprs)
 [1] "200000_s_at" "200001_at"
 [3] "200002_at"   "200003_s_at"
 [5] "200004_at"   "200005_at"
 [7] "200006_at"   "200007_at"
 [9] "200008_s_at" "200009_at"
[11] "200010_at"   "200011_s_at"
[13] "200012_x_at" "200013_at"
[15] "200014_s_at" "200015_s_at"
[17] "200016_x_at" "200017_at"
etc.

So I would argue that the 'numbers' are recognized as rownames here, but I cannot select them as indicated in a previous email. Strange, isn't it?
I still need to try Sebastian's suggestions though, so let's not run off the cliff just yet. Below the sessionInfo.

Kind regards,
Bas

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] fortunes_1.4-2 Biobase_2.14.0

loaded via a namespace (and not attached):
[1] tcltk_2.14.0 tools_2.14.0

On Tue, Jan 3, 2012 at 2:33 PM, <axel.klenk at actelion.com> wrote:
> Dear Bas,
>
> I think you'll need to show us your original code, in particular what your
> 'exprs' is and how you have obtained it. If you have "extracted the
> expression values" from an ExpressionSet ES like
>
> x <- exprs(ES)
>
> then x is a matrix and not a data.frame -- but then your output would look
> slightly different. If you have done something like
>
> x <- data.frame(exprs(ES))
>
> I can reproduce your output, including rows that are all NA -- for
> rownames that do not exist.
>
> So: how did you create 'exprs' and are you sure your rownames are ok?
>
> Cheers,
>
> - axel
>
> BTW: try
>
> install.packages("fortunes")
> library("fortunes")
> fortune("dog")
>
> to see why 'exprs' may not be a good name for your object... :-)
>
> Axel Klenk
> Research Informatician
> Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
> Switzerland
>
> From: Bas Jansen <bjhjansen at gmail.com>
> To: Sebastian Thieme <thieme at mi.fu-berlin.de>
> Cc: bioconductor at r-project.org
> Date: 03.01.2012 13:48
> Subject: Re: [BioC] Problems selecting rows from dataframe (exprs) of GNF Atlas data....
> Sent by: bioconductor-bounces at r-project.org
>
> Dear Sebastian:
>
> Thanks for your swift reply. It works, but only for the probe IDs that
> start with a character (only ~15 out of the > 100 probe IDs I want to
> investigate). Those that start with a number report back with "<0
> rows> (or 0-length row.names)". The motto for the New Year seems to be
> 'Solve a problem, only to find new ones'. Phew.
>
> Kind regards,
> Bas
>
> On Tue, Jan 3, 2012 at 11:19 AM, Sebastian Thieme
> <thieme at mi.fu-berlin.de> wrote:
>> Hello,
>>
>> happy new year too =)
>>
>> you can use exprs[ rownames(exprs) %in% "gnf1h00499_at",] or exprs[
>> rownames(exprs) %in% vectorOfNames,], where vectorOfNames is a list or
>> a vector of the names you are looking for. It is important that the
>> object you are searching in is the first argument. If you want to
>> request a large number of names, use lists instead of data frames.
>>
>> best
>>
>> Basti
>>
>> 2012/1/3 Bas Jansen <bjhjansen at gmail.com>:
>>> Dear fellow Bioconductor users:
>>>
>>> Happy New Year!
>>> At the moment I am analyzing the GNF Atlas data. I retrieved the data
>>> from the Gene Expression Omnibus using the package GEOquery, converted
>>> it to an expressionSet and extracted the expression values. So now I
>>> have a data frame from which I would like to extract the expression
>>> values of > 100 probe IDs for 79 tissues. Thing is, if I use a single
>>> probe ID, things go fine. However, whenever I use a string of probe
>>> IDs, things go awry.
>>>
>>> See below:
>>>
>>> ***
>>>> exprs[c("gnf1h00499_at"),]
>>>               GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>>> gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
>>> (abbreviated for reasons of clarity)
>>> ***
>>>
>>> As stated above: whenever I use a string of probe IDs (say, like 2
>>> probe IDs), things go awry:
>>>
>>> ***
>>>> exprs[c("gnf1h00499_at","gnf1h500_at"),]
>>>               GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>>> gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
>>> NA                  NA       NA       NA       NA       NA       NA       NA
>>> etc.
>>> ***
>>>
>>> The gnf1h00500 probe is reported as NA, and I'm pretty sure it has
>>> real expression values associated with it.
>>> The following just works fine:
>>>
>>> ***
>>>> exprs[c(1:20,30:70),]
>>>             GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>>> 200000_s_at        0        0        0        0        0        0        0
>>> 200001_at          0        0        0        0        0        0        0
>>> 200002_at          0        0        0        0        0        0        0
>>> 200003_s_at        0        0        0        0        0        0        0
>>> etc.
>>> ***
>>>
>>> So, how do I select rows on the basis of probe IDs? Or better yet:
>>> what am I overlooking????
>>>
>>> Thanks & kind regards,
>>> Bas
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> The information of this email and in any file transmitted with it is strictly confidential and may be legally privileged.
> It is intended solely for the addressee. If you are not the intended recipient, any copying, distribution or any other use of this email is prohibited and may be unlawful. In such case, you should please notify the sender immediately and destroy this email.
> The content of this email is not legally binding unless confirmed by letter.
> Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorised to state them to be the views of the sender's company. For further information about Actelion please see our website at http://www.actelion.com
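Axel's distinction between a matrix and a data.frame can be seen in a small sketch. The object names and values below are made up for illustration and are not the real GNF Atlas data; the only assumption is base R's standard behaviour for character row indexing.

```r
# Toy stand-in for the expression values; IDs and numbers are invented.
m <- matrix(c(5.77, 6.10, 7.70, 6.90), nrow = 2,
            dimnames = list(c("gnf1h00499_at", "gnf1h00500_at"),
                            c("GSM18768", "GSM18769")))
df <- as.data.frame(m)

# The second ID is deliberately one that does not exist in the row names.
ids <- c("gnf1h00499_at", "gnf1h500_at")

# data.frame: an unknown row name does not raise an error; it yields
# a row of NA values instead.
res_df <- df[ids, ]
print(res_df)

# matrix: the same request stops with an error, which is captured here
# as a message string so the script keeps running.
res_m <- tryCatch(m[ids, ], error = function(e) conditionMessage(e))
print(res_m)
```

Working on the matrix returned by exprs() directly therefore makes a wrong ID fail loudly, whereas the data.frame version quietly returns NA rows.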
GO probe GEOquery • 1.3k views
@sean-davis-490
Last seen 3 months ago
United States
On Tue, Jan 3, 2012 at 9:37 AM, Bas Jansen <bjhjansen@gmail.com> wrote:
> Hi Axel, hi Sebastian:
>
> Thanks for the cookie, Axel. Anyway, I have done the following:
>
> > exprs <- as.data.frame(exprs(eset))
> > rownames(exprs)
>  [1] "200000_s_at" "200001_at"
>  [3] "200002_at"   "200003_s_at"
>  [5] "200004_at"   "200005_at"
>  [7] "200006_at"   "200007_at"
>  [9] "200008_s_at" "200009_at"
> [11] "200010_at"   "200011_s_at"
> [13] "200012_x_at" "200013_at"
> [15] "200014_s_at" "200015_s_at"
> [17] "200016_x_at" "200017_at"
> etc.

Hi, Bas.

These are recognized as rownames, yes. However, if you look at the original data from GEO, you will see that these all have "null" for the value; these "null" values become NAs in R. So, if you are concerned about rows of NAs when selecting these rownames, you should not be, as this is the correct result.

See my note below about your original question, also.

> >>>> exprs[c("gnf1h00499_at","gnf1h500_at"),]
> >>>               GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
> >>> gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
> >>> NA                  NA       NA       NA       NA       NA       NA       NA
> >>> etc.
> >>>
> >>> The gnf1h00500 probe is reported as NA, and I'm pretty sure it has
> >>> real expression values associated with it.

Yes, the gnf1h00500_at probeset and rowname will work fine. However, your code used "gnf1h500_at" and NOT "gnf1h00500_at". The latter works fine for me.

Sean
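The two situations described in this reply, an ID that is not a row name at all and a row that exists but holds only NA, can be told apart before subsetting. The following is a minimal sketch on made-up values; `missing_ids` is a hypothetical helper written for this example, not a function from Biobase or GEOquery.

```r
# Hypothetical helper: which requested IDs are absent from the row names?
missing_ids <- function(x, ids) setdiff(ids, rownames(x))

# Toy matrix: one row with values, one row that is NA throughout
# (standing in for a probe recorded as "null" in the GEO record).
x <- matrix(c(5.77, NA, 7.70, NA), nrow = 2,
            dimnames = list(c("gnf1h00499_at", "200000_s_at"),
                            c("GSM18768", "GSM18769")))

# Case 1: the mistyped ID is reported as missing.
not_found <- missing_ids(x, c("gnf1h00499_at", "gnf1h500_at"))
print(not_found)

# Case 2: row names that exist but carry no measured value in any sample.
all_na <- rownames(x)[rowSums(!is.na(x)) == 0]
print(all_na)
```

Running the first check on the full list of > 100 probe IDs would flag typos such as "gnf1h500_at" in one step.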
Hi Sean (and Axel and Sebastian!):

My apologies for the typo, you were right. That said, I'm now pretty confident I did the right thing all along, but I have been fooled by the article, and a bit confused by the GEO entries associated with it (GSE vs GDS etc).

From the PNAS study (see: A gene atlas of the mouse and human protein-encoding transcriptomes. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7.) it is clear that both Affy HG-U133A and the custom array GNF1H have been used. The probesets that have been mapped in the UCSC Genome Browser include *all* data, of both platforms. Of course, the list of probes I have has been derived from that source.

Anyway, at first I assumed that they used a 'hybrid' array, consisting of both the HG-U133A and their own probe sets, and called them collectively GNF1H. I have now figured out that this is not the case, and that I have been looking for HG-U133A probesets in their custom arrays. So far I have analyzed only half of the data. My bad, and I apologize for wasting your time.

Kind regards,
Bas
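Since the probe list mixes two platforms, one way to avoid looking up IDs on the wrong array is to split the list first. This is only a sketch: it assumes that the custom GNF1H probe set IDs start with "gnf1h", as the ones quoted in this thread do, and it simply labels everything else "other" rather than claiming it belongs to HG-U133A.

```r
# A few IDs taken from this thread, standing in for the full list.
ids <- c("gnf1h00499_at", "200000_s_at", "gnf1h00500_at", "200001_at")

# Assumption: custom-array IDs share the "gnf1h" prefix.
is_gnf <- grepl("^gnf1h", ids)

# One character vector of IDs per group.
by_platform <- split(ids, ifelse(is_gnf, "GNF1H", "other"))
print(by_platform)
```

Each group can then be looked up in the expression data retrieved for the matching platform.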
@sebastian-thieme-5020
Last seen 9.6 years ago
Hi Bas,

the numbers are not the rownames. If you use rownames(exprs) you get a vector of the names your rows have, so the numbers are only the positions within that vector. If you do something like

  exprs <- as.data.frame(exprs(eset))
  x <- rownames(exprs)
  exprs[x %in% x,]

you should get the entire data frame. If you try

  namesYouWant <- c("200000_s_at","200001_at","200002_at","200003_s_at")
  exprs[x %in% namesYouWant ,]

you will get only the rows for the selected names. x %in% namesYouWant gives you a logical vector with one element per row of the data frame: TRUE where that row's name is among namesYouWant, and FALSE otherwise, in which case the row is left out of the result. E.g.

  > a <- c("a","b","c")
  > a %in% "a"
  [1] TRUE FALSE FALSE
  > a %in% c("a","c")
  [1] TRUE FALSE TRUE

I hope this helps

Best
Basti
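One consequence of the %in% approach is worth spelling out: rows come back in the order of the data frame, not in the order of the requested names, and names that do not exist are dropped without warning. If the requested order matters, match() is an alternative. A minimal sketch with made-up values:

```r
# Toy data frame; the values are invented.
df <- data.frame(GSM18768 = c(1, 2, 3),
                 row.names = c("200000_s_at", "200001_at", "200002_at"))

# Requested in a different order, plus one name that does not exist.
wanted <- c("200002_at", "200000_s_at", "not_a_probe")

# %in%: keeps the data frame's own row order.
by_in <- df[rownames(df) %in% wanted, , drop = FALSE]

# match(): positions in the order requested; NA marks names not found,
# which are filtered out here before indexing.
idx <- match(wanted, rownames(df))
by_match <- df[idx[!is.na(idx)], , drop = FALSE]

print(rownames(by_in))
print(rownames(by_match))
```

The drop = FALSE argument keeps the result a data frame even when only one column or row remains.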