Problems selecting rows from dataframe (exprs) of GNF Atlas data....
Bas Jansen ▴ 150
@bas-jansen-2966
Last seen 8.3 years ago
Dear fellow Bioconductor users:

Happy New Year!
At the moment I am analyzing the GNF Atlas data. I retrieved the data from the Gene Expression Omnibus using the package GEOquery, converted it to an expressionSet and extracted the expression values. So now I have a data frame from which I would like to extract the expression values of > 100 probe IDs for 79 tissues. Thing is, if I use a single probe ID, things go fine. However, whenever I use a string of probe IDs, things go awry.

See below:

***
> exprs[c("gnf1h00499_at"),]
              GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
(abbreviated for reasons of clarity)
***

As stated above: whenever I use a string of probe IDs (say, like 2 probe IDs), things go awry:

***
> exprs[c("gnf1h00499_at","gnf1h500_at"),]
              GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
NA                  NA       NA       NA       NA       NA       NA       NA
etc.
***

The gnf1h00500 probe is reported as NA, and I'm pretty sure it has real expression values associated with it.
The following just works fine:

***
> exprs[c(1:20,30:70),]
            GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
200000_s_at        0        0        0        0        0        0        0
200001_at          0        0        0        0        0        0        0
200002_at          0        0        0        0        0        0        0
200003_s_at        0        0        0        0        0        0        0
etc.
***

So, how do I select rows on the basis of probe IDs? Or better yet: what am I overlooking????

Thanks & kind regards,
Bas
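A row of NAs is what a data frame returns when a requested row name has no match, so one thing worth checking is whether every requested ID really occurs in rownames(exprs). Note that the call above uses "gnf1h500_at" while the text refers to gnf1h00500. The following is only a minimal sketch on a toy data frame: the object `toy`, its column names and its values are invented for illustration, and the assumption that "gnf1h00500_at" is the intended row name is not confirmed by the thread.

```r
# Toy stand-in for the real expression data; all values are invented.
toy <- data.frame(
  GSM_A = c(5.8, 6.1, 0),
  GSM_B = c(7.7, 6.9, 0),
  row.names = c("gnf1h00499_at", "gnf1h00500_at", "200000_s_at")
)

# The second ID has no match among the row names (assumed typo).
wanted <- c("gnf1h00499_at", "gnf1h500_at")

# Indexing a data frame by a row name without a match yields a row of NAs.
res <- toy[wanted, ]

# List the requested IDs that are absent from the row names.
missing_ids <- setdiff(wanted, rownames(toy))
```

Inspecting `missing_ids` before subsetting shows which IDs would come back as NA rows.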
GO probe GEOquery • 867 views
@sebastian-thieme-5020
Last seen 8.3 years ago
Hello,

Happy New Year to you too =)

You can use exprs[rownames(exprs) %in% "gnf1h00499_at", ] or exprs[rownames(exprs) %in% vectorOfNames, ], where vectorOfNames is a list or a vector of the names you are looking for. It is important that rownames(exprs), the names of the object you are searching in, is the first argument, so that the result has one TRUE/FALSE value per row. If you want to look up a large number of names, use lists instead of data frames.

Best,
Basti

2012/1/3 Bas Jansen:
> So, how do I select rows on the basis of probe IDs? Or better yet:
> what am I overlooking????
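As a small illustration of the %in% approach, here is a sketch on a toy data frame (the object `toy` and its values are made up; `vectorOfNames` follows the naming in the reply). It also shows `match()` as an alternative when the rows should come back in the order of the requested IDs, which is a swapped-in variation and not something the reply itself proposes.

```r
# Toy stand-in for the real expression data; all values are invented.
toy <- data.frame(
  GSM_A = c(5.8, 6.1, 0),
  GSM_B = c(7.7, 6.9, 0),
  row.names = c("gnf1h00499_at", "gnf1h00500_at", "200000_s_at")
)

vectorOfNames <- c("200000_s_at", "gnf1h00499_at")

# %in% gives one TRUE/FALSE per row, so rows keep the order of the data.
by_in <- toy[rownames(toy) %in% vectorOfNames, ]

# match() gives row positions in the order of vectorOfNames;
# IDs without a match become NA and are dropped here.
idx <- match(vectorOfNames, rownames(toy))
by_match <- toy[idx[!is.na(idx)], ]
```

With %in%, IDs that are not present are silently skipped, so comparing nrow(by_in) with length(vectorOfNames) is a quick way to notice missing IDs.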
Dear Sebastian:

Thanks for your swift reply. It works, but only for the probe IDs that start with a letter (only ~15 out of the > 100 probe IDs I want to investigate). Those that start with a number come back with "<0 rows> (or 0-length row.names)". The motto for the New Year seems to be 'Solve a problem, only to find new ones'. Phew.

Kind regards,
Bas

On Tue, Jan 3, 2012 at 11:19 AM, Sebastian Thieme wrote:
> you can use exprs[ rownames(exprs) %in% "gnf1h00499_at",] or exprs[
> rownames(exprs) %in% vectorOfNames,], where vectorOfNames is a list or
> a vector of the names you are looking for.
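The thread does not say why the IDs starting with a number fail to match. One way to narrow it down is to compare the requested IDs with the row names directly. The sketch below uses invented vectors (`toy_ids`, `wanted`), and the stray whitespace in it is purely a hypothetical cause chosen for illustration; the real cause may be something else, such as IDs that were altered when they were read from a file.

```r
# Hypothetical row names and requested IDs; not taken from the real data.
toy_ids <- c("gnf1h00499_at", "200000_s_at", "200001_at")
wanted  <- c("gnf1h00499_at", " 200000_s_at", "200001_at ")  # stray spaces (assumed)

# Requested IDs that do not occur among the row names as typed.
unmatched <- setdiff(wanted, toy_ids)

# Strip leading and trailing whitespace, then check again.
cleaned <- trimws(wanted)
still_unmatched <- setdiff(cleaned, toy_ids)
```

Printing `unmatched` with `dput()` makes invisible differences such as spaces easier to spot.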