Question: GSE states
5.1 years ago by
Guest User12k
Guest User12k wrote:
I am trying to work with GSE expression data from GEO in R using GEOquery. I am using the following commands:

    gset <- getGEO("GSE52", GSEMatrix = TRUE)
    ex <- exprs(gset[[1]])

But I have no idea how to separate the data into two different states. I am trying to locate the state data using

    varLabels(phenoData(gset[[1]]))

when GSEMatrix = TRUE, or

    Meta(gse)

when GSEMatrix = FALSE, but I still have no clue. I need to do this for around 1000 different datasets, so I cannot do it manually with GEO2R. Is there a particular field I can consider the state?

I would also like to know how I can find out, using just R, whether the data is already log2 or log10 transformed or still in its raw form.

-- output of sessionInfo(): none

-- Sent via the guest posting facility at bioconductor.org.
modified 5.1 years ago by Tim Triche4.2k • written 5.1 years ago by Guest User12k
5.1 years ago by
Tim Triche4.2k
United States
Tim Triche4.2k wrote:
How do you define "states"?

You might find the GEOmetadb package very useful. Read the vignette for it. It may be what you need.

In terms of processing GSEs and GSMs, once you know what you are searching for, it's much easier to find it.

Hope this helps.

--t

On Wed, Jul 24, 2013 at 10:55 AM, Lina Thomas [guest] wrote:
> [...]

--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
5.1 years ago by
Tim Triche4.2k
United States
Tim Triche4.2k wrote:
You might want to look in gds_subset instead, for instances where defined differences exist. If you're searching free-form for arbitrary differences then yes, you're pretty much screwed. But for instances where GEO2R can see the differences, GEOmetadb can see them as well. You may need to poke around a bit, though, since your description is so broad (nothing personal, it's just an incredibly broad description and you may be the only person who can narrow it).

I'm cc:'ing Sean on this, since he's more familiar with the package (his package!) than I am, but here's a start...

    library(GEOmetadb)
    geometadbfile <- getSQLiteFile() ## if needed
    con <- dbConnect(SQLite(), geometadbfile)

    colDescs <- columnDescriptions()
    colDescs[ grep('gds_subset', colDescs[,1]), ] ## ooh, this looks promising...

    gds.res <- dbGetQuery(con, 'SELECT DISTINCT gds, type, description, count(sample_id) AS samples FROM gds_subset GROUP BY type, gds')

    gds.res
    ## data frame with 5498 rows and 4 columns
    ##              gds        type                    description   samples
    ##      <character> <character>                    <character> <integer>
    ## 1        GDS1054         age                          36 wk         5
    ## 2        GDS1055         age                          36 wk         5
    ## 3        GDS1079         age                          23 mo         2
    ## 4        GDS1264         age                           20 m         3
    ## 5        GDS1265         age                           22 d         5
    ## ...          ...         ...                            ...       ...
    ## 5494      GDS910      tissue                    bone marrow         5
    ## 5495      GDS917      tissue                    whole blood         2
    ## 5496      GDS953      tissue           non-branching region         2
    ## 5497      GDS956      tissue substantia nigra pars compacta         2
    ## 5498      GDS960      tissue                     right lung         2

It may take some further twiddling on your part but I'd be surprised if this can't be pressed into service for your needs... It could take a while to extract 5498 subsets of data from GEO2R ;-)

Best,

--t

On Wed, Jul 24, 2013 at 11:58 AM, Lina Thomas wrote:
> Hello, thanks for your answer!
>
> I am studying the alterations of gene expression when a biological system suffers a change. So any change, any difference can be a state. You can also call it a condition.
>
> With GDS I know where to find this information (disease.state or infection or stress or tissue, etc.), but GSE is giving me a little headache. I am starting to think the information is hidden in the fields title or source name (the two columns in GEO2R). But I can't think of an automatic way to separate the data...
>
> I took a quick look at GEOmetadb but it does not seem to be quite what I need. It might be helpful in the future though. Thanks anyway!
>
> See ya,
>
> Lina
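For series (GSE) records there is no curated subset table, so one rough starting point is to scan the sample annotation for columns that split the samples into a small number of groups, such as the title or source name fields mentioned above. The following is only a minimal sketch in base R. It assumes a data frame shaped like pData(gset[[1]]) from GEOquery; the function name candidate_state_columns, the min_per_group threshold, and the toy annotation table are all invented for illustration.

```r
# Hypothetical helper: list phenotype columns that could encode a "state".
# `pheno` is assumed to look like pData(gset[[1]]) from GEOquery:
# one row per sample (GSM), one column per annotation field.
candidate_state_columns <- function(pheno, min_per_group = 2) {
  n <- nrow(pheno)
  keep <- vapply(pheno, function(col) {
    counts <- table(as.character(col))
    # A usable grouping has at least two levels, is not unique per sample,
    # and every level holds at least `min_per_group` samples.
    length(counts) >= 2 && length(counts) < n && all(counts >= min_per_group)
  }, logical(1))
  names(pheno)[keep]
}

# Toy annotation table (invented values, not taken from any GEO record).
pheno <- data.frame(
  title               = paste("sample", 1:6),
  source_name_ch1     = rep(c("liver, control", "liver, treated"), each = 3),
  organism_ch1        = rep("Mus musculus", 6),
  characteristics_ch1 = rep(c("agent: none", "agent: drug"), each = 3),
  stringsAsFactors    = FALSE
)

states <- candidate_state_columns(pheno)
print(states)

# Split sample indices by the first candidate column.
groups <- split(seq_len(nrow(pheno)), pheno[[states[1]]])
print(groups)
```

With real data one would pass pData(gset[[1]]) instead of the toy table. A title column that is unique per sample is skipped by this check, so recovering states from titles would still need some text parsing and, most likely, manual review.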
Well, thanks a lot! In some way that is going to be really useful! I will try to learn some SQL. But actually I think I am really screwed...

Working with GDS is not enough. I will need to work with GSE as well, especially the ones that are not associated with a GDS. I am working with Mus musculus and I need around 1000 different expression tables. When I search for Mus musculus in https://www.ncbi.nlm.nih.gov/sites/GDSbrowser/ it returns around 1300 DataSet records, and I could only find around 250 that have more than 20 samples. And then I have to filter for the ones where I can identify the transformation applied: log2, log10, etc.

I know the description is broad, but that is what I was asked to do...

So I am starting to believe I will have to work through the data one by one in R to choose states :(

On Wed, Jul 24, 2013 at 4:43 PM, Tim Triche, Jr. wrote:
> Incidentally (picking up where the previous example left off), one can catalog the number of subsetted GDS records and the states by which they were subsetted as follows (assuming I have correctly understood your meaning for "states"):
>
>     gds.with.subsets <- dbGetQuery(con, 'SELECT COUNT(DISTINCT gds) AS subsetted_datasets, type AS state FROM gds_subset GROUP BY type')
>
>     gds.with.subsets
>     ## data frame with 24 rows and 2 columns
>     ##     subsetted_datasets             state
>     ##              <integer>       <character>
>     ## 1                  184               age
>     ## 2                  866             agent
>     ## 3                  145         cell line
>     ## 4                  233         cell type
>     ## 5                  136 development stage
>     ## ...                ...               ...
>     ## 20                 231            strain
>     ## 21                 112            stress
>     ## 22                  10       temperature
>     ## 23                 776              time
>     ## 24                 330            tissue
>
> Again, it may take a bit of SQL to assemble the results such that you can troll through them with GEOquery, but I suspect that the process will be less painful than if you did this manually through GEO2R. But that's something for you to decide.
>
> Best,
>
> --t
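The "bit of SQL" mentioned above could look roughly like the sketch below, which joins gds_subset to the gds table to restrict the subsets to mouse datasets and to pick up the parent GSE accession. This is an illustration under assumptions, not a tested recipe: the gds columns sample_organism, value_type and gse are how I understand the GEOmetadb schema (check with dbListFields(con, "gds")), and sample_id is assumed to hold a comma-separated list of GSM accessions. To keep the example runnable without downloading the full database, it uses a tiny in-memory stand-in whose rows are invented.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for the GEOmetadb SQLite file (all rows are invented).
# With the real database one would instead connect as in the thread:
#   con <- dbConnect(SQLite(), getSQLiteFile())
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gds", data.frame(
  gds              = c("GDS1", "GDS2"),
  gse              = c("GSE10", "GSE20"),
  sample_organism  = c("Mus musculus", "Homo sapiens"),
  value_type       = c("count", "transformed count"),
  stringsAsFactors = FALSE
))
dbWriteTable(con, "gds_subset", data.frame(
  gds              = c("GDS1", "GDS1", "GDS2"),
  type             = c("agent", "agent", "tissue"),
  description      = c("control", "drug", "liver"),
  sample_id        = c("GSM1,GSM2", "GSM3,GSM4", "GSM5,GSM6"),
  stringsAsFactors = FALSE
))

# One row per (dataset, state type, state level), mouse datasets only.
sql <- "
  SELECT s.gds, g.gse, g.value_type, s.type AS state,
         s.description AS state_level, s.sample_id
  FROM gds_subset AS s
  JOIN gds AS g ON g.gds = s.gds
  WHERE g.sample_organism = ?
  ORDER BY s.gds, s.type, s.description"
mouse_states <- dbGetQuery(con, sql, params = list("Mus musculus"))
print(mouse_states)

dbDisconnect(con)
```

Each row of the result names a dataset, the state it was subsetted by, and the samples in that subset, which is the information needed to split an expression matrix fetched with GEOquery. The value_type column, if present as assumed, may also give a first hint about whether a curated dataset holds counts or log ratios. None of this covers GSE records that have no associated GDS.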
Lina, you might try using insilicodb.org to facilitate the curation you need to do, then import ExpressionSets with your curated metadata from there.

-Levi

On Wed, Jul 24, 2013 at 4:06 PM, Lina Thomas wrote:
> [...]
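The second part of the original question, how to tell from R whether values are already log transformed, was not answered in the thread. One common heuristic, similar in spirit to the check found in the R scripts that GEO2R generates, looks at the quantiles of the expression matrix: raw intensities usually run into the hundreds or thousands, while log2 values rarely go much above 20. The sketch below is a guess based on value range only. It cannot reliably tell log2 from log10, and the function name looks_unlogged and the simulated data are invented for illustration.

```r
# Hypothetical helper: guess whether an expression matrix is still on the raw scale.
# The thresholds are heuristic, in the spirit of the check in GEO2R-generated scripts.
looks_unlogged <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  # A large upper quantile, or a wide range with a positive lower quartile,
  # suggests raw intensities rather than log values.
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

set.seed(1)
raw    <- matrix(2^rnorm(200, mean = 8, sd = 2), nrow = 50)  # simulated raw intensities
logged <- log2(raw)                                          # same data on the log2 scale

print(looks_unlogged(raw))
print(looks_unlogged(logged))
```

On real data this would be applied to exprs(gset[[1]]). When it returns TRUE, a typical follow-up is to set non-positive values to NA and take log2 of the matrix. Datasets near the thresholds deserve a manual look.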