Question: Filtering Gene Symbols and Sample Description from GSE files of GEO
mahm20 wrote (14 months ago):

I'm following a tutorial that shows how to parse GSE files, but all I can get are the probe IDs and the expression values for each sample. With a GDS, one can use Table(gds) and Columns(gds) to get the gene symbols and the sample descriptions, but for a GSE I'm told these are not available for that object class. Could someone help me extract the gene symbols and sample descriptions from the ExpressionSet created from GSE data?

I am trying to get output in the following format:

                                Description      CKB
GSM762810                 uncultured Islets  10.3963
GSM762811                 uncultured Islets  27.6353
GSM762812                 uncultured Islets 113.6600
GSM762813                 uncultured Islets 135.0460
GSM762814 expanded Islet - dedifferentiated  90.7472
GSM762816 expanded Islet - dedifferentiated  77.8949
GSM762817 expanded Islet - dedifferentiated  68.0191
GSM762819 expanded Islet - dedifferentiated  61.7838
GSM762815 expanded Islet - redifferentiated  52.6215
GSM762818 expanded Islet - redifferentiated  63.4706
GSM762820 expanded Islet - redifferentiated  51.5406

I obtained the above using Columns(gds) and Table(gds).

Using the ExpressionSet for the GSE, I only get this:

          GSM239824 GSM239825 GSM239826 GSM239827 GSM239828
1007_s_at     1.272     1.251     1.231     0.998     0.996
1053_at       1.334     1.246     1.351     0.814     0.855
117_at        0.987     1.019     0.928     0.485     0.446
121_at        0.816     0.666     0.733     0.543     0.507
1255_g_at    40.630    34.820    36.800     6.885     5.392
1294_at       0.666     0.491     0.655     0.390     0.425
1316_at       1.000     0.767     0.893     0.643     0.787
1320_at       0.945     0.861     0.981     0.864     1.019
1405_i_at     0.659     0.715     0.615     0.513     0.519
1431_at       0.655     0.667     0.656     1.000     0.990

How can I extract the gene names and the sample descriptions from the GSE/ExpressionSet? Is there a function for this?

Any help would be greatly appreciated.

Answer: Filtering Gene Symbols and Sample Description from GSE files of GEO
Sean Davis wrote (14 months ago):

The sample information for an ExpressionSet is in pData(eset), and the gene (probe) annotation is in fData(eset). If you are more comfortable with SummarizedExperiments, you can also convert the ExpressionSet with as(eset, "SummarizedExperiment") after loading the SummarizedExperiment package.
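To get a table like the GDS one from a GSE, one option is to combine pData() (sample annotation), fData() (probe annotation) and exprs() (expression values). The sketch below builds a small made-up ExpressionSet so it runs offline; the sample IDs, probe IDs and values are hypothetical. The column names `source_name_ch1` and `Gene Symbol` are assumptions: GEOquery usually returns such columns for GPL570-based series, but which pData column carries the description you want (for example `title`, `source_name_ch1` or a `characteristics_ch1` column) depends on how the submitters filled in the GSM records, so check `varLabels(eset)` and `fvarLabels(eset)` first.

```r
suppressPackageStartupMessages(library(Biobase))

# Hypothetical toy ExpressionSet standing in for getGEO("GSE...")[[1]].
# Sample IDs, probe IDs and values are made up for illustration.
expr <- matrix(c(10.4, 27.6, 90.7, 52.6,
                  5.1,  6.2,  4.8,  5.5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("probe_1", "probe_2"),
                               c("S1", "S2", "S3", "S4")))
pheno <- data.frame(
  source_name_ch1 = c("uncultured Islets", "uncultured Islets",
                      "expanded Islet - dedifferentiated",
                      "expanded Islet - redifferentiated"),
  row.names = colnames(expr))
feat <- data.frame(ID = rownames(expr),
                   "Gene Symbol" = c("CKB", "ACTB"),
                   check.names = FALSE,          # keep the space in the name
                   row.names = rownames(expr))
eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno),
                      featureData = AnnotatedDataFrame(feat))

# Sample annotation is in pData(); probe annotation is in fData().
# which() skips probes whose symbol is NA; a gene may map to several probes.
probes <- which(fData(eset)[["Gene Symbol"]] == "CKB")
result <- data.frame(Description = pData(eset)$source_name_ch1,
                     CKB = exprs(eset)[probes[1], ],
                     row.names = sampleNames(eset))
print(result)
```

With a real series you would skip the toy construction and start from `eset <- getGEO("GSE...")[[1]]`. If a gene has several probes, this takes the first one; averaging them or picking the most variable probe are equally reasonable choices.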


Hi Sean,

I don't get the expected output with these commands:

> pData(eset2)
            samples
GSM239824 GSM239824
GSM239825 GSM239825
GSM239826 GSM239826
GSM239827 GSM239827
GSM239828 GSM239828
GSM362248 GSM362248
GSM362249 GSM362249

> as(eset2, "SummarizedExperiment")
Error in as(eset2, "SummarizedExperiment") : 
  no method or default for coercing “ExpressionSet” to “SummarizedExperiment”

> fData(eset2)
data frame with 0 columns and 54675 rows

 

eset2 is the ExpressionSet I created from the GSE data following the instructions given here.

Could you tell me whether there is a mistake in my syntax?

I'm not sure what code was used, but this works for me:

library(GEOquery) 
eset = getGEO('GSE9440')[[1]] 
pData(eset) #large data frame 
fData(eset) #large data frame 
library(SummarizedExperiment) # need to load this first 
se = as(eset, "SummarizedExperiment") 
se

And the output:

class: SummarizedExperiment 
dim: 54675 8 
metadata(3): experimentData annotation protocolData
assays(1): exprs
rownames(54675): 1007_s_at 1053_at ... AFFX-TrpnX-5_at AFFX-TrpnX-M_at
rowData names(16): ID GB_ACC ... Gene.Ontology.Cellular.Component Gene.Ontology.Molecular.Function
colnames(8): GSM239824 GSM239825 ... GSM239830 GSM239831
colData names(43): title geo_accession ... Passage.ch1 sample.type.ch1

Many thanks for the prompt response. The other two commands work fine now. However, for GSE9440, print(fData(eset)) still returns "data frame with 0 columns and 54675 rows".

Any suggestions?

 


What is the output of sessionInfo() when GEOquery is loaded? And what is the output of just typing `eset`?


The output of sessionInfo() is here.

Typing `eset` gives:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 8 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM239824 GSM239825 ... GSM239831 (8 total)
  varLabels: title geo_accession ... sample type:ch1 (43 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL570

Apologies for the delayed response.

Sorry, one more favor. Could you post the code you used to get `eset`? I'm assuming that it is a one-liner, but just double-checking. 


Hi Sean, I used the command from your post:

eset = getGEO('GSE9440')[[1]] 

 


After loading a few more libraries, the command works now. Thanks a lot, and sorry for the multiple posts!


Great to hear that things are working for you. 


Sean, there is a problem again :(

I created an eset for GSE15543 using:

 eset2 = getGEO('GSE15543')[[1]]

When I run

 fData(eset2)[0:10,1] 

the output is:

[1] "1007_s_at" "1053_at"   "117_at"    "121_at"    "1255_g_at" "1294_at"  
 [7] "1316_at"   "1320_at"   "1405_i_at" "1431_at"

But for

fData(eset2)[15000:15020,1] 

the output is:

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I'm not sure what causes this. The output of

nrow(fData(eset2))
[1] 54675

is 54675, so shouldn't the following also return probe IDs?

fData(eset2)[15000:15020,1] 
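When part of an fData() column comes back NA, one way to narrow things down is to count the NA entries and compare the first fData() column with featureNames(), which holds the probe IDs attached to the expression matrix itself. The helper below, `check_feature_ids()`, is hypothetical (it is not part of GEOquery or Biobase) and only reports what it finds; it does not identify or fix the cause.

```r
suppressPackageStartupMessages(library(Biobase))

# Hypothetical helper: compare the first fData() column with featureNames(),
# the probe IDs stored on the expression matrix itself.
check_feature_ids <- function(eset) {
  if (ncol(fData(eset)) == 0) stop("fData(eset) has no columns")
  ids <- as.character(fData(eset)[, 1])
  na_rows <- which(is.na(ids))
  list(
    n_features   = length(featureNames(eset)),
    n_na_ids     = length(na_rows),
    first_na_row = if (length(na_rows) > 0) na_rows[1] else NA_integer_,
    # rows where an annotation ID is present but differs from the probe ID
    n_mismatched = sum(ids != featureNames(eset), na.rm = TRUE)
  )
}
```

For the series above, `check_feature_ids(eset2)` would show how many rows are affected, and `featureNames(eset2)[15000:15020]` shows the probe IDs for those rows directly from the assay data, independent of the annotation table.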

 
