Hi guys,
I tried to get the colData from a GSE dataset that I downloaded with GEOquery. I have two problems to solve:
- How do I get the counts table from the dataset?
- How do I prepare the colData that the DESeq2 package asks for?
Here's what I have done.
First I downloaded the dataset in matrix form and extracted the expression values to have a look at the table.
library(GEOquery)  # also attaches Biobase, which provides pData() and exprs()
gse <- getGEO("GSE7765", GSEMatrix = TRUE)
show(gse)
# take a look at the metadata
metadata <- pData(gse[[1]])
metadata[, 1:5]
colnames(metadata)
# thus I know the group division
# then I extract the expression matrix from the ExpressionSet
eset <- exprs(gse[[1]])
dim(eset)
head(eset)
The documentation says that downloading the matrix form should give me an expression set. I tried head(gse[[1]]), but it did not show the table.
After extracting the values with exprs(), the table shows up, but it is not a counts table, so I re-downloaded the dataset in SOFT format.
rm(list = ls())
gse <- getGEO("GSE7765", GSEMatrix = FALSE)
show(gse)
# inspect the first sample
GSMList(gse)[[1]]
Then I looked at the first sample and got these column descriptions:
** Column Descriptions **
  Column             Description
1 ID_REF             =
2 VALUE              value
3 ABS_CALL           present/absent
4 DETECTION P-VALUE  0.05
So I still don't have the counts table that I need for the DESeq2 package, and I don't know how to prepare the colData either.
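For reference, a minimal sketch of the shape DESeq2 expects, under stated assumptions: a count matrix of non-negative integers with samples in columns, and a colData data frame with one row per sample, in the same order as those columns, holding the design variable as a factor. The sample names, the `treatment` column, and the helper `looks_like_counts()` are hypothetical and the values are simulated; nothing here is taken from GSE7765.

```r
# Toy example (simulated values, hypothetical sample names), base R only.
set.seed(1)
samples <- c("GSM_A1", "GSM_A2", "GSM_B1", "GSM_B2")

# A count matrix: genes in rows, samples in columns, non-negative integers.
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), samples))

# Stand-in for the phenotype table returned by pData();
# the column name "treatment" is an assumption for illustration.
pheno <- data.frame(treatment = c("control", "control", "treated", "treated"),
                    row.names = samples)

# Hypothetical helper: TRUE if every value is a non-negative whole number.
looks_like_counts <- function(m) {
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

# colData: one row per sample, reordered to match the count matrix columns,
# with the design variable stored as a factor (reference level first).
col_data <- pheno[colnames(counts), , drop = FALSE]
col_data$treatment <- factor(col_data$treatment,
                             levels = c("control", "treated"))

# Sanity checks before handing the objects to DESeq2.
stopifnot(looks_like_counts(counts))
stopifnot(identical(rownames(col_data), colnames(counts)))
```

With real sequencing counts, these two objects are what would be passed to `DESeqDataSetFromMatrix(countData = counts, colData = col_data, design = ~ treatment)`. A matrix of log-scale intensities would fail the `looks_like_counts()` check, which is a quick way to tell that a table is not raw counts.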
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] pheatmap_1.0.12     GEOquery_2.64.0     Biobase_2.56.0     
[4] BiocGenerics_0.42.0
loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-3  pillar_1.7.0        compiler_4.2.0     
 [4] BiocManager_1.30.17 R.methodsS3_1.8.1   R.utils_2.11.0     
 [7] base64enc_0.1-3     tools_4.2.0         digest_0.6.29      
[10] uuid_1.1-0          gtable_0.3.0        jsonlite_1.8.0     
[13] evaluate_0.15       lifecycle_1.0.1     tibble_3.1.6       
[16] pkgconfig_2.0.3     rlang_1.0.2         IRdisplay_1.1      
[19] cli_3.3.0           DBI_1.1.2           curl_4.3.2         
[22] IRkernel_1.3        fastmap_1.1.0       xml2_1.3.3         
[25] repr_1.1.4          dplyr_1.0.9         generics_0.1.2     
[28] vctrs_0.4.1         hms_1.1.1           grid_4.2.0         
[31] tidyselect_1.1.2    glue_1.6.2          data.table_1.14.2  
[34] R6_2.5.1            fansi_1.0.3         limma_3.52.0       
[37] pbdZMQ_0.3-7        tidyr_1.2.0         purrr_0.3.4        
[40] readr_2.1.2         tzdb_0.3.0          magrittr_2.0.3     
[43] scales_1.2.0        ellipsis_0.3.2      htmltools_0.5.2    
[46] assertthat_0.2.1    colorspace_2.0-3    utf8_1.2.2         
[49] munsell_0.5.0       crayon_1.5.1        R.oo_1.24.0
I'm new to this. Thanks for your help, and I would appreciate it if you could recommend some materials for learning RNA-seq analysis.

Thanks. I'll look up the differences between DESeq2 and limma. I realized that limma is installed with DESeq2.