Hi guys,
I am trying to prepare a GSE dataset that I downloaded with GEOquery for DESeq2, and I have two problems to solve:
- How do I get the counts table from the dataset?
- How do I prepare the colData that the DESeq2 package asks for?
Here's what I have done.
First, I downloaded the dataset in Series Matrix format and extracted the expression values to have a look at the table.
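```r
library(GEOquery)

# Download the Series Matrix file; getGEO() returns a list of ExpressionSets (one per platform)
gse <- getGEO("GSE7765", GSEMatrix = TRUE)
show(gse)

# Take a look at the sample metadata (phenotype data)
metadata <- pData(gse[[1]])
metadata[, 1:5]
colnames(metadata)
# From these columns I can tell how the samples are grouped

# Extract the expression matrix (probes x samples) from the ExpressionSet
eset <- exprs(gse[[1]])
dim(eset)
head(eset)
```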
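Finding the grouping by eye works, but with many metadata columns it can help to list the columns that split the samples into a few groups. A minimal sketch, assuming the metadata is a plain data.frame like the one `pData()` returns; the helper `find_group_columns()` and the toy column values are hypothetical.

```r
# Hypothetical helper: return metadata columns that take at least 2 distinct values
# but fewer distinct values than there are samples (i.e. candidate grouping columns).
find_group_columns <- function(metadata) {
  n_levels <- vapply(metadata, function(x) length(unique(x)), integer(1))
  names(n_levels)[n_levels >= 2 & n_levels < nrow(metadata)]
}

# Toy metadata standing in for pData(gse[[1]]); column values are invented
toy_metadata <- data.frame(
  title = c("ctrl_1", "ctrl_2", "treat_1", "treat_2"),
  geo_accession = c("GSM0001", "GSM0002", "GSM0003", "GSM0004"),
  characteristics_ch1 = c("control", "control", "treated", "treated"),
  platform_id = rep("GPL0000", 4)
)

find_group_columns(toy_metadata)
# → "characteristics_ch1"
```

Columns that are unique per sample (titles, accessions) and constant columns (platform) are excluded, which leaves the likely group labels.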
According to the documentation, downloading the Series Matrix format should give me an ExpressionSet. I tried head(gse[[1]]) but could not see the table.
After extracting it with exprs(), the table shows up, but it is not a counts table, so I re-downloaded the dataset in SOFT format.
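A quick way to confirm whether a matrix is a counts table is to check that every value is a non-negative integer, since DESeq2 expects raw integer counts. A minimal sketch; the helper `looks_like_counts()` and the toy matrices are hypothetical, and in practice you would pass `exprs(gse[[1]])`.

```r
# Hypothetical helper: raw read counts are non-negative integers.
# Normalised microarray intensities (e.g. MAS5 or RMA values) usually are not.
looks_like_counts <- function(m) {
  v <- as.vector(m)
  v <- v[!is.na(v)]  # ignore missing values
  is.numeric(v) && all(v >= 0) && all(v == round(v))
}

# Toy matrices standing in for exprs(gse[[1]]); the numbers are invented
intensities <- matrix(c(152.3, 8.7, 1203.9, 44.1), nrow = 2)
counts <- matrix(c(15L, 0L, 1203L, 44L), nrow = 2)

looks_like_counts(intensities)  # FALSE
looks_like_counts(counts)       # TRUE
```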
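```r
rm(list = ls())
# Download the SOFT-format file instead; this returns a GSE object with one GSM per sample
gse <- getGEO("GSE7765", GSEMatrix = FALSE)
show(gse)
# Inspect the first sample
GSMList(gse)[[1]]
```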
Looking at the first sample, I get these column descriptions:

| Column | Description |
|---|---|
| ID_REF | |
| VALUE | value |
| ABS_CALL | present/absent |
| DETECTION P-VALUE | 0.05 |
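Each sample in the SOFT download carries its own table with the columns listed above, so one way to get a probe x sample matrix is to align the per-sample VALUE columns by ID_REF. A minimal sketch, assuming the input is a named list of per-sample data frames such as `lapply(GSMList(gse), Table)`; the helper `combine_sample_tables()` and the toy tables are hypothetical. Note that ABS_CALL and DETECTION P-VALUE columns are typical of Affymetrix microarray output, so the resulting values are likely intensities rather than read counts, just like the `exprs()` matrix.

```r
# Hypothetical helper: combine per-sample tables (as returned by GEOquery's Table()
# for each GSM) into one probe x sample matrix of the chosen column.
combine_sample_tables <- function(tables, column = "VALUE") {
  # Use the probe IDs of the first sample as the common row order
  probe_ids <- as.character(tables[[1]]$ID_REF)
  values <- vapply(tables, function(tab) {
    # Align each sample to the same probe order before binding
    as.numeric(tab[[column]][match(probe_ids, as.character(tab$ID_REF))])
  }, numeric(length(probe_ids)))
  rownames(values) <- probe_ids
  values
}

# Toy per-sample tables standing in for lapply(GSMList(gse), Table); values are invented
tables <- list(
  GSM0001 = data.frame(ID_REF = c("p1", "p2", "p3"), VALUE = c(10.5, 200.1, 3.2),
                       ABS_CALL = c("P", "P", "A")),
  GSM0002 = data.frame(ID_REF = c("p3", "p1", "p2"), VALUE = c(2.9, 11.0, 180.4),
                       ABS_CALL = c("A", "P", "P"))
)

expr_mat <- combine_sample_tables(tables)
expr_mat
```

Matching on ID_REF rather than binding columns directly guards against samples whose tables list the probes in a different order.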
So I still don't have the counts table that DESeq2 needs, and I don't know how to prepare the colData either.
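For the colData part, DESeq2 expects a data.frame with one row per sample, in the same order as the columns of the count matrix, holding the variables used in the design (e.g. `design = ~ condition` in `DESeqDataSetFromMatrix()`). A minimal sketch, assuming the group label sits in a metadata column such as `characteristics_ch1` (an assumption; check `colnames(metadata)`) and that the pData rownames are the GSM IDs used as matrix column names; the helper `make_coldata()` and the toy values are hypothetical.

```r
# Hypothetical helper: build a colData-style data.frame from GEO metadata,
# with rows ordered to match the columns of the expression/count matrix.
make_coldata <- function(metadata, sample_ids, group_column, reference_level) {
  coldata <- data.frame(
    condition = factor(metadata[sample_ids, group_column]),
    row.names = sample_ids
  )
  # Make the control group the reference level of the factor
  coldata$condition <- relevel(coldata$condition, ref = reference_level)
  coldata
}

# Toy metadata standing in for pData(gse[[1]]); column name and values are assumptions
toy_metadata <- data.frame(
  characteristics_ch1 = c("control", "treated", "control", "treated"),
  row.names = c("GSM0001", "GSM0002", "GSM0003", "GSM0004")
)
sample_ids <- c("GSM0001", "GSM0002", "GSM0003", "GSM0004")  # in practice: colnames(eset)

coldata <- make_coldata(toy_metadata, sample_ids, "characteristics_ch1", "control")
stopifnot(identical(rownames(coldata), sample_ids))  # rows match matrix columns
coldata
```

The same data.frame also works with `model.matrix(~ condition, data = coldata)` if the data turn out to need a microarray workflow instead of DESeq2.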
```
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pheatmap_1.0.12 GEOquery_2.64.0 Biobase_2.56.0
[4] BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 pillar_1.7.0 compiler_4.2.0
[4] BiocManager_1.30.17 R.methodsS3_1.8.1 R.utils_2.11.0
[7] base64enc_0.1-3 tools_4.2.0 digest_0.6.29
[10] uuid_1.1-0 gtable_0.3.0 jsonlite_1.8.0
[13] evaluate_0.15 lifecycle_1.0.1 tibble_3.1.6
[16] pkgconfig_2.0.3 rlang_1.0.2 IRdisplay_1.1
[19] cli_3.3.0 DBI_1.1.2 curl_4.3.2
[22] IRkernel_1.3 fastmap_1.1.0 xml2_1.3.3
[25] repr_1.1.4 dplyr_1.0.9 generics_0.1.2
[28] vctrs_0.4.1 hms_1.1.1 grid_4.2.0
[31] tidyselect_1.1.2 glue_1.6.2 data.table_1.14.2
[34] R6_2.5.1 fansi_1.0.3 limma_3.52.0
[37] pbdZMQ_0.3-7 tidyr_1.2.0 purrr_0.3.4
[40] readr_2.1.2 tzdb_0.3.0 magrittr_2.0.3
[43] scales_1.2.0 ellipsis_0.3.2 htmltools_0.5.2
[46] assertthat_0.2.1 colorspace_2.0-3 utf8_1.2.2
[49] munsell_0.5.0 crayon_1.5.1 R.oo_1.24.0
```
I'm new to this. Thanks for your help; I would also appreciate any recommended materials for learning RNA-seq analysis.
Thanks. I'll look up the differences between DESeq2 and limma. I noticed that limma was installed along with DESeq2.