Hi David,
assays(rse_gene_SRP009615)$TPM
is one way you can access the TPM matrix. I think that you will benefit from reading the SummarizedExperiment
vignette number 1, that is, http://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html. In particular, section 2 which talks about the anatomy of a SummarizedExperiment
object. recount3
returns RangedSummarizedExperiment
objects and assumes familiarity with the SummarizedExperiment
as noted at http://bioconductor.org/packages/release/bioc/vignettes/recount3/inst/doc/recount3-quickstart.html#22_Required_knowledge.
As for your sidenote question, one of the colData()
columns from your RSE object will be the column with the GTEx detailed tissue region. You'll be able to use that to subset your object. That will contain SMTSD
in lowercase from looking at https://storage.googleapis.com/gtex_analysis_v8/annotations/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt. Actually, here's the code:
library("recount3")
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#> dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#> grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
rse <- recount3::create_rse_manual(
project = "BRAIN",
project_home = "data_sources/gtex",
organism = "human",
annotation = "gencode_v26",
type = "gene"
)
#> 2022-06-07 10:06:57 downloading and reading the metadata.
#> 2022-06-07 10:06:58 caching file gtex.gtex.BRAIN.MD.gz.
#> 2022-06-07 10:06:58 caching file gtex.recount_project.BRAIN.MD.gz.
#> 2022-06-07 10:06:58 caching file gtex.recount_qc.BRAIN.MD.gz.
#> 2022-06-07 10:06:59 caching file gtex.recount_seq_qc.BRAIN.MD.gz.
#> 2022-06-07 10:06:59 downloading and reading the feature information.
#> 2022-06-07 10:06:59 caching file human.gene_sums.G026.gtf.gz.
#> 2022-06-07 10:07:00 downloading and reading the counts: 2931 samples across 63856 features.
#> 2022-06-07 10:07:00 caching file gtex.gene_sums.BRAIN.G026.gz.
#> 2022-06-07 10:07:22 construcing the RangedSummarizedExperiment (rse) object.
grep("smtsd", colnames(colData(rse)))
#> [1] 15
colnames(colData(rse))[15]
#> [1] "gtex.smtsd"
table(rse$gtex.smtsd)
#>
#> Brain - Amygdala
#> 163
#> Brain - Anterior cingulate cortex (BA24)
#> 201
#> Brain - Caudate (basal ganglia)
#> 273
#> Brain - Cerebellar Hemisphere
#> 250
#> Brain - Cerebellum
#> 285
#> Brain - Cortex
#> 286
#> Brain - Frontal Cortex (BA9)
#> 224
#> Brain - Hippocampus
#> 220
#> Brain - Hypothalamus
#> 221
#> Brain - Nucleus accumbens (basal ganglia)
#> 262
#> Brain - Putamen (basal ganglia)
#> 221
#> Brain - Spinal cord (cervical c-1)
#> 171
#> Brain - Substantia nigra
#> 154
dim(rse)
#> [1] 63856 2931
dim(rse[, rse$gtex.smtsd == "Brain - Amygdala"])
#> [1] 63856 163
Created on 2022-06-07 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
Best,
Leo
Thank you very much for the detail explanation and teaching. That's very helpful!