Importing RSEM processed data already formatted as a summarized experiment into DESeq2
Entering edit mode
Ndimensional ▴ 20
Last seen 4 months ago
United States


I have a fairly simple question that I know has been addressed many times: I want to import RSEM data into DESeq2 for modeling and DE. For reproducibility, the workflow is:

Unfortunately, it is costing me an inordinate amount of time, and I cannot find a perfectly analogous example either in any vignette (for DESeq2, tximport, tximeta, tximportData, etc) or in other posts.

Load libraries

libsNeeded<-c( "cBioPortalData", "DESeq2", "Tximport", "SummarizedExperiment")
BiocManager::install(libsNeeded, update = TRUE, ask = FALSE)
lapply(libsNeeded, library, character.only = TRUE)

Get a MultiAssayExperiment using cBioPortal

cbio <- cBioPortal()
getStudies(api = cbio)
studies <- getStudies(cbio, buildReport = TRUE)
CptacMae<-cBioDataPack("brain_cptac_2020", ask=FALSE)

Look at resulting MAE object

CptacMae # Returns, a MultiAssayExperiment with 6 listed experiments
CptacRnaSeqSE # returns a Summarized Experiment, and:
assays(CptacMae)$mrna_seq_v2_rsem # returns a DataFrame of 18,000 rows and 188 cols. 
row.names(CptacRnaSeqSE) # returns HUGO gene symbols - so this is **not** transcript level info
all(row.names(colData(CptacRnaSeqSE) ) == colnames(assays(CptacRnaSeqSE)[[1]])) # TRUE

Determine prior processing of Rna-Seq data by reading the metadata from cBioPortal:

cancer_study_identifier: brain_cptac_2020

stable_id: rna_seq_v2_mrna

profile_name: RNA expression

profile_description: Log2 of FPKM + 1 expression values. STAR v2.6.1d aligned + RSEM v1.3.1 quantification.

genetic_alteration_type: MRNA_EXPRESSION

datatype: CONTINUOUS

show_profile_in_analysis_tab: false

data_filename: data_mrna_seq_v2_rsem.txt

Of course, I cannot load these data directly with DESeqDataSetFromMatrix() because the values are decimal and it wouldn't be appropriate.

Using RSEM with tximport and DESeq2 provides a workflow using:

se <- tximeta(ColData, type="rsem", txIn=FALSE, txOut=FALSE, skipMeta=TRUE)
assays(se)$length[ assays(se)$length == 0] <- NA # set these as missing

but trying this on my data returns:

tximeta(CptacRnaSeqSE, type = "rsem", txIn = FALSE, txOut = FALSE, skipMeta = TRUE)

all(c("files", "names") %in% names(coldata)) is not TRUE

I do not have files, I am working from a SE directly. This would seem like it should be the easiest scenario on paper - it's already in the native language in an appropriate data object - but its proving to be more difficult than just downloading flat files would have been ...

What am I missing here? Thank you!

tximport tximportData DESeq2 • 814 views
Entering edit mode

Update: changes made by dev team for cBioPortalData package just a few days ago detailed below. This will likely resolve the issues mentioned by those who have responded to the post:

I've incorporated information from SAMPLE_ID from the datasets to map and build SummarizedExperiment objects. Now, you should get an object that looks like the following:

mae <- cBioDataPack("brain_cptac_2020")
A MultiAssayExperiment object of 7 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 7:
 [1] cna: SummarizedExperiment with 19380 rows and 190 columns
 [2] linear_cna: SummarizedExperiment with 19380 rows and 190 columns
 [3] mrna_seq_v2_rsem_zscores_ref_all_samples: SummarizedExperiment with 18209 rows and 188 columns
 [4] mrna_seq_v2_rsem: SummarizedExperiment with 18209 rows and 188 columns
 [5] mutations: RaggedExperiment with 9951 rows and 200 columns
 [6] protein_quantification_zscores: SummarizedExperiment with 6429 rows and 218 columns
 [7] protein_quantification: SummarizedExperiment with 6429 rows and 218 columns
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

These changes are in the latest version of cBioPortalData in Bioc-devel (package version 2.13.4).

Entering edit mode
ATpoint ★ 3.6k
Last seen 13 hours ago

profile_description: Log2 of FPKM + 1 expression values. STAR v2.6.1d aligned + RSEM v1.3.1 quantification.

If the data is indeed log2(fpkm) then it is not suited for DESeq2 anyway. Note that the code does not run as such, there is no package called cBioPortal, tximport must be lowercase "t" and CptacMae<-cBioDataPack("brain_cptac_2020", ask=FALSE) throws an error for me that this studyId does not exist. Anyway, tximport requires the rsem output, you cannot run this with the data you have, but if it was raw gene level estimates you could round to nearest integer, this was suggested here on the support site on multiple occasions.

Entering edit mode

thanks I emended biocportal to biocportal data.

Entering edit mode

that data object does indeed exist both on the biocPortal website and in the studies object i mention above, row 30.

Entering edit mode

thx - if a mod moves this to an answer Ill accept it. dont know how i missed this probably just too used to looking at non log transformed RSEM data to have noticed


Entering edit mode
Last seen 19 hours ago
United States

A DESeqDataSet is a SummarizedExperiment with a few extra slots. You should be able to do something like

se <- as(se, "DESeqDataSet")
design(se) <- <put correct design here>

Maybe you need to do other things, but your code is non-functional and I can't find the data you are using, so cannot tell you any more.

Entering edit mode

the code runs as is on my machine and on a fresh instance of R v4.3 on a remote machine - are you using an older version of R?

Entering edit mode

Yes, of course I need to do additional things, as the post itself clearly implies. Thank you for your help.

Entering edit mode

Login before adding your answer.

Traffic: 651 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6