Download a TCGA dataset as an S4 object
Giuseppe ▴ 20

Hi all,

I'm writing a function that takes TCGA datasets as input. I understand that, for reproducibility, the inputs should be Bioconductor objects rather than text files.

My question is: is there a package that lets me download TCGA datasets as Bioconductor (S4) objects?

Thank you so much!

@mariozanfardino-15232

Hi Giuseppe, try the curatedTCGAData package:

https://bioconductor.org/packages/release/data/experiment/html/curatedTCGAData.html

"This package provides publicly available data from The Cancer Genome Atlas (TCGA) Bioconductor MultiAssayExperiment class objects. These objects integrate multiple assays (e.g. RNA-seq, copy number, mutation, microRNA, protein, and others) with clinical / pathological data. The MultiAssayExperiment class links assay barcodes with patient IDs, enabling harmonized subsetting of rows (features) and columns (patients / samples) across the entire experiment."

 


This runs in my environment:

library(curatedTCGAData)
SKCM <- curatedTCGAData(diseaseCode = "SKCM", assays = "Methylation", dry.run = FALSE)

The data are returned as a MultiAssayExperiment, a Bioconductor S4 class (https://bioconductor.org/packages/release/bioc/vignettes/MultiAssayExperiment/inst/doc/MultiAssayExperiment.html).

 


Thank you very much for your support! I have one last question. I ran this command:

> experiments(SKCM)

ExperimentList class object of length 1:

[1] SKCM_Methylation-20160128: SummarizedExperiment with 485577 rows and 475 columns

I want to convert the ExperimentList object to a data frame. How can I do that?


You probably don't want to convert this to a data frame. It is better to learn to use the SummarizedExperiment class; see for instance here and the package vignette here.
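To give an idea of what working with the class looks like, here is a minimal sketch on a toy object. The matrix values, probe names, sample names, and the `group` column are all made up for illustration; the real methylation assay is far larger, and whether it is held in memory may depend on the package version.

```r
library(SummarizedExperiment)

# Toy stand-in for a methylation assay: 4 probes x 3 samples (made-up values)
beta <- matrix(
  c(0.1, 0.8, 0.5, 0.3,
    0.2, 0.7, 0.4, 0.9,
    0.6, 0.1, 0.3, 0.2),
  nrow = 4,
  dimnames = list(paste0("cg", 1:4), paste0("sample", 1:3))
)
se <- SummarizedExperiment(
  assays = list(beta = beta),
  colData = DataFrame(group = c("a", "b", "a"), row.names = colnames(beta))
)

dim(se)            # features x samples
assay(se)[1:2, ]   # the numeric matrix, first two probes
colData(se)        # per-sample annotation

# Subsetting keeps the assay and the sample annotation in sync
sub <- se[, se$group == "a"]

# If a plain data frame is really needed, convert only the assay matrix
beta_df <- as.data.frame(assay(se))
```

With the real data, the SummarizedExperiment can be pulled out of the ExperimentList with `experiments(SKCM)[["SKCM_Methylation-20160128"]]` and then used the same way.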

df <- as.data.frame(wideFormat(SKCM[1:10, 1:10, "SKCM_Methylation-20160128"], colDataCols = 1:10))

Here df is subsetted to 10 features, 10 patients, and 10 colData columns.
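The same pattern can be tried on a small object before running it on the full dataset. This is only a sketch: the assay name `Methylation`, the patient IDs, and the clinical columns are invented, and the toy MultiAssayExperiment relies on assay column names matching the colData row names instead of an explicit sampleMap.

```r
library(MultiAssayExperiment)

# Toy data: 3 probes x 4 patients; all names and values are made up
beta <- matrix(seq(0.1, 0.9, length.out = 12), nrow = 3,
               dimnames = list(paste0("cg", 1:3), paste0("pt", 1:4)))
clin <- DataFrame(age = c(50, 61, 47, 58), sex = c("F", "M", "F", "M"),
                  row.names = paste0("pt", 1:4))

# With no sampleMap given, assay column names are matched to colData row names
mae <- MultiAssayExperiment(
  experiments = ExperimentList(Methylation = beta),
  colData = clin
)

# Index order is [features, patients, assays]
small <- mae[1:2, 1:3, "Methylation"]

# One row per patient, with the selected colData column alongside the features
df <- as.data.frame(wideFormat(small, colDataCols = "age"))
```

Subsetting before calling wideFormat keeps the resulting data frame small, which matters for an assay with hundreds of thousands of rows.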


Thank you so much! The only problem is that downloading the Methylation dataset is very slow, and the S4 object takes up a lot of memory.
