Here is the basic example for downloading RNA-seq data for GBM from TCGA using TCGAbiolinks:
# mRNA pipeline: https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
library(TCGAbiolinks)

# Query FPKM-UQ gene expression for two GBM samples
query.exp.hg38 <- GDCquery(project = "TCGA-GBM",
                           data.category = "Transcriptome Profiling",
                           data.type = "Gene Expression Quantification",
                           workflow.type = "HTSeq - FPKM-UQ",
                           barcode = c("TCGA-14-0736-02A-01R-2005-01",
                                       "TCGA-06-0211-02A-02R-2005-01"))

# Download the matching files, then assemble them into a SummarizedExperiment
GDCdownload(query.exp.hg38)
expdat <- GDCprepare(query = query.exp.hg38,
                     save = TRUE,
                     save.filename = "exp.rda")
My question is: is there a way to choose which genes are included in the downloaded SummarizedExperiment object? What if I only want to look at, for example, the expression of ribosomal proteins and don't want to download the thousands of other genes? I am thinking of using TCGAbiolinks in a Shiny app where the user selects the genes they want, so not downloading everything would be a lot faster.
If this is not possible with TCGAbiolinks, is there another package that has the functionality I'm looking for?