Question

Perform correlation analysis between miRNA and mRNA gene expression data on the same TCGA dataset based on the curatedTCGAData

svlachavas ▴ 840

Germany/Heidelberg/German Cancer Resear…

Dear Community,

Briefly: starting from a gene signature that I previously identified in a specific type of cancer (through gene expression analysis), I also found 4 specific microRNAs (mature miRs) that, according to experimentally validated databases, regulate a subset of my signature (~18 genes). As a final step, I would like to explore, in the TCGA COAD dataset, the expression of these miRs and the expression of their target genes in the same patients, to test for any significant negative correlation, which would further support my hypothesis.

From a quick search, I found that the curatedTCGAData R package contains multiple assays for the various TCGA cohorts, including the cancer of interest. A small query returns:

curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)

Please see the list below for available cohorts and assays
Available Cancer codes:
 ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH
 KIRC KIRP LAML LGG LIHC LUAD LUSC MESO OV PAAD PCPG
 PRAD READ SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM 
Available Data Types:
 CNACGH CNASeq CNASNP CNVSNP GISTICA GISTICT
 Methylation miRNAArray miRNASeqGene mRNAArray
 Mutation RNASeq2GeneNorm RNASeqGene RPPAArray

Thus:

A) For COAD, which data types should I select in order to get only the miRNA expression and the RNA-Seq expression data?

I see that there are miRNAArray, miRNASeqGene, RNASeq2GeneNorm, RNASeqGene and mRNAArray. However, I don't know the specific differences between them, as I have mostly used data from the GDC server. My understanding is that both types of expression data should be normalized and/or transformed in the same way for the correlation analysis to be appropriate.

B) Moreover, how could I subset both assays simultaneously, based on specific miRs and specific gene symbols?

Any suggestions, help or ideas would be much appreciated!

curatedTCGAData MultiAssayExperiment multiomics TCGA • 2.7k views

updated 6.7 years ago by Levi Waldron ★ 1.1k • written 6.7 years ago by svlachavas ▴ 840

score 2 · Answer 1 · 2019-04-05

A) The drill-down process in curatedTCGAData goes something like this. The data are the last snapshot provided by TCGA Firehose, i.e. GDC "legacy" data (https://confluence.broadinstitute.org/display/GDAC/FAQ).

> library(curatedTCGAData)
> curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)
Please see the list below for available cohorts and assays
Available Cancer codes:
 ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH
 KIRC KIRP LAML LGG LIHC LUAD LUSC MESO OV PAAD PCPG
 PRAD READ SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM 
Available Data Types:
 CNACGH CNASeq CNASNP CNVSNP GISTICA GISTICT
 Methylation miRNAArray miRNASeqGene mRNAArray
 Mutation RNASeq2GeneNorm RNASeqGene RPPAArray 
> curatedTCGAData(diseaseCode = "COAD", assays = "*", dry.run = TRUE)
                                 COAD_CNASeq                                  COAD_CNASNP 
                  "COAD_CNASeq-20160128.rda"                   "COAD_CNASNP-20160128.rda" 
                                 COAD_CNVSNP                        COAD_GISTIC_AllByGene 
                  "COAD_CNVSNP-20160128.rda"         "COAD_GISTIC_AllByGene-20160128.rda" 
               COAD_GISTIC_ThresholdedByGene                            COAD_Methylation1 
"COAD_GISTIC_ThresholdedByGene-20160128.rda"     "COAD_Methylation_methyl27-20160128.rda" 
                           COAD_Methylation2                            COAD_miRNASeqGene 
   "COAD_Methylation_methyl450-20160128.rda"             "COAD_miRNASeqGene-20160128.rda" 
                              COAD_mRNAArray                                COAD_Mutation 
               "COAD_mRNAArray-20160128.rda"                 "COAD_Mutation-20160128.rda" 
                        COAD_RNASeq2GeneNorm                              COAD_RNASeqGene 
         "COAD_RNASeq2GeneNorm-20160128.rda"               "COAD_RNASeqGene-20160128.rda" 
                              COAD_RPPAArray 
               "COAD_RPPAArray-20160128.rda" 
> curatedTCGAData(diseaseCode = "COAD", assays = c("miRNASeqGene", "RNASeq2GeneNorm"), dry.run = TRUE)
                  COAD_miRNASeqGene                COAD_RNASeq2GeneNorm 
   "COAD_miRNASeqGene-20160128.rda" "COAD_RNASeq2GeneNorm-20160128.rda" 
> mae <- curatedTCGAData(diseaseCode = "COAD", assays = c("miRNASeqGene", "RNASeq2GeneNorm"), dry.run = FALSE)
>
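
On the question of putting both assays on a comparable scale: here is a minimal sketch, using simulated numbers rather than the TCGA data, of a log2(x + 1) transform applied to both a miRNA and an mRNA vector. The values, variable names and the planted relationship are all assumptions made for illustration; the scale each curatedTCGAData assay is actually delivered on is worth checking in the package documentation. The point it shows is that a rank-based (Spearman) correlation is unchanged by a monotone transform such as log2(x + 1), whereas Pearson is not.

```r
# Simulated stand-ins for one miRNA and one mRNA measured in the same
# 50 samples (hypothetical values, NOT TCGA data).
set.seed(1)
mirna_counts <- rnbinom(50, mu = 200, size = 2)

# Plant a loose negative relationship between the mRNA and the miRNA.
mrna_values <- 5000 / (1 + mirna_counts) * exp(rnorm(50, sd = 0.3))

# log2(x + 1) compresses the dynamic range and tolerates zeros.
mirna_log <- log2(mirna_counts + 1)
mrna_log  <- log2(mrna_values + 1)

# Spearman works on ranks, so a monotone transform leaves it unchanged.
spearman_raw <- cor(mirna_counts, mrna_values, method = "spearman")
spearman_log <- cor(mirna_log, mrna_log, method = "spearman")

# Pearson depends on the scale, so it generally changes after the transform.
pearson_raw <- cor(mirna_counts, mrna_values, method = "pearson")
pearson_log <- cor(mirna_log, mrna_log, method = "pearson")

print(c(spearman_raw = spearman_raw, spearman_log = spearman_log,
        pearson_raw = pearson_raw, pearson_log = pearson_log))
```

If Spearman is used for the final analysis, the exact choice of log transform therefore matters less than making sure both assays come from the same samples.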

B) This provides a MultiAssayExperiment object, which you can subset by rownames to select the genes and miRNAs of interest. The MultiAssayExperiment package has a cheat sheet for quick reference on such operations. For example:

> mae
A MultiAssayExperiment object of 2 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 2: 
 [1] COAD_miRNASeqGene-20160128: SummarizedExperiment with 705 rows and 221 columns 
 [2] COAD_RNASeq2GeneNorm-20160128: SummarizedExperiment with 20501 rows and 191 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> rownames(mae)
CharacterList of length 2
[["COAD_miRNASeqGene-20160128"]] hsa-let-7a-1 hsa-let-7a-2 hsa-let-7a-3 hsa-let-7b ... hsa-mir-98 hsa-mir-99a hsa-mir-99b
[["COAD_RNASeq2GeneNorm-20160128"]] A1BG A1CF A2BP1 A2LD1 A2ML1 A2M A4GALT ... ZYG11A ZYG11B ZYX ZZEF1 ZZZ3 psiTPTE22 tAKR
> rownames(mae[c("hsa-let-7a-1", "A1BG"), , ])
CharacterList of length 2
[["COAD_miRNASeqGene-20160128"]] hsa-let-7a-1
[["COAD_RNASeq2GeneNorm-20160128"]] A1BG
>
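
Note that the two assays above have different numbers of columns (221 vs 191), so the samples need to be matched before correlating anything. MultiAssayExperiment and TCGAutils offer helpers for this; the sketch below uses only base R and made-up barcodes (the "TCGA-XX-..." identifiers are hypothetical) to show the idea. It relies on the standard TCGA barcode layout, where characters 1 to 12 identify the patient and characters 14 to 15 give the sample type ("01" is primary solid tumor, "11" is solid tissue normal).

```r
# Helpers based on the TCGA barcode layout.
patient_id  <- function(barcode) substr(barcode, 1, 12)
sample_type <- function(barcode) substr(barcode, 14, 15)

# Made-up column names standing in for the two assays (hypothetical barcodes).
mirna_cols <- c("TCGA-XX-0001-01A-01T-0001-13",
                "TCGA-XX-0002-01A-01T-0001-13",
                "TCGA-XX-0002-11A-01T-0001-13",  # normal tissue, dropped below
                "TCGA-XX-0003-01A-01T-0001-13")
mrna_cols  <- c("TCGA-XX-0002-01A-01R-0002-07",
                "TCGA-XX-0003-01A-01R-0002-07",
                "TCGA-XX-0004-01A-01R-0002-07")

# Keep primary tumors only.
mirna_tumor <- mirna_cols[sample_type(mirna_cols) == "01"]
mrna_tumor  <- mrna_cols[sample_type(mrna_cols) == "01"]

# Patients present in both assays, then the matching columns in the same order.
shared <- intersect(patient_id(mirna_tumor), patient_id(mrna_tumor))
mirna_matched <- mirna_tumor[match(shared, patient_id(mirna_tumor))]
mrna_matched  <- mrna_tumor[match(shared, patient_id(mrna_tumor))]

print(data.frame(patient = shared, mirna = mirna_matched, mrna = mrna_matched))
```

With real data, `mirna_matched` and `mrna_matched` would be used to index the columns of the two assay matrices so that column i refers to the same patient in both.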

Note that the TCGAutils package provides a number of other helper functions for MultiAssayExperiment objects coming from curatedTCGAData, for example, adding ranges so that you can subset by GRanges objects instead of by symbols.
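
For the correlation step itself, a minimal sketch could look like the following. It assumes you already have two matrices with the selected miRNAs and genes in rows and the same samples, in the same order, in columns. Here these are simulated, and the names `miR_A`, `GENE1` etc. are placeholders, not real identifiers. Spearman correlation with Benjamini-Hochberg adjustment is one reasonable choice, not the only one.

```r
set.seed(42)
n_samples <- 60
samples <- paste0("patient_", seq_len(n_samples))

# Simulated log-scale expression (hypothetical, NOT TCGA data).
mirna <- matrix(rnorm(2 * n_samples, mean = 8), nrow = 2,
                dimnames = list(c("miR_A", "miR_B"), samples))
mrna  <- matrix(rnorm(3 * n_samples, mean = 10), nrow = 3,
                dimnames = list(c("GENE1", "GENE2", "GENE3"), samples))

# Plant one negative miRNA-target relationship so there is something to find.
mrna["GENE1", ] <- 12 - 0.8 * mirna["miR_A", ] + rnorm(n_samples, sd = 0.3)

# The columns must refer to the same samples in the same order.
stopifnot(identical(colnames(mirna), colnames(mrna)))

# Test every miRNA-gene pair.
pairs <- expand.grid(mirna = rownames(mirna), gene = rownames(mrna),
                     stringsAsFactors = FALSE)
results <- lapply(seq_len(nrow(pairs)), function(i) {
  cor.test(mirna[pairs$mirna[i], ], mrna[pairs$gene[i], ],
           method = "spearman", exact = FALSE)
})
pairs$rho     <- vapply(results, function(res) unname(res$estimate), numeric(1))
pairs$p_value <- vapply(results, function(res) res$p.value, numeric(1))

# Adjust for the number of pairs tested (Benjamini-Hochberg).
pairs$fdr <- p.adjust(pairs$p_value, method = "BH")

# Keep significant negative correlations only.
hits <- pairs[pairs$rho < 0 & pairs$fdr < 0.05, ]
print(pairs[order(pairs$rho), ])
```

With 4 miRs and ~18 genes this is only 72 tests, so the multiple-testing burden is small, but it is still worth reporting adjusted p-values.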