Briefly, starting from a previously identified gene signature in a specific type of cancer (derived through gene expression analysis), I have also found, in parallel, 4 specific microRNAs (mature miRs) that, according to experimentally validated databases, regulate a specific subset of my signature (~18 genes). Now, as a final step, I would like to explore, in the TCGA COAD dataset, the expression of these miRs and of these genes in the same patients, to look for any significant negative correlation, which would further support my hypothesis.
From a quick search, I found that the curatedTCGAData R package provides various assays for the different TCGA cohorts, including the cancer of interest. A small query returns the following:
    curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)

    Please see the list below for available cohorts and assays
    Available Cancer codes:
     ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH KIRC KIRP LAML LGG LIHC
     LUAD LUSC MESO OV PAAD PCPG PRAD READ SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM
    Available Data Types:
     CNACGH CNASeq CNASNP CNVSNP GISTICA GISTICT Methylation miRNAArray
     miRNASeqGene mRNAArray Mutation RNASeq2GeneNorm RNASeqGene RPPAArray
A) For COAD, which data types should I select in order to get only the miRNA expression and the RNA-Seq expression data?
I see that there are miRNAArray, miRNASeqGene, RNASeq2GeneNorm, RNASeqGene and mRNAArray. However, I do not know the specific differences between them, as I have mostly used data from the GDC server. My understanding is that both types of expression data should be normalized and/or transformed in the same way for the correlation analysis to be appropriate.
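As a toy illustration of the correlation step described above, the following base-R sketch uses simulated values (not TCGA data) for one miR and one target gene, puts both on a log2 scale, and runs a one-sided Spearman test for a negative association. The sample size, the simulated distributions and the pseudocount of 1 are all assumptions made only for this example.

```r
# Simulated data only: one miRNA and one target gene measured in the same patients
set.seed(1)
n <- 50                                  # hypothetical number of matched patients
mir_raw  <- rexp(n, rate = 1 / 200)      # simulated miRNA abundance
# Simulated gene whose abundance drops as the miRNA rises, plus multiplicative noise
gene_raw <- 5000 / (1 + mir_raw / 50) * exp(rnorm(n, sd = 0.3))

# Put both measurements on the same log2 scale (pseudocount of 1 is an assumption)
mir_log  <- log2(mir_raw + 1)
gene_log <- log2(gene_raw + 1)

# One-sided test: is the rank correlation negative?
res <- cor.test(mir_log, gene_log, method = "spearman", alternative = "less")
res$estimate   # Spearman's rho
res$p.value
```

Because Spearman's correlation is rank-based, a monotone transform such as log2 does not change rho; the choice of transformation matters more for Pearson's correlation. With 4 miRs and ~18 genes there would be many such tests, so the p-values would presumably need a multiple-testing correction, for example with `p.adjust(..., method = "BH")`.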
B) Moreover, how could I subset both assays simultaneously, based on specific miRs and specific gene symbols?
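To make the operation in B) concrete, here is a base-R sketch on two toy matrices standing in for the miRNA and RNA-Seq assays (features in rows, samples in columns). It does not use curatedTCGAData itself; the feature names, the shortened barcodes and the counts are placeholders invented for the example. The only TCGA-specific assumption is that the first 12 characters of a barcode identify the patient.

```r
# Toy stand-ins for the two assays; all names and values are made up
set.seed(2)
mir_mat <- matrix(rpois(3 * 4, 100), nrow = 3,
                  dimnames = list(c("hsa-mir-21", "hsa-mir-145", "hsa-mir-200a"),
                                  c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A",
                                    "TCGA-AA-0003-01A", "TCGA-AA-0004-01A")))
rna_mat <- matrix(rpois(4 * 3, 500), nrow = 4,
                  dimnames = list(c("GENE1", "GENE2", "GENE3", "GENE4"),
                                  c("TCGA-AA-0002-01A", "TCGA-AA-0003-01A",
                                    "TCGA-AA-0005-01A")))

# Placeholder feature lists (the real ones would be the 4 miRs and ~18 genes)
mirs_of_interest  <- c("hsa-mir-21", "hsa-mir-145")
genes_of_interest <- c("GENE2", "GENE4")

# Reduce sample barcodes to patient IDs (first 12 characters)
colnames(mir_mat) <- substr(colnames(mir_mat), 1, 12)
colnames(rna_mat) <- substr(colnames(rna_mat), 1, 12)

# Keep only patients present in both assays, in the same order
shared <- intersect(colnames(mir_mat), colnames(rna_mat))

# Subset rows by feature and columns by shared patient in one step per assay
mir_sub <- mir_mat[mirs_of_interest, shared, drop = FALSE]
rna_sub <- rna_mat[genes_of_interest, shared, drop = FALSE]
```

After this, the columns of `mir_sub` and `rna_sub` refer to the same patients in the same order, so each miR row can be correlated against each gene row. A patient with more than one sample (for example tumor and normal) would give duplicated column names after truncation, so in real data the sample type would need to be filtered first.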
Any suggestions, help or ideas would be greatly appreciated!