Hi all,
I have just started exploring the possibilities of TCGAbiolinks and have a question about the values given for the sample features "subtype_methylation.Clusters", "subtype_miRNA.Clusters" and "subtype_CN.Clusters".
These are retrieved together with other clinical and subtype information on the samples by accessing the colData of the prepared gene expression data with TCGAbiolinks and SummarizedExperiment. Columns such as subtype_methylation.Clusters assign a number to each sample (e.g. 1 to 6), but I don't know what these numbers stand for. I am not working with the miRNA/methylation/CN data myself, but it would be nice to show that the expression of my genes of interest correlates with certain profiles of CN aberrations etc., and this may be an easy way to do so.
Does anyone know whether these numbers refer to a predefined set of methylation/miRNA/CN clusters, into which the samples were classified elsewhere based on their methylation/miRNA/CN data, and whether the genes defining these clusters can be retrieved somewhere?
Some extra info:
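library(TCGAbiolinks)
library(SummarizedExperiment)  # provides colData()

# Legacy (hg19) normalized RNA-seq gene expression for BRCA primary tumors
query <- GDCquery(project = "TCGA-BRCA",
                  legacy = TRUE,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  sample.type = "Primary solid Tumor")
GDCdownload(query)
data <- GDCprepare(query)
sample.information <- colData(data)  # per-sample clinical and subtype annotation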
>> The columns "subtype_methylation.Clusters", "subtype_miRNA.Clusters" and "subtype_CN.Clusters" are part of this sample.information data frame.
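For the comparison described above (expression of a gene of interest versus, e.g., the CN cluster labels), a minimal sketch in base R could look like the following. It assumes the cluster numbers are unordered group labels, since their meaning is unknown, so it uses a rank-based Kruskal-Wallis test across groups rather than a correlation. The function name expression_by_cluster and the gene "ESR1" in the commented line are hypothetical choices, and indexing assay(data) by gene name assumes the row names of the prepared object are gene identifiers you can subset by.

```r
# Summarise one gene's expression across cluster labels and test whether
# the distributions differ between clusters (Kruskal-Wallis rank test).
expression_by_cluster <- function(expr, cluster) {
  cluster <- factor(cluster)                  # treat numbers as labels, not values
  keep <- !is.na(cluster) & !is.na(expr)      # drop samples without a label or value
  expr <- expr[keep]
  cluster <- droplevels(cluster[keep])
  list(
    median = tapply(expr, cluster, median),   # median expression per cluster
    n = table(cluster),                       # number of samples per cluster
    test = kruskal.test(expr, cluster)        # H0: same distribution in all clusters
  )
}

# Simulated example: 60 samples, labels 1-5 plus some missing labels
set.seed(1)
cl <- sample(c(1:5, NA), 60, replace = TRUE)
ex <- rnorm(60, mean = ifelse(is.na(cl), 0, cl))
res <- expression_by_cluster(ex, cl)
print(res$median)
print(res$test$p.value)

# With the objects from the query above (hypothetical gene of interest):
# res <- expression_by_cluster(assay(data)["ESR1", ],
#                              sample.information$subtype_CN.Clusters)
```

A significant result would only say that expression differs between the labelled groups; it does not by itself tell you what each cluster represents.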