Dear Community,

Based on a recent machine learning analysis of cancer transcriptomic data, my goal is to use the various multi-omics experiments available for a TCGA dataset (COAD to begin with, later expanding to other cancers) for two main analyses. These would be based on the R packages **curatedCRCData** and **MultiAssayExperiment**, which include various types of omics data for each of the TCGA cancers:

A) For a selected signature of 12 genes, perform a multivariate Cox regression including RNA-seq, copy number, and pathology, and test whether any of these genes gives a significant result, as is done for the EZH2 gene in the following tutorial:

B) The second part, which is also very important, is a correlation analysis of the RNA-seq data against the copy number variation data, again for this subset of genes, in order to rank the genes that show the highest and most significant correlation.
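To make A) and B) concrete, here is a minimal sketch of what I have in mind, run on simulated data rather than on the real COAD objects. The gene names (gene1 to gene12), the variable names, and the simulated effect sizes are placeholders I made up for illustration. Using Spearman correlation and Benjamini-Hochberg adjustment is an assumption on my part, not something taken from the tutorial.

```r
library(survival)  # provides Surv() and coxph()

set.seed(1)
n <- 200
genes <- paste0("gene", 1:12)  # hypothetical stand-ins for the 12-gene signature

# Simulated copy number values and expression that partly tracks copy number
cn <- matrix(rnorm(n * 12), nrow = n, dimnames = list(NULL, genes))
slope <- seq(0, 1.1, length.out = 12)  # per-gene strength of the copy number effect
expr <- sweep(cn, 2, slope, `*`) + matrix(rnorm(n * 12), nrow = n)
colnames(expr) <- genes

# Simulated clinical data: the hazard depends on gene1 expression and on stage
stage <- factor(sample(c("I", "II", "III"), n, replace = TRUE))
hazard <- exp(0.7 * expr[, "gene1"] + 0.5 * (stage == "III"))
event_time <- rexp(n, rate = 0.1 * hazard)
censor_time <- rexp(n, rate = 0.03)

dat <- data.frame(
  time      = pmin(event_time, censor_time),
  status    = as.integer(event_time <= censor_time),  # 1 = event, 0 = censored
  gene_expr = expr[, "gene1"],
  gene_cn   = cn[, "gene1"],
  stage     = stage
)

# A) Multivariate Cox model for one gene: expression + copy number + pathology
fit <- coxph(Surv(time, status) ~ gene_expr + gene_cn + stage, data = dat)
cox_table <- summary(fit)$coefficients
print(cox_table)

# B) Per-gene correlation between expression and copy number, ranked by rho
rho <- sapply(genes, function(g) cor(expr[, g], cn[, g], method = "spearman"))
pval <- sapply(genes, function(g) {
  cor.test(expr[, g], cn[, g], method = "spearman")$p.value
})
ranking <- data.frame(gene = genes, rho = rho,
                      padj = p.adjust(pval, method = "BH"))
ranking <- ranking[order(-ranking$rho), ]
print(ranking)
```

On the real data I would repeat the model in A) once per signature gene and replace the simulated matrices with the matched RNA-seq and copy number assays for the same patients.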

Thus:

1) For an initial investigation of the COAD dataset with the R package MultiAssayExperiment, I only found this link:

**https://docs.google.com/spreadsheets/d/1Ih64DDS5mqDlYFzDyCY9HAUnxvI1b6hapKP_akFuNPY/edit#gid=0**

Can I download the colon dataset from there, or are there more recent updates?

2) Assuming my approach above is valid, and based on the link above, should I proceed with the following code?

    library(MultiAssayExperiment)
    library(RaggedExperiment)
    library(SummarizedExperiment)
    accCOAD <- readRDS("coadMAEO.rds")
    accCOAD <- updateObject(accCOAD)

3) My final and perhaps most crucial question:

If I understand correctly, the datasets included in the above links and repositories come from the Firehose repository, right?

If so, these are the legacy hg19 data from the original publications, without any updates to the survival data or the protocols, right?

The reason I ask is that, for my current project, I have already analyzed the provisional hg38 TCGA data for the COAD dataset, but only at the transcriptomics layer, for my aforementioned signature. The possibility of using MultiAssayExperiment to interrogate different omics layers at the same time would therefore be a great asset for my purpose.

However, how should I interpret the difference in genome build? That is, if a multivariate Cox survival analysis on the hg19 data finds significant genes that have already shown survival significance in hg38, would this strengthen my results and illustrate that they are robust, regardless of the different protocols/technology used?

Thank you very much for your time and consideration on this matter. I look forward to your comments and suggestions!

Kind Regards,

Efstathios-Iason

