Hello, I'm trying to calculate correlations between protein-coding gene and miRNA pairs from TCGA RNAseq data. I'd like to perform this analysis across different TCGA projects (32 different tumor types). My goal is to:
1- Find gene-miRNA pairs that are significantly correlated with each other.
2- Generate exploratory graphs showing gene expression levels in subsets of data (e.g. miRNA expression levels in CD1a-high vs low data subsets).
3- Perform survival analysis based on gene expression levels (both univariate CoxPH with continuous expression and KM analysis after categorizing the gene expression as `low` and `high`, at the median value for instance).
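
Goal 3 can be sketched roughly as below. This is only an illustration on simulated data: the data frame `surv_df` and its columns `expr`, `time`, and `event` are hypothetical stand-ins for one gene's normalized expression joined to TCGA clinical follow-up, and it assumes the `survival` package is available.

```r
library(survival)  # recommended package that ships with standard R installs

set.seed(42)
n <- 200
expr <- rnorm(n, mean = 5, sd = 1.5)  # stand-in for one gene's normalized expression

# Simulated follow-up (hypothetical): higher expression gives a higher hazard
true_time <- rexp(n, rate = 0.05 * exp(0.4 * (expr - 5)))
cens_time <- rexp(n, rate = 0.03)
surv_df <- data.frame(
  expr  = expr,
  time  = pmin(true_time, cens_time),
  event = as.integer(true_time <= cens_time)  # 1 = event observed, 0 = censored
)

# Univariate Cox model with expression as a continuous covariate
cox_fit <- coxph(Surv(time, event) ~ expr, data = surv_df)
hr <- exp(coef(cox_fit))  # hazard ratio per one-unit increase in expression

# Kaplan-Meier curves after a median split into low/high
surv_df$group <- factor(
  ifelse(surv_df$expr > median(surv_df$expr), "high", "low"),
  levels = c("low", "high")
)
km_fit <- survfit(Surv(time, event) ~ group, data = surv_df)

# Log-rank test comparing the two groups
lr <- survdiff(Surv(time, event) ~ group, data = surv_df)
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

print(summary(cox_fit)$coefficients)
print(km_fit)
```

With real data, `expr` would be a single row of the normalized matrix matched to patients by barcode, and the same two models would be repeated per gene and per tumor type.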
My approach is not intended to find differentially expressed genes/miRNAs; instead, I would like to assess the expression levels of genes and their correlative relationships.
Since the tumor types in my analysis are varied and I don't have a predetermined sample classification, I opted out of using a design matrix and used the `voom` normalization method for both `RNAseq` and `miRNAseq` data as follows:
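
    library(edgeR)  # also attaches limma, which provides voom()

    # RNAseq: filter low-count genes, TMM-normalize, then voom-transform to log2-CPM
    dge_rna <- DGEList(counts = rna_assay, samples = rna_sample_meta)
    keep_rna <- filterByExpr(dge_rna)  # no design/group given: samples treated as one group
    dge_rna <- dge_rna[keep_rna, , keep.lib.sizes = FALSE]
    dge_rna <- calcNormFactors(dge_rna)
    v_rna <- voom(dge_rna, plot = FALSE)
    norm_rna <- v_rna$E

    # miRNAseq: same steps applied to the miRNA counts
    dge_mir <- DGEList(counts = mirna_assay, samples = mirna_sample_meta)
    keep_mir <- filterByExpr(dge_mir)
    dge_mir <- dge_mir[keep_mir, , keep.lib.sizes = FALSE]
    dge_mir <- calcNormFactors(dge_mir)
    v_mir <- voom(dge_mir, plot = FALSE)
    norm_mir <- v_mir$E
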
Then, using base R functions, I calculated the correlations and performed survival analyses, etc. Is there anything wrong with this approach? Is `cpm` or another normalization a better choice here? I read several posts on Biostars and noticed that `z-score`, `vst`, and `rlog` approaches are other alternatives.
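
The correlation step mentioned above could look something like the following sketch in base R. It uses small simulated matrices (`norm_rna_toy`, `norm_mir_toy`, both hypothetical) in place of `norm_rna` and `norm_mir`, and it assumes that both matrices carry sample identifiers as column names so they can be matched. Spearman correlation and Benjamini-Hochberg adjustment are illustrative choices here, not necessarily what was used in the actual analysis.

```r
set.seed(1)
n <- 40
samples <- paste0("S", seq_len(n))

# Toy feature-by-sample matrices standing in for norm_rna and norm_mir
norm_rna_toy <- matrix(rnorm(3 * n), nrow = 3,
                       dimnames = list(paste0("gene", 1:3), samples))
norm_mir_toy <- matrix(rnorm(2 * n), nrow = 2,
                       dimnames = list(paste0("mir", 1:2), samples))

# Make one pair truly (negatively) correlated so there is something to find
norm_mir_toy["mir1", ] <- -norm_rna_toy["gene1", ] + rnorm(n, sd = 0.3)

# Use only samples present in both data types, in the same order
shared <- intersect(colnames(norm_rna_toy), colnames(norm_mir_toy))

# All gene-miRNA combinations
pair_tab <- expand.grid(gene = rownames(norm_rna_toy),
                        mirna = rownames(norm_mir_toy),
                        stringsAsFactors = FALSE)

# One correlation test per pair
tests <- Map(function(g, m) {
  cor.test(norm_rna_toy[g, shared], norm_mir_toy[m, shared], method = "spearman")
}, pair_tab$gene, pair_tab$mirna)

pair_tab$rho <- vapply(tests, function(ct) unname(ct$estimate), numeric(1))
pair_tab$p   <- vapply(tests, function(ct) ct$p.value, numeric(1))
pair_tab$fdr <- p.adjust(pair_tab$p, method = "BH")  # multiple-testing correction

print(pair_tab[order(pair_tab$fdr), ])
```

For the full gene-by-miRNA grid, a loop of `cor.test` calls like this gets slow; a single `cor(t(norm_rna), t(norm_mir))` call returns the whole correlation matrix at once, with p-values derived separately.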
My overall goal is just to obtain normalized data so that I can compare expression levels between subsets of the data. There are other metadata (such as tumor subtype, immune status, etc.) that I'd like to facet the expression analysis/plotting by.
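
As an illustration of the kind of subset comparison meant here, the sketch below splits samples into low/high groups by one gene's expression and compares a miRNA's expression between those groups within each level of a metadata variable. Everything in `plot_df` is simulated, and the column names (`gene_expr`, `mir_expr`, `subtype`) are hypothetical placeholders for a row of `norm_rna`, a row of `norm_mir`, and a sample annotation.

```r
set.seed(7)
n <- 120
plot_df <- data.frame(
  gene_expr = rnorm(n, mean = 6, sd = 1),  # stand-in for one gene's normalized expression
  mir_expr  = rnorm(n, mean = 8, sd = 1),  # stand-in for one miRNA's normalized expression
  subtype   = sample(c("subtypeA", "subtypeB"), n, replace = TRUE)  # hypothetical metadata
)

# Median split of the gene into low/high groups
plot_df$gene_group <- factor(
  ifelse(plot_df$gene_expr > median(plot_df$gene_expr), "high", "low"),
  levels = c("low", "high")
)

# Median miRNA expression per gene group within each subtype
group_medians <- aggregate(mir_expr ~ gene_group + subtype, data = plot_df, FUN = median)
print(group_medians)

# One box per gene group within each subtype
boxplot(mir_expr ~ gene_group + subtype, data = plot_df,
        xlab = "gene group . subtype", ylab = "normalized miRNA expression")
```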
Best regards,
Atakan