Question

RNAseq/miRNAseq normalization for calculating gene correlations, survival analysis, and exploratory graphing

atakanekiz ▴ 30

@atakanekiz-15874

Turkey

Hello, I'm trying to calculate correlations between protein-coding gene-miRNA pairs from TCGA RNAseq data. I'd like to perform this analysis across different TCGA projects (32 different tumor types). My goal is to:

1- Find gene-miRNA pairs that are significantly correlated with each other.

2- Generate exploratory graphs showing gene expression levels in subsets of data (e.g. miRNA expression levels in CD1a-high vs low data subsets).

3- Perform survival analysis based on gene expression levels (both univariate CoxPH with continuous expression and KM analysis after categorizing gene expression as low or high, for instance at the median value).
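To make goal 3 concrete, here is a minimal sketch of the two survival analyses I have in mind. It uses simulated data rather than TCGA data: the data frame `dat` and its columns `expr`, `time`, and `status` are hypothetical placeholders, and the `survival` package is assumed to be installed.

```r
library(survival)  # recommended package, assumed to be installed

# Hypothetical toy data: one normalized expression value per patient plus follow-up.
set.seed(1)
n <- 100
dat <- data.frame(
  expr   = rnorm(n),             # e.g. normalized log2 expression of one gene
  time   = rexp(n, rate = 0.1),  # follow-up time
  status = rbinom(n, 1, 0.7)     # 1 = event observed, 0 = censored
)

# Univariate Cox model with expression as a continuous covariate.
cox_fit <- coxph(Surv(time, status) ~ expr, data = dat)
summary(cox_fit)$coefficients

# Kaplan-Meier curves after a median split into low and high groups.
dat$group <- factor(ifelse(dat$expr > median(dat$expr), "high", "low"),
                    levels = c("low", "high"))
km_fit <- survfit(Surv(time, status) ~ group, data = dat)
plot(km_fit, col = c("blue", "red"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(dat$group), col = c("blue", "red"), lty = 1)

# Log-rank test for the difference between the two groups.
survdiff(Surv(time, status) ~ group, data = dat)
```

In the real analysis, `expr` would be one row of the normalized expression matrix matched to the clinical follow-up of the same patients.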

My approach is not intended to find differentially expressed genes/miRNAs. Instead, I would like to assess the expression levels of genes and the correlative relationships between them.

Since the tumor types in my analysis are varied and I don't have a predetermined sample classification, I opted out of using a design matrix and used the voom normalization method for both RNAseq and miRNAseq data as follows:

library(edgeR)  # also attaches limma, which provides voom()

# mRNA: filter low-count genes, TMM-normalize, then voom-transform to log2-CPM
dge_rna <- DGEList(counts = rna_assay, samples = rna_sample_meta)
keep_rna <- filterByExpr(dge_rna)
dge_rna <- dge_rna[keep_rna, , keep.lib.sizes = FALSE]
dge_rna <- calcNormFactors(dge_rna)
v_rna <- voom(dge_rna, plot = FALSE)
norm_rna <- v_rna$E

# miRNA: same steps applied to the miRNA counts
dge_mir <- DGEList(counts = mirna_assay, samples = mirna_sample_meta)
keep_mir <- filterByExpr(dge_mir)
dge_mir <- dge_mir[keep_mir, , keep.lib.sizes = FALSE]
dge_mir <- calcNormFactors(dge_mir)
v_mir <- voom(dge_mir, plot = FALSE)
norm_mir <- v_mir$E

Then, using base R functions, I calculated the correlations and performed survival analyses etc. Is there anything wrong with this approach? Is cpm or another normalization a better choice here? I read several posts on Biostars and noticed that z-score, vst, and rlog approaches are other alternatives.
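For reference, the correlation step looks roughly like the sketch below. The small random matrices are hypothetical stand-ins for `norm_rna` and `norm_mir`. The choice of Spearman correlation with Benjamini-Hochberg adjustment, and the assumption that the two matrices share sample identifiers as column names, are illustrative only.

```r
# Hypothetical stand-ins for the voom outputs: rows are features, columns are samples.
set.seed(1)
samples <- paste0("S", 1:40)
norm_rna <- matrix(rnorm(3 * 40), nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"), samples))
norm_mir <- matrix(rnorm(2 * 40), nrow = 2,
                   dimnames = list(c("mir1", "mir2"), rev(samples)))

# Keep only samples present in both matrices, in the same order.
shared <- intersect(colnames(norm_rna), colnames(norm_mir))
rna <- norm_rna[, shared, drop = FALSE]
mir <- norm_mir[, shared, drop = FALSE]

# Test every gene-miRNA pair with Spearman correlation.
pairs <- expand.grid(gene = rownames(rna), mirna = rownames(mir),
                     stringsAsFactors = FALSE)
res <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  ct <- cor.test(rna[pairs$gene[i], ], mir[pairs$mirna[i], ],
                 method = "spearman", exact = FALSE)
  data.frame(pairs[i, ], rho = unname(ct$estimate), p = ct$p.value)
}))

# Adjust for multiple testing (Benjamini-Hochberg) and sort by adjusted p-value.
res$fdr <- p.adjust(res$p, method = "BH")
res[order(res$fdr), ]
```

Aligning the columns before correlating matters because the RNAseq and miRNAseq matrices are not guaranteed to list samples in the same order.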

My overall goal is just to obtain normalized data so that I can compare expression levels between subsets of data. There are other metadata (such as tumor subtype, immune status, etc.) that I'd like to facet the expression analysis and plots by.
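As an illustration of the kind of subset comparison I mean, here is a minimal base R sketch on simulated data. The data frame `df` and its columns `gene`, `mirna`, and `subtype` are hypothetical, and the Wilcoxon test is just one possible choice.

```r
# Hypothetical toy data: one gene, one miRNA, and a metadata column per sample.
set.seed(1)
n <- 60
df <- data.frame(
  gene    = rnorm(n),
  mirna   = rnorm(n),
  subtype = sample(c("subtypeA", "subtypeB"), n, replace = TRUE)
)

# Label samples as low or high for the gene, splitting at the median.
df$gene_group <- factor(ifelse(df$gene > median(df$gene), "high", "low"),
                        levels = c("low", "high"))

# Compare miRNA levels between groups, shown separately for each subtype.
boxplot(mirna ~ gene_group + subtype, data = df,
        xlab = "Gene group and subtype", ylab = "miRNA expression (log2 scale)")

# Rank-based test of the low vs high difference, ignoring subtype.
wt <- wilcox.test(mirna ~ gene_group, data = df)
wt$p.value
```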

Best regards,

Atakan

limma edger voom rnaseq mirnaseq • 1.0k views

5.0 years ago atakanekiz ▴ 30