I want to analyze the expression of three specific genes in six TCGA cohorts. First, I want to know if the three genes are differentially expressed between tumor and normal samples in each individual TCGA cohort. Second, I want to know if the three genes are correlated in the tumor samples of each individual TCGA cohort.
My starting point for each TCGA cohort is a count matrix with counts for all genes for all samples and a corresponding colData containing information about the TCGA barcode, the sampleType (normal or tumor) and the batchID, which is important for removal of batch effects.
Here is my workflow for the differential expression analysis. As an example I picked the BRCA cohort.
register(MulticoreParam(4)) ddsBRCA <- DESeqDataSetFromMatrix(countData = countsBRCA, colData = colDataBRCA, design = ~ batchID + sampleType) #remove genes with low counts ddsBRCA <- ddsBRCA[rowMedians(counts(ddsBRCA)) >= 10,] ddsBRCA <- DESeq(ddsBRCA, parallel=TRUE) #for a very small number of genes there was no convergence ddsBRCA <- ddsBRCA[which(mcols(ddsBRCA)$betaConv),] resShrunkenBRCA <- lfcShrink(ddsBRCA, coef="sampleType_tumor_vs_normal", parallel=TRUE)
I did this for all six cohorts and now can check the regulation of my three genes of interest and probably have a look on the other differentially expressed genes.
For the correlation analysis I have the following workflow, which follows directly after the differential expression workflow.
# no correction for batch effect vsdBRCA <- vst(ddsBRCA, blind=FALSE) # correction for batch effect vsdBcBRCA <- vsdBRCA matBRCA <- assay(vsdBcBRCA) mmBRCA <- model.matrix(~sampleType, colData(vsdBcBRCA)) matBRCA <- limma::removeBatchEffect(matBRCA, batch=vsdBcBRCA$batchID, design=mmBRCA) assay(vsdBcBRCA) <- matBRCA
I made it once with and without batch correction to see the effect of the batch correction. For the COAD cohort I obesrved a strong batch effect, which was nicely corrected with the batch correction.
The correlation alaysis should be based on the vsdBcBRCA object.
My questions is whether the differential expression workflow seems valid and if the correlation analysis can be performed on the combination of vst() and removeBatchEffect().
Thanks in Advance.