Hello everyone
I'm working with RNA-seq gene expression data derived from multiple tissue samples collected from different patients. My primary goal is to identify differentially expressed genes (DEGs) while minimizing the confounding effects of tissue of origin on the results.
A brief overview of my approach:
Using edgeR, I filtered out lowly expressed genes with filterByExpr, applied TMM normalization, and converted the raw counts into log-CPM values so that samples are comparable after accounting for library size:
# Assumes `counts` is an edgeR DGEList (needed for keep.lib.sizes and normLibSizes)
library(edgeR)
keep <- filterByExpr(counts)
counts <- counts[keep, , keep.lib.sizes=FALSE]
counts <- normLibSizes(counts, method = "TMM")
counts_cpm <- cpm(counts, log = TRUE)
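To make the CPM step concrete, here is a minimal base-R sketch of the arithmetic behind it, assuming toy counts and hypothetical normalization factors of 1 (real TMM factors would come from normLibSizes). The names counts_toy, norm_factors, and cpm_toy are made up for illustration. Note that cpm(..., log = TRUE) in edgeR additionally adds a small prior count before the log2 transform, which this sketch omits.

```r
# Toy raw counts: 3 genes (rows) x 2 samples (columns); values are made up.
counts_toy <- matrix(
  c(10, 90, 900,
    20, 180, 1800),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)

# Library size is the total count per sample.
lib_size <- colSums(counts_toy)

# Hypothetical normalization factors; set to 1 here, so this is plain CPM.
norm_factors <- c(s1 = 1, s2 = 1)

# Effective library size = library size x normalization factor.
eff_lib <- lib_size * norm_factors

# Counts per million: divide each column by its effective library size.
cpm_toy <- t(t(counts_toy) / eff_lib) * 1e6
print(cpm_toy)
```

With all factors equal to 1, each column of cpm_toy sums to one million; a TMM factor different from 1 rescales the whole column for that sample.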
To account for patient and tissue variability, I applied a mixed linear model using the variancePartition package. My formula models the contribution of both patient and tissue to the gene expression variation:
form <- ~ (1 | Tissue) + (1 | Patient)
vp_modelFit <- fitVarPartModel(counts_cpm, form, df)
vp_modelFit_res <- residuals(vp_modelFit)
My understanding is that the residuals from this model should, in theory, represent gene expression values devoid of tissue- and patient-specific effects, potentially revealing the intrinsic cancer-related signals.
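For a single gene, the residual computation described above can be sketched directly with lme4 (the package variancePartition builds on). This is only an illustration on simulated data: the data frame, effect sizes, and variable names below are assumptions, not my real dataset.

```r
library(lme4)

set.seed(1)

# Simulated design: 6 patients, each sampled in 4 tissues (hypothetical).
df_toy <- data.frame(
  Patient = factor(rep(paste0("P", 1:6), each = 4)),
  Tissue  = factor(rep(paste0("T", 1:4), times = 6))
)

# Simulated log-expression for one gene: baseline + patient + tissue + noise.
patient_eff <- rnorm(6, sd = 1)[df_toy$Patient]
tissue_eff  <- rnorm(4, sd = 1)[df_toy$Tissue]
df_toy$y <- 5 + patient_eff + tissue_eff + rnorm(nrow(df_toy), sd = 0.5)

# Same random-effects structure as in the formula above.
fit <- lmer(y ~ (1 | Tissue) + (1 | Patient), data = df_toy)

# Residuals = observed values minus fitted values, where the fitted values
# include the estimated patient and tissue random effects.
res <- residuals(fit)
head(res)
```

The residuals are whatever is left after subtracting the fitted intercept and the estimated patient and tissue effects, so any signal that tracks patient or tissue would be at least partly absorbed by those terms.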
Questions:
- Is this approach statistically sound for achieving my aim? Specifically, does this methodology appropriately remove the unwanted variation from tissue and patient sources while preserving biologically relevant signals?
- Any recommendations for improving the robustness of this approach, especially in terms of ensuring that intrinsic cancer-related signals are not inadvertently removed?
Thank you!
I don't follow what problem you are trying to solve. Your goal is to identify DE genes, but DE between what?
Hi Gordon, thanks for your response. I will investigate DEGs between genetic subclones, that is, subclasses with particular genetic mutations, after I have regressed out the potential impact of different tissues and individuals from the expression data. I did not want to make the question more complex.
Cross-posted to Biostars: https://www.biostars.org/p/9601556/