Hi,
I ran one analysis through the cBioPortal website and another locally, using TCGAbiolinks to download the TCGA files and DESeq2 for the differential expression. Although there is a good overlap (58%) of differentially expressed (DE) genes, I'm curious to understand why the overlap between the two pipelines isn't higher.
This is the analysis using cBioPortal: https://www.cbioportal.org/comparison/mrna?comparisonId=63b2d2551cec6922c422d9a2
I noticed the DE genes in my analysis that are not statistically DE in cBioPortal are mostly genes with low counts. I have already tried to filter these genes using:
keep <- rowSums( counts(dds) >= 5 ) >= 50 #since I am working with >950 samples
dds <- dds[keep,]
But I keep seeing very lowly expressed genes among the top DE genes (lowest padj) in my analysis.
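For reference, here is a minimal sketch of how this kind of filter behaves, using base R only. The count matrix is made up, and the thresholds are scaled down (5 counts in at least 3 of 6 samples, instead of 5 counts in at least 50 of >950 samples), so this illustrates the logic rather than my actual data.

```r
# Toy count matrix: 4 genes (rows) x 6 samples (columns). All values are made up.
counts_toy <- matrix(
  c( 0,  1,  0,  2,  0,  1,   # geneA: low in every sample
    10, 12,  8, 15,  9, 11,   # geneB: well expressed in every sample
     0,  0,  0,  0, 50, 60,   # geneC: expressed in only two samples
     5,  5,  5,  4,  3,  2),  # geneD: reaches 5 in exactly three samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)

min_count   <- 5  # hypothetical per-sample count threshold
min_samples <- 3  # hypothetical minimum number of samples meeting it

# Same pattern as the filter above: count, per gene, the samples at or
# above the threshold, then keep genes with enough such samples.
keep <- rowSums(counts_toy >= min_count) >= min_samples
filtered <- counts_toy[keep, , drop = FALSE]

print(rownames(filtered))
```

In this toy case geneD passes the filter even though its counts never exceed 5, so a filter of this form does not by itself exclude genes that are lowly expressed in most samples.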
Some examples that could be checked in the link above; these genes are DE in my analysis, but not in cBioPortal: "ALDH3A1" "STEAP1B" "GHSR" "KRT12"
One gene, "OR3A3", for example, is up-regulated in the “High” group in cBioPortal, but down-regulated in the “High” group in my analysis.
Is there a way to get a better overlap, or is this as good as it gets?
I will be glad to provide more details.
Thank you,
Alex
I didn't see an indication of using the cBioPortalData R package. I've added the TCGAbiolinks package tag instead.
FWIW, you should be able to get the same data from cBioPortalData with studyId = "brca_tcga_pan_can_atlas_2018".
Best,
Marcel
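A minimal sketch of what that download could look like, assuming the cBioPortalData package is installed and that the packaged-study route (`cBioDataPack()`) is the one wanted. The helper name `fetch_study` is hypothetical, and the call needs network access, so it is left commented out here.

```r
# Study identifier quoted in the reply above.
study_id <- "brca_tcga_pan_can_atlas_2018"

# Hypothetical helper: downloads the packaged study files from cBioPortal
# and returns them as a MultiAssayExperiment. Requires cBioPortalData and
# an internet connection.
fetch_study <- function(id) {
  cBioPortalData::cBioDataPack(id)
}

# Usage (not run here because it downloads data):
# brca <- fetch_study(study_id)
# names(MultiAssayExperiment::experiments(brca))  # list the available assays
```

The API-based function `cBioPortalData()` is an alternative that takes a `studyId` argument; which of the two matches the intended workflow is an assumption on my side.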
Hi Marcel,
Thank you for your response.
Your suggestion works for me. I can use the data obtained from cBioPortalData. What do you recommend for performing a differential expression analysis on the “data_mrna_seq_v2_rsem” data? I would prefer to use DESeq2 if possible.
Thank you for your time.
Alex