Discrepancy between DE genes from cBioPortal and DESeq2
Alexandre
Brazil

Hi,

I ran one analysis on the cBioPortal website and another locally, using TCGAbiolinks to download the TCGA files and DESeq2 for the differential expression analysis. The two pipelines share a reasonable fraction (58%) of their differentially expressed (DE) genes, but I'd like to understand why the overlap isn't higher.

This is the analysis using cBioPortal: https://www.cbioportal.org/comparison/mrna?comparisonId=63b2d2551cec6922c422d9a2

I noticed that most genes called DE in my analysis but not in cBioPortal have low counts. I have already tried filtering them out with:

```r
# keep genes with at least 5 counts in at least 50 samples (I have >950 samples)
keep <- rowSums(counts(dds) >= 5) >= 50
dds <- dds[keep, ]
```


However, very lowly expressed genes still show up among the top DE genes (lowest padj) in my analysis.
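A small base-R sketch of what this filter actually keeps, using a hypothetical toy matrix of 3 genes x 1000 samples as a stand-in for `counts(dds)` (gene names and values are made up). The rule counts how many samples reach 5 reads; it says nothing about average expression. With ~1000 samples, a gene that is at 0 in ~950 samples but reaches 5 in 50 of them passes, which may be one reason low-count genes survive the filter.

```r
# Hypothetical toy counts: 3 genes x 1000 samples, standing in for counts(dds).
counts_mat <- rbind(
  geneA = rep(c(10, 0), times = c(600, 400)),  # >= 5 in 600 samples
  geneB = rep(c(6, 1),  times = c(30, 970)),   # >= 5 in only 30 samples
  geneC = rep(c(5, 0),  times = c(50, 950))    # >= 5 in exactly 50 samples
)

# Same rule as in the post: at least 5 counts in at least 50 samples.
n_at_least_5 <- rowSums(counts_mat >= 5)
keep <- n_at_least_5 >= 50
counts_kept <- counts_mat[keep, , drop = FALSE]

print(n_at_least_5)
print(keep)
# geneC is kept even though its mean count (0.25) is lower than
# that of geneB (1.15), which is dropped.
print(rowMeans(counts_mat))
```

A stricter prevalence threshold (a larger number of samples), or a threshold tied to the size of the smallest comparison group, would remove genes like geneC; which threshold is appropriate depends on the group sizes in the actual design.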

Some examples, which can be checked at the link above, are DE in my analysis but not in cBioPortal: ALDH3A1, STEAP1B, GHSR, and KRT12.

One gene, OR3A3, is even up-regulated in the “High” group in cBioPortal but down-regulated in the “High” group in my analysis.
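One quick check for direction disagreements (a possible cause worth ruling out, not a diagnosis): R sorts factor levels alphabetically by default, and DESeq2 treats the first level as the reference. With groups named "High" and "Low", "High" becomes the reference, so a default `results(dds)` call reports log2 fold changes for Low relative to High. Since only some genes such as OR3A3 disagree, a global sign flip seems unlikely, but it is cheap to confirm. A base-R sketch with hypothetical group labels:

```r
# Hypothetical group labels; in the real analysis these would be a column of
# colData(dds), here assumed to be called "group".
group <- factor(c("Low", "High", "High", "Low"))
print(levels(group))   # alphabetical: "High" comes first, so it is the reference

# Make "Low" the reference so that positive log2 fold changes
# mean higher expression in "High".
group2 <- relevel(group, ref = "Low")
print(levels(group2))
```

In DESeq2 the comparison can also be stated explicitly, e.g. `results(dds, contrast = c("group", "High", "Low"))`, again assuming the colData column is named `group`.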

Is there a way to get better overlap, or is this simply what to expect?

I will be glad to provide more details.

Thank you,

Alex


I didn't see any indication that you used the cBioPortalData R package, so I've added the TCGAbiolinks package tag instead.

FWIW, you should be able to get the same data from cBioPortalData with `studyId = "brca_tcga_pan_can_atlas_2018"`.

Best,

Marcel


Hi Marcel,

Your suggestion works for me; I can use the data obtained from cBioPortalData. What do you recommend for a differential expression analysis of the “data_mrna_seq_v2_rsem” data? I would prefer to use DESeq2 if possible.

Alex

@mikelove
United States

Without knowing what code the other pipeline uses, I can't offer much guidance.

Note that Bioconductor RNA-seq packages often agree substantially on their DE gene sets.