Question

Some discrepancy between DE genes in cBioPortal vs DESeq2

Alexandre • 0

@04bd68e3

Last seen 20 months ago

Brazil

Hi,

I ran one differential expression analysis through the cBioPortal website and another locally, using TCGAbiolinks to download the TCGA files and DESeq2 for the analysis. Although the overlap of differentially expressed (DE) genes is reasonable (58%), I'd like to understand why the two pipelines don't agree more closely.

This is the analysis using cBioPortal: https://www.cbioportal.org/comparison/mrna?comparisonId=63b2d2551cec6922c422d9a2

I noticed that the DE genes in my analysis that are not statistically significant in cBioPortal are mostly genes with low counts. I have already tried to filter out these genes using:

keep <- rowSums( counts(dds) >= 5 ) >= 50 #since I am working with >950 samples
dds <- dds[keep,]
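
For illustration, here is a minimal sketch of what this filtering rule does, using a toy count matrix in place of `counts(dds)`. The matrix, gene names, and sample count are all invented for the example; only the `rowSums(... >= 5) >= 50` rule comes from the snippet above.

```r
# Toy count matrix: 4 genes x 100 samples (all values invented for illustration)
n_samples <- 100
toy_counts <- rbind(
  geneA = rep(20L, n_samples),              # count of 20 in every sample
  geneB = c(rep(10L, 60), rep(0L, 40)),     # count >= 5 in 60 samples, 0 elsewhere
  geneC = c(rep(500L, 10), rep(0L, 90)),    # high counts, but only in 10 samples
  geneD = rep(2L, n_samples)                # never reaches a count of 5
)

# Same rule as above: keep genes with a count >= 5 in at least 50 samples
keep <- rowSums(toy_counts >= 5) >= 50
filtered <- toy_counts[keep, , drop = FALSE]

print(rownames(filtered))
```

Note that a gene like the hypothetical `geneB` passes this rule even though it is zero in 40 of the samples, so genes with modest average counts can still survive the filter.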

But I still see genes with very low expression among the top DE genes (lowest padj) in my analysis.

Some examples that could be checked in the link above; these genes are DE in my analysis, but not in cBioPortal: "ALDH3A1" "STEAP1B" "GHSR" "KRT12"

One gene, "OR3A3", for example, is up-regulated in the “High” group in cBioPortal, but downregulated in the “High” group in my analysis.
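
A comparison like the one described here can be sketched in base R as follows. The two result tables, their column names (`gene`, `log2fc`), and all values are hypothetical placeholders, not output from either pipeline. The sketch also shows that an "overlap percentage" depends on the denominator chosen, so it helps to state which definition is used.

```r
# Hypothetical DE tables from two pipelines (gene IDs and values are invented)
res_a <- data.frame(gene = c("g1", "g2", "g3", "g4", "g5"),
                    log2fc = c(1.2, -0.8, 2.0, -1.5, 0.9),
                    stringsAsFactors = FALSE)
res_b <- data.frame(gene = c("g2", "g3", "g4", "g6"),
                    log2fc = c(-1.1, 1.7, 1.3, 0.6),
                    stringsAsFactors = FALSE)

# Genes called DE by both pipelines, and by only one of them
shared <- intersect(res_a$gene, res_b$gene)
only_a <- setdiff(res_a$gene, res_b$gene)
only_b <- setdiff(res_b$gene, res_a$gene)

# Two common ways to express overlap; they give different percentages
frac_of_a <- length(shared) / nrow(res_a)
jaccard <- length(shared) / length(union(res_a$gene, res_b$gene))

# Among shared genes, flag those whose fold change has the opposite sign
m <- merge(res_a, res_b, by = "gene", suffixes = c("_a", "_b"))
m$discordant <- sign(m$log2fc_a) != sign(m$log2fc_b)
discordant_genes <- m$gene[m$discordant]

print(list(shared = shared, frac_of_a = frac_of_a,
           jaccard = jaccard, discordant = discordant_genes))
```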

Is there a way to get a better overlap, or is this simply what to expect?

I will be glad to provide more details.

Thank you,

Alex

DESeq2 TCGAbiolinks • 1.3k views

ADD COMMENT • link 2.2 years ago Alexandre • 0

I didn't see an indication of using the cBioPortalData R package. I've added the TCGAbiolinks package tag instead.

FWIW, you should be able to get the same data from cBioPortalData with studyId = "brca_tcga_pan_can_atlas_2018".

Best,

Marcel

ADD REPLY • link 2.2 years ago Marcel Ramos 700

Hi Marcel,

Thank you for your response.

Your suggestion works for me. I can use the data obtained from cBioPortalData. What do you recommend for performing a differential expression analysis on the “data_mrna_seq_v2_rsem” data? I would prefer to use DESeq2 if possible.

Thank you for your time.

Alex

ADD REPLY • link 2.2 years ago Alexandre • 0

score 1 · Answer 1 · 2023-01-05

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

Without knowing what code is used in the other pipeline, I can’t offer much guidance.

Note that Bioconductor RNA-seq packages often show a lot of agreement in their DE gene sets.

ADD COMMENT • link 2.2 years ago Michael Love 43k