Question

Can I use limma::avereps to remove duplicated genes before conducting DEG analysis on RSEM expected_count data?

0

Yang Shi ▴ 10

@ea61ff7a

Last seen 12 months ago

Zheng Zhou

Dear community,

The RSEM expected count data were downloaded from UCSC Xena (RSEM expected_count) and were rounded following the advice in (DESeq2 Following RSEM). However, the Ensembl IDs were converted to official gene symbols before analysis, which left nearly 2000 duplicated gene symbols. Such duplicates are usually collapsed with limma::avereps. But I noticed that the author of DESeq2 mentioned that "not normalized counts could be fed into DESeqDataSetFromMatrix". So I wonder whether running limma::avereps and then rounding, or rounding before limma::avereps, would give input suitable for DESeqDataSetFromMatrix? Any suggestions would be greatly appreciated!

library(magrittr)  # provides %>%
library(tibble)    # provides rownames_to_column()

# (1) floor before limma::avereps
# Column 1 of Expected_count holds the gene symbols;
# the remaining columns hold log2(count + 1) values.
expr <- as.matrix(Expected_count[, -1])
rownames(expr) <- Expected_count[[1]]
expr <- floor(2^expr - 1)  # undo the log2(x + 1) transform, then round down
dataMatrix <- expr %>%
  limma::avereps() %>%     # average rows that share a gene symbol
  as.data.frame() %>%
  rownames_to_column('symbol')

# (2) floor after limma::avereps
expr <- as.matrix(Expected_count[, -1])
rownames(expr) <- Expected_count[[1]]
expr <- 2^expr - 1         # undo the log2(x + 1) transform, no rounding yet
dataMatrix <- expr %>%
  limma::avereps() %>%     # average rows that share a gene symbol
  floor() %>%              # round down while the data are still a numeric matrix
  as.data.frame() %>%
  rownames_to_column('symbol')

limma RNASeq Expected_count DESeq2 • 1.6k views

ADD COMMENT • link 3.4 years ago Yang Shi ▴ 10

score 2 · Accepted Answer · 2022-07-19

2

Michael Love 43k

@mikelove

Last seen 7 days ago

United States

Don't do any rounding or batch removal prior to DESeq2. Just original counts, and put relevant covariates in the design, e.g. ~batch + condition. See the workflow.
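
A minimal sketch of what putting a batch covariate in the design might look like. The objects `counts` and `coldata`, the sample layout, and the simulated values are hypothetical and only stand in for real un-normalized counts; they are not from this thread.

```r
library(DESeq2)

set.seed(1)
# Hypothetical un-normalized integer counts: 100 genes x 8 samples,
# with a different mean for each gene
gene_means <- 2^runif(100, min = 4, max = 10)
counts <- matrix(rnbinom(800, mu = rep(gene_means, times = 8), size = 5),
                 nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))

# Hypothetical sample table: batch and condition are crossed, not confounded
coldata <- data.frame(
  batch     = factor(rep(c("b1", "b2"), times = 4)),
  condition = factor(rep(c("ctrl", "trt"), each = 4)),
  row.names = colnames(counts)
)

# Batch enters the model as a covariate instead of being removed beforehand
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ batch + condition)
dds <- DESeq(dds)
res <- results(dds)  # by default tests the last design variable (condition)
```

With this design, the condition effect is estimated while accounting for batch, so no separate batch-correction step is applied to the counts.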

ADD COMMENT • link 3.4 years ago Michael Love 43k

0

Thanks for your reply and package! The RSEM expected_count data were downloaded from UCSC Xena and had already been log2(x + 1) transformed. Following your note in (DESeq2 Following RSEM) that this approach is "sub-optimal compared to using tximport", I applied floor(2^Expected_count - 1) before DESeq2. Should I include the duplicated-gene information as covariates?

ADD REPLY • link 3.4 years ago Yang Shi ▴ 10

1

If you have log normalized data, use limma voom instead of DESeq2.
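
A rough sketch of running limma on values that are already on the log2 scale. Note that voom() itself takes counts as input, so this sketch swaps in the limma-trend variant (eBayes with trend = TRUE); that substitution is an assumption, not something stated in this thread. The matrix `logexpr` and the sample grouping are simulated and hypothetical.

```r
library(limma)

set.seed(1)
# Hypothetical log2-scale expression matrix: 100 genes x 8 samples
logexpr <- matrix(rnorm(800, mean = 8, sd = 1), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))

# Hypothetical two-group layout
condition <- factor(rep(c("ctrl", "trt"), each = 4))
design <- model.matrix(~ condition)

fit <- lmFit(logexpr, design)      # gene-wise linear models on the log2 values
fit <- eBayes(fit, trend = TRUE)   # limma-trend: prior variance depends on mean expression
tab <- topTable(fit, coef = "conditiontrt", number = Inf)
```

Additional covariates such as batch could be added to the `model.matrix()` formula in the same way as in a DESeq2 design.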

ADD REPLY • link 3.4 years ago Michael Love 43k

0

Got that, thanks!

ADD REPLY • link 3.4 years ago Yang Shi ▴ 10