Can I use limma::avereps to remove duplicated genes before conducting DEG analysis on RSEM expected_count data?
Yang Shi ▴ 10
@ea61ff7a
Last seen 8 months ago
Zheng Zhou

Dear Community,

The RSEM expected counts were downloaded from UCSC Xena (RSEM expected_count) and rounded following the advice in "DESeq2 Following RSEM". Before the analysis, I converted the Ensembl IDs to official gene symbols, which left nearly 2000 duplicated genes. Such duplicates are commonly collapsed with limma::avereps. However, I noticed that the author of DESeq2 mentioned that non-normalized counts can be fed into DESeqDataSetFromMatrix. So I wonder which order gives input suitable for DESeqDataSetFromMatrix: running limma::avereps and then rounding, or rounding before limma::avereps? Any suggestions would be greatly appreciated!

# (1) floor before limma::avereps
library(magrittr)  # provides %<>% and %>%
library(tibble)    # provides rownames_to_column()
rt <- as.matrix(Expected_count)          # column 1 holds the gene symbols
exp <- rt[, 2:ncol(rt)]
dataMatrix <- matrix(as.numeric(exp),
                     nrow = nrow(exp),
                     dimnames = list(rt[, 1], colnames(exp)))
dataMatrix <- floor(2^dataMatrix - 1)    # undo log2(x + 1), then floor
dataMatrix %<>%
  limma::avereps() %>%   # averages rows sharing a symbol; means can be non-integer
  as.data.frame() %>%
  rownames_to_column('symbol')

# (2) floor after limma::avereps
rt <- as.matrix(Expected_count)          # column 1 holds the gene symbols
exp <- rt[, 2:ncol(rt)]
dataMatrix <- matrix(as.numeric(exp),
                     nrow = nrow(exp),
                     dimnames = list(rt[, 1], colnames(exp)))
dataMatrix <- 2^dataMatrix - 1           # undo log2(x + 1), no rounding yet
dataMatrix %<>%
  limma::avereps() %>%
  floor() %>%            # floor while the data are still a numeric matrix
  as.data.frame() %>%
  rownames_to_column('symbol')
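
To make the difference between the two orderings concrete, here is a minimal toy sketch in base R. The numbers are made up, and the helper `average_duplicates` is a hypothetical stand-in for limma::avereps (it simply averages rows that share a row name), so treat it as an illustration rather than the limma implementation.

```r
# Toy log2(x + 1) values; rows 1 and 2 share the symbol "A" (made-up numbers).
log_counts <- matrix(log2(c(3.7, 4.9, 10.2,        # sample s1
                            8.6, 9.9, 20.5) + 1),  # sample s2
                     nrow = 3,
                     dimnames = list(c("A", "A", "B"), c("s1", "s2")))

# Hypothetical stand-in for limma::avereps: average rows with the same row name.
average_duplicates <- function(x) {
  sums <- rowsum(x, group = rownames(x), reorder = FALSE)
  n_per_symbol <- as.vector(table(rownames(x))[rownames(sums)])
  sums / n_per_symbol
}

counts <- 2^log_counts - 1                     # undo log2(x + 1)

floor_first <- average_duplicates(floor(counts))   # ordering (1)
floor_last  <- floor(average_duplicates(counts))   # ordering (2)

print(floor_first)  # symbol "A" is the mean of two integers, so it can end in .5
print(floor_last)   # every value is a whole number
```

In this toy case ordering (1) leaves fractional values for the duplicated symbol, while ordering (2) always ends with whole numbers, which matters because DESeqDataSetFromMatrix expects integer counts.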
limma RNASeq Expected_count DESeq2 • 927 views
@mikelove
Last seen 13 hours ago
United States

Don't do any rounding or batch removal prior to DESeq2. Just original counts, and put relevant covariates in the design, e.g. ~batch + condition. See the workflow.
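
As a rough illustration of what "covariates in the design" means, the sketch below expands the formula ~batch + condition with base R's model.matrix. The sample table, batch labels and condition labels are all hypothetical; in DESeq2 the same formula would be supplied through the design argument of DESeqDataSetFromMatrix, which is not called here.

```r
# Hypothetical sample table: two batches, two conditions, four samples.
coldata <- data.frame(
  batch     = factor(c("b1", "b1", "b2", "b2")),
  condition = factor(c("normal", "tumor", "normal", "tumor")),
  row.names = paste0("s", 1:4)
)

# Batch enters the model as a covariate; condition is the effect of interest.
design_formula <- ~ batch + condition
design <- model.matrix(design_formula, data = coldata)

print(colnames(design))
print(design)
```

Putting the variable of interest last in the formula follows the DESeq2 convention that the last term is the one tested by default.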


Thanks for your reply and package! The RSEM expected_count data were downloaded from UCSC Xena and had already been log2(x + 1) transformed. Since you noted that this route is "sub-optimal compared to using tximport" (DESeq2 Following RSEM), I applied floor(2^Expected_count - 1) before passing the data to DESeq2. Should I include the duplicated-gene information as a covariate?


If you have log normalized data, use limma voom instead of DESeq2.
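
A minimal sketch of a limma voom run on simulated data, in case it helps. It assumes limma is installed, that the log2(x + 1) values are first back-transformed with 2^x - 1 (voom takes count-scale input and accepts non-integer values), and a simple two-group design; the gene names, sample names and group labels are made up.

```r
library(limma)

set.seed(1)
# Simulated stand-in for log2(expected_count + 1): 50 genes, 4 samples.
log_expr <- matrix(log2(rpois(200, lambda = 50) + 1), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("s", 1:4)))

counts <- 2^log_expr - 1                      # back to the count scale

condition <- factor(c("normal", "normal", "tumor", "tumor"))
design <- model.matrix(~ condition)

v   <- voom(counts, design)                   # log-CPM values with precision weights
fit <- eBayes(lmFit(v, design))               # linear model + moderated statistics
top <- topTable(fit, coef = "conditiontumor", number = Inf)

head(top)
```

With real data the design would also carry any batch or other covariates, in the same way as for DESeq2.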


Got that, thanks!
