Dear community,
I downloaded RSEM expected count data from UCSC Xena (RSEM expected_count) and rounded the values, following the advice in "DESeq2 Following RSEM". The Ensembl IDs were converted to official gene symbols before the analysis, which left nearly 2000 duplicated gene symbols. Duplicates like these are usually collapsed with limma::avereps. However, I noticed that the author of DESeq2 has said that un-normalized counts should be fed into DESeqDataSetFromMatrix. So I wonder which order is suitable for DESeqDataSetFromMatrix: running limma::avereps and then rounding, or rounding before limma::avereps? Any suggestions would be greatly appreciated!
# (1) floor before limma::avereps
library(magrittr)  # provides %<>% and %>%
library(tibble)    # provides rownames_to_column()
rt = as.matrix(Expected_count)
rownames(rt) = rt[,1]   # first column holds the gene symbols
exp = rt[,2:ncol(rt)]
dimnames = list(rownames(exp),colnames(exp))
dataMatrix = matrix(as.numeric(as.matrix(exp)),
  nrow = nrow(exp),
  dimnames = dimnames)
# undo the log2(x + 1) transform on the numeric matrix, then floor
dataMatrix <- floor(2^dataMatrix - 1)
dataMatrix %<>%
  limma::avereps() %>%
  as.data.frame() %>%
  rownames_to_column('symbol')
# (2) floor after limma::avereps
rt = as.matrix(Expected_count)
rownames(rt) = rt[,1]   # first column holds the gene symbols
exp = rt[,2:ncol(rt)]
dimnames = list(rownames(exp),colnames(exp))
dataMatrix = matrix(as.numeric(as.matrix(exp)),
  nrow = nrow(exp),
  dimnames = dimnames)
# undo the log2(x + 1) transform on the numeric matrix
dataMatrix <- 2^dataMatrix - 1
# average duplicated symbols, then floor while the data are still numeric
dataMatrix <- floor(limma::avereps(dataMatrix))
dataMatrix %<>%
  as.data.frame() %>%
  rownames_to_column('symbol')
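To make the difference between the two orderings concrete, here is a minimal sketch on made-up numbers. It uses only base R; `average_duplicates` is a hypothetical stand-in for the averaging that limma::avereps performs over rows with the same name, and the gene names and values are invented for illustration.

```r
# Toy values only: log2(x + 1) expected counts with one duplicated symbol.
raw <- c(10.6, 3.2, 7.9,   # sample1: GENE_A, GENE_A, GENE_B
         20.7, 5.5, 1.4)   # sample2: GENE_A, GENE_A, GENE_B
log_expr <- matrix(log2(raw + 1), nrow = 3,
                   dimnames = list(c("GENE_A", "GENE_A", "GENE_B"),
                                   c("sample1", "sample2")))

# Undo the log2(x + 1) transform to get back to expected counts.
expected <- 2^log_expr - 1

# Hypothetical helper: mean over rows that share a row name.
average_duplicates <- function(m) {
  sums <- rowsum(m, group = rownames(m), reorder = FALSE)
  n <- as.vector(table(rownames(m))[rownames(sums)])
  sums / n
}

# Ordering (1): floor first, then average the duplicated rows.
floor_then_average <- average_duplicates(floor(expected))
# Ordering (2): average the duplicated rows first, then floor.
average_then_floor <- floor(average_duplicates(expected))

# TRUE only if every entry is a whole number.
is_whole <- function(m) all(m == round(m))

print(floor_then_average)
print(average_then_floor)
print(c(option1 = is_whole(floor_then_average),
        option2 = is_whole(average_then_floor)))
```

In this toy case ordering (1) leaves a fractional value for the duplicated gene, because the mean of two whole numbers need not be whole, while ordering (2) always ends on whole numbers. As far as I understand, DESeqDataSetFromMatrix expects integer values, so ordering (1) would need a further rounding step before the matrix could be passed in.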
Thanks for your reply and for the package! The RSEM expected count data were downloaded from UCSC Xena and had already been log2(x + 1) transformed. Since you noted that this route is "sub-optimal compared to using tximport" (DESeq2 Following RSEM), I applied floor(2^Expected_count - 1) before passing the data to DESeq2. Should I include the duplicated-gene information as a covariate?
If you have log normalized data, use limma voom instead of DESeq2.
Got that, thanks!