I am posting about an error I've experienced when running DESeq2's `estimateSizeFactors`. The specific error is posted below:
    > dds = DESeqDataSetFromMatrix(countData=filteredDGE, colData=design, design=~condition)
    converting counts to integer mode
    > dds = estimateSizeFactors(dds)
    Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
      every gene contains at least one zero, cannot compute log geometric means
I've run DESeq2 several times in the past but have never seen this error. I have compared my current input with the input from previous analyses but fail to see any difference in formatting. The error message seems straightforward, but I do not understand why it is raised: my data.frame will contain zeros because some genes are not hit or expressed in some samples. The code I am using is posted below. `filteredDGE` is a data.frame with 500 columns (samples) and about 200 rows (genes).
    DGE = read.table(paste(getwd(), "/output/AGGCAGAA_final/AGGCAGAA_final_Collapsed_DGE_HUMAN.txt", sep=""), row.names=1, header=TRUE, sep="\t")
    n = 80
    filteredDGE = DGE[rowSums(DGE==0)<=ncol(DGE)*(n/100),]
    library(DESeq2)
    design = data.frame(row.names=names(filteredDGE), condition=as.factor(names(filteredDGE)))
    dds = DESeqDataSetFromMatrix(countData=filteredDGE, colData=design, design=~condition)
    dds = estimateSizeFactors(dds)
I would greatly appreciate any help you could toss my way.
EDIT: I should state that this is single-cell data. I'm only interested in performing a counts normalization/transformation; there is no need to run a differential analysis.
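For anyone puzzled by the same message, here is a minimal base-R sketch of what seems to be going on. It is not DESeq2's actual code: the toy matrix and the helper `pos_geo_mean` are made up for illustration. The default method takes a per-gene geometric mean across samples, and a single zero in a row sends the log geometric mean to `-Inf`; with sparse single-cell counts that can happen for every gene. A common workaround is a geometric mean over positive counts only, which I believe is close to what `estimateSizeFactors(dds, type="poscounts")` does in recent DESeq2 versions.

```r
# Toy count matrix (made-up values): 3 genes x 4 cells, every gene has one zero.
counts <- matrix(c(0, 4, 8, 2,
                   3, 0, 6, 1,
                   5, 2, 0, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Default approach: per-gene mean of log counts. log(0) is -Inf, so any row
# containing a zero ends up with a log geometric mean of -Inf.
log_geo_means <- rowMeans(log(counts))

# Hypothetical helper: sum the logs of the positive counts only, but still
# divide by the total number of samples, so zeros pull the mean down
# without making it zero.
pos_geo_mean <- function(x) {
  if (all(x == 0)) return(0)
  exp(sum(log(x[x > 0])) / length(x))
}
geo_means <- apply(counts, 1, pos_geo_mean)

# Median-of-ratios size factor per cell, using only genes that have a
# positive count in that cell and a positive geometric mean.
size_factors <- apply(counts, 2, function(cnts) {
  keep <- geo_means > 0 & cnts > 0
  exp(median(log(cnts[keep]) - log(geo_means[keep])))
})

# Normalized counts: divide each column (cell) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
```

If only normalized counts are needed, `normalized` above is the end result; inside DESeq2 the equivalent would presumably be `counts(dds, normalized=TRUE)` after the size factors have been set.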
You might try the edgeR normalization, which I believe can handle data with a zero in every row. Unfortunately, you'll have to work out how to convert edgeR normalization factors to DESeq2 size factors. I don't remember the conversion off the top of my head (which is why I made this a comment and not an answer).
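One conversion that is often suggested, and which I am assuming is the one meant here, is to multiply each normalization factor by its library size to get an effective library size, then rescale so the geometric mean across samples is 1. The sketch below is base R only; the input values and the function name `norm_factors_to_size_factors` are hypothetical. In practice the factors would presumably come from `edgeR::calcNormFactors(counts)` and the library sizes from `colSums(counts)`.

```r
# Hypothetical inputs for four cells; in a real analysis these might be
#   norm_factors <- edgeR::calcNormFactors(counts)
#   lib_sizes    <- colSums(counts)
lib_sizes    <- c(cell1 = 1000, cell2 = 2000, cell3 = 1500, cell4 = 500)
norm_factors <- c(cell1 = 1.10, cell2 = 0.90, cell3 = 1.00, cell4 = 1.05)

# Convert edgeR-style normalization factors into DESeq2-style size factors:
# effective library size, divided by its geometric mean across samples.
norm_factors_to_size_factors <- function(norm_factors, lib_sizes) {
  eff_lib <- norm_factors * lib_sizes
  eff_lib / exp(mean(log(eff_lib)))
}

size_factors <- norm_factors_to_size_factors(norm_factors, lib_sizes)

# With a DESeqDataSet `dds`, the result could then be assigned with
#   sizeFactors(dds) <- size_factors
```

Rescaling to a geometric mean of 1 only keeps the size factors on a scale comparable to DESeq2's own; the relative scaling between samples comes entirely from the effective library sizes.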