Hello,
The data is related to my previous post. We decided to remove 3 genes from the sample count matrix because they were also present at very high counts in the negative controls. When running DESeq(dds), we got the following error:
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
every gene contains at least one zero, cannot compute log geometric means
It is worth noting that our data is very sparse and most of the counts are zero. Before running DESeq2, we filtered our data slightly differently. Instead of the pre-filtering strategy suggested in the DESeq2 vignette, which is:
smallestGroupSize <- 3
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
We decided to remove genes based on how many samples showed no expression, i.e., based on the number of zero counts per gene:
# TRUE for genes with zero counts in at most half of the samples (despite the variable name)
more_than_50_pct <- rowSums(counts(dds) == 0) <= ncol(dds) / 2
dds <- dds[more_than_50_pct,]
This reduced the number of genes to 1058. (We have 200 samples in one group, around 104 samples in another group, along with 18 negative controls and 4 blank water samples.)
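To illustrate why a filter like the one above does not prevent the error, here is a small base-R sketch on a toy matrix (made-up numbers, not our data; the object names are only for illustration). The filter tolerates zeros in up to half of the samples, so a gene can pass it and still have a log geometric mean of -Inf, which is what the default size factor estimation complains about.

```r
# Toy count matrix (made-up numbers): 4 genes x 6 samples,
# every gene has at least one zero, as in very sparse data.
counts <- matrix(
  c(0, 5, 3, 0, 8, 2,
    4, 0, 0, 7, 0, 0,
    9, 2, 0, 3, 0, 6,
    0, 0, 1, 4, 2, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# Same style of filter as above: keep genes with zeros in at most half of the samples.
keep <- rowSums(counts == 0) <= ncol(counts) / 2
filtered <- counts[keep, , drop = FALSE]

# Log geometric mean per gene. log(0) is -Inf, so a single zero makes the
# whole row mean -Inf and the gene unusable for the median-of-ratios step.
log_geo_means <- rowMeans(log(filtered))

# Number of genes that could still contribute to the default size factors.
n_usable <- sum(is.finite(log_geo_means))

# Equivalent check directly on the counts: genes without any zero.
n_zero_free <- sum(rowSums(filtered == 0) == 0)
```

On a real object, the same check would presumably be sum(rowSums(counts(dds) == 0) == 0); if that is 0, the error above is expected.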
There are recommendations to add a pseudocount of 1 to the count table and to use estimateSizeFactors(dds, type = 'iterate'). However, my concerns are as follows:
- Due to the sparse nature of the data, I'm afraid it will skew the results.
- Even with a smaller subset of samples, it ran for 2+ hours and still couldn't finish the step :/
I wouldn't add 1 to the matrix.
I would follow ATpoint's advice and use a single-cell (sc) method. Or use estimateSizeFactors with type="poscounts".
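A rough sketch of what that could look like, on a toy matrix with made-up numbers. The base-R part only mimics the idea behind "poscounts" as I understand it (a geometric mean taken over the positive counts but divided by the total number of samples); it is not DESeq2's actual code, and the names geo_mean_nz, coldata and condition are made up for the example. The DESeq2 call at the end only runs if the package is installed.

```r
# Toy counts (made-up numbers): 3 genes x 6 samples, every gene has zeros.
counts <- matrix(
  c(0, 5, 3, 0, 8, 2,
    9, 2, 0, 3, 0, 6,
    0, 0, 1, 4, 2, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:6))
)

# Modified geometric mean: sum the logs of the positive counts only,
# but divide by the number of all samples, so zeros do not give -Inf.
geo_mean_nz <- function(x) {
  if (all(x == 0)) 0 else exp(sum(log(x[x > 0])) / length(x))
}
geo_means <- apply(counts, 1, geo_mean_nz)

# Per-sample size factor: median log ratio to the gene-wise reference,
# using only the genes with a positive count in that sample.
size_factors <- apply(counts, 2, function(cnts) {
  ok <- cnts > 0 & geo_means > 0
  exp(median(log(cnts[ok]) - log(geo_means[ok])))
})

# Rescale so that the geometric mean of the size factors is 1.
size_factors <- size_factors / exp(mean(log(size_factors)))

# With DESeq2 itself this is a single call; no pseudocount is added to the counts.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  storage.mode(counts) <- "integer"
  coldata <- data.frame(
    condition = factor(rep(c("A", "B"), each = 3)),  # hypothetical grouping
    row.names = colnames(counts)
  )
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = coldata, design = ~ condition
  )
  dds <- DESeq2::estimateSizeFactors(dds, type = "poscounts")
}
```

A later DESeq(dds) call should then reuse the size factors already stored in the object instead of re-estimating them; DESeq(dds, sfType = "poscounts") is the one-step alternative.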