My data set comes from single-cell SMART-Seq data. I have 48 samples in total: 24 for the control and 24 for my KO. Each sample represents one cell.
I have used STAR to map the samples against my indexed genome and featureCounts to quantify the reads onto the genes.
If I pre-filter before running the DESeq function, only very few genes are left, and I'm not sure what a meaningful filter threshold would be.
keep <- rowSums(counts(dds) <= 100 ) == 12
table(keep)
keep
FALSE TRUE
56869 141
I was wondering if this is a feasible way of analysing the data, as I have many zero counts in the count matrix (which is to be expected from a single-cell data set).
(Or would the kallisto -> tximport -> DESeq2 or the kallisto -> RSEM route be better in this case?)
thanks for the advice
Assa
The choice of preprocessing is up to you. There are a variety of benchmark papers and preprints you can get guidance from.
Regarding prefiltering, I personally think it's more meaningful for single-cell data to filter on the percentage of cells per condition/cluster/group expressing a gene rather than on count cutoffs. For example, require that at least 10% of the cells in at least one group express a gene (count > 0), so that it's not a spurious detection. The DESeq2 vignette has a single-cell section with recommendations for analysis.
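A minimal sketch of such a detection-rate filter in base R, in case it helps. The toy matrix, the `condition` vector and the `min_frac` name are made up for illustration. With a real DESeqDataSet you would presumably start from counts(dds) and the grouping column in colData(dds) instead (the column name is an assumption about your object).

```r
# Toy count matrix: 4 genes x 8 cells (made-up numbers, for illustration only)
counts <- matrix(
  c(0, 0, 0, 5, 0, 0, 0, 0,   # geneA: detected in 1 of 4 control cells
    3, 1, 0, 2, 0, 0, 0, 0,   # geneB: detected in 3 of 4 control cells
    0, 0, 0, 0, 0, 0, 0, 0,   # geneC: never detected
    9, 4, 7, 2, 8, 3, 6, 1),  # geneD: detected in every cell
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:8))
)
condition <- factor(rep(c("control", "KO"), each = 4))

min_frac <- 0.10  # assumed threshold: 10% of the cells in a group

# Fraction of cells with count > 0, per gene (rows) and per group (columns)
detected_frac <- sapply(levels(condition), function(g) {
  rowMeans(counts[, condition == g, drop = FALSE] > 0)
})

# Keep a gene if it passes the threshold in at least one group
keep <- apply(detected_frac >= min_frac, 1, any)
table(keep)

filtered <- counts[keep, , drop = FALSE]
```

With 24 cells per group, a 10% threshold means a gene has to be detected in at least 3 cells of one group (2/24 is about 8%, 3/24 is 12.5%).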
In any case, if you have biological replicates you might consider pseudobulk analysis.
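Pseudobulk here means summing the counts of all cells that belong to the same biological replicate and then analysing those sums like bulk RNA-seq samples. A rough base R sketch, assuming you know which animal/culture each cell came from. The `replicate_id` labels and the toy counts are hypothetical. If all cells of a condition come from a single replicate, this approach does not apply.

```r
# Toy data: 3 genes x 8 cells (made-up numbers, for illustration only)
counts <- matrix(
  1:24, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:8))
)

# Hypothetical replicate of origin for each cell (two cells per replicate)
replicate_id <- rep(c("ctrl_1", "ctrl_2", "ko_1", "ko_2"), each = 2)

# rowsum() sums rows by group, so transpose to sum over cells,
# then transpose back to get a genes x replicates matrix
pseudobulk <- t(rowsum(t(counts), group = replicate_id))

# One row of sample information per pseudobulk column
sample_info <- data.frame(
  condition = factor(sub("_.*", "", colnames(pseudobulk))),
  row.names = colnames(pseudobulk)
)

pseudobulk
# The result could then be passed to DESeq2, e.g.
# DESeqDataSetFromMatrix(countData = pseudobulk, colData = sample_info,
#                        design = ~ condition)
```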
I haven't seen this part before. Thanks for pointing it out.