Hi all,
I am approaching the analysis of single-cell RNA-seq data.
I have seen that the Seurat package offers the option in FindMarkers (or directly with the function DESeq2DETest) to use DESeq2 to analyze differential expression between two groups of cells.
Assuming I have group A containing n_A cells and group B containing n_B cells, is the result of the analysis identical to running DESeq2 on the raw counts of each gene in n_A versus n_B samples? And is there a way to speed up the analysis when n_A and n_B are on the order of a few thousand cells?
In addition, I have a 'technical' question:
when I have a count table in the form of a data.frame (for example read with read.table from a text file), is it necessary to coerce it to a matrix, such as cts <- as.matrix(cts), before providing it as input to DESeqDataSetFromMatrix? It seems to me that I can just provide the count table as a data frame.
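To make the question concrete, here is a minimal sketch of the two inputs I mean. The toy counts, the column names and the group labels are made up for illustration. The last part only runs if DESeq2 is installed, and it assumes that DESeqDataSetFromMatrix accepts a data frame and that DESeq2::counts returns the stored count matrix.

```r
# Toy count table standing in for one read with read.table (values are made up).
cts <- data.frame(
  cellA1 = c(10L, 0L, 3L),
  cellA2 = c(7L, 1L, 5L),
  cellB1 = c(0L, 4L, 2L),
  cellB2 = c(1L, 6L, 0L),
  row.names = c("gene1", "gene2", "gene3")
)

# Explicit coercion: an all-integer data frame becomes an integer matrix
# that keeps its row and column names.
cts_mat <- as.matrix(cts)
is_int_matrix <- is.matrix(cts_mat) && is.integer(cts_mat)

# A leftover annotation column turns the whole matrix into character,
# so this is worth checking whichever input form is used.
cts_bad <- cbind(gene_name = rownames(cts), cts)
bad_type <- typeof(as.matrix(cts_bad))

# Optional comparison, skipped when DESeq2 is not installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  coldata <- data.frame(
    group = factor(c("A", "A", "B", "B")),
    row.names = colnames(cts)
  )
  dds_df  <- DESeq2::DESeqDataSetFromMatrix(cts, coldata, design = ~ group)
  dds_mat <- DESeq2::DESeqDataSetFromMatrix(cts_mat, coldata, design = ~ group)
  # TRUE here would mean both inputs end up as the same stored counts.
  same_counts <- identical(DESeq2::counts(dds_df), DESeq2::counts(dds_mat))
}
```

If same_counts comes out TRUE, the explicit as.matrix call would seem to be a matter of style rather than a requirement, as long as the data frame contains only count columns.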
Thanks,
Claire
Thank you! I had seen in this paper by Soneson & Robinson (https://www.nature.com/articles/nmeth.4612) that DESeq2 was employed; however, the number of cells was much lower. In addition, the input used was transcripts per million, while I think Seurat uses raw counts. In any case, I will look at the recommendations you point out, which I had missed.
Regarding the internal matrix conversion: perhaps this was already present in DESeq2 versions from years ago, as I had seen a data frame being used in some old code as well.