I have been using varianceStabilizingTransformation to normalize single-cell RNA-seq data. I have assembled a large data set of >1500 cells (samples) covering about 40,000 genes, and varianceStabilizingTransformation has now been running on it for about 24 hours. Will this process finish? Previously I ran varianceStabilizingTransformation on a set of 406 cells/samples and 32,000 genes, which took 6 hours. If the increase in processor time is not linear, how bad is it? The DESeq2 manual states that varianceStabilizingTransformation should be faster than rlog, but it tests on a sample data set of 20 samples and 1000 genes.
Another question: the transformation does not run as a parallel process. Is there a way to parallelize it, or is this not possible?
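To make the scaling question concrete, here is a rough sketch of how the growth in run time could be measured on small simulated data rather than on the real matrix. It assumes DESeq2's makeExampleDESeqDataSet for generating counts. The helper name time_vst, the gene count, and the sample sizes are made up for illustration, and simulated bulk-like counts will not behave exactly like sparse single-cell data, so the exponent is only indicative.

```r
library(DESeq2)

# Hypothetical helper: time the transformation on simulated counts with m samples.
time_vst <- function(m, n_genes = 1000) {
  dds <- makeExampleDESeqDataSet(n = n_genes, m = m)
  system.time(varianceStabilizingTransformation(dds, blind = TRUE))[["elapsed"]]
}

# Arbitrary sample counts, chosen small enough to finish quickly.
sample_sizes <- c(20, 40, 80, 160)
elapsed <- sapply(sample_sizes, time_vst)

# Guard against a zero timing before taking logs.
elapsed <- pmax(elapsed, 1e-3)

# The slope of log(time) against log(samples) approximates the scaling exponent:
# roughly 1 suggests linear growth, roughly 2 suggests quadratic growth.
fit <- lm(log(elapsed) ~ log(sample_sizes))
scaling_exponent <- coef(fit)[["log(sample_sizes)"]]

print(data.frame(samples = sample_sizes, seconds = elapsed))
print(scaling_exponent)
```

If the fitted exponent is well above 1, extrapolating from the 406-cell run to 1528 cells would give far more than a proportional increase in time.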
Example of what I ran / am running:
# build the DESeq2 object
ddsEmbF <- DESeqDataSetFromMatrix(countData = embryoF, colData = condEmbF, design = ~ Characteristics.developmental.stage.)
dim: 42761 1528
rownames(42761): 5S_rRNA 7SK ... snoZ6 yR211F11.2
colnames(1528): E3.1.443 E3.1.444 ... E7.9.573 E7.9.574
colData names(21): names Comment.ENA_SAMPLE. ...
#normalize the data