I have a question about using parallel computing with DESeq2 on our server. I've been working on this code for the last few days and haven't been able to find a fix. I have been following the DESeq2 online vignette for RNA-seq analysis, but processing on the server is taking longer than the same analysis took on my personal computer. Please let me know if there is anything else I need to send.
$ screen -S DESeq2
### DESeq2 Analysis in Server ###
setwd('/projects/home/cyoung304/data')
library('DESeq2')
cts <- read.csv(file='GeneName.csv')
nrow(cts)
ncol(cts)
colData <- read.csv(file='colData.csv')
ncol(colData)
nrow(colData)
rownames(cts) <- cts$Geneid
cts$Geneid <- NULL
library("BiocParallel")
register(MulticoreParam(12))
dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)
keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep,]
dds$condition <- relevel(dds$condition, ref='NT') # Michael said it doesn't matter what the reference is, because the comparison between the samples remains the same
ddsColl <- collapseReplicates(dds, dds$id)
ddsColl <- DESeq(ddsColl, fitType='local', parallel = TRUE)
resultsNames(ddsColl)
# Query the object returned by DESeq() (ddsColl, not dds). With 'NT' as the reference,
# resultsNames() will not contain condition_NT_vs_MPT (assuming the levels are NT and MPT),
# so an explicit contrast keeps the NT vs MPT direction.
res <- results(ddsColl, contrast=c("condition", "NT", "MPT"), lfcThreshold = 0.585, alpha=0.05)
res
resOrder <- res[order(res$padj),]
write.csv(as.data.frame(resOrder), file='DESeqResults.csv')
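To see whether the 12 forked workers actually help, one option is to time DESeq() serially and in parallel on the same input. The sketch below does this on simulated data from DESeq2's makeExampleDESeqDataSet() rather than the counts in the question, so the sizes (2000 genes, 12 samples) and the worker count of 4 are arbitrary assumptions. It also shows how the coefficient names reported by resultsNames() follow the condition_<level>_vs_<reference> pattern once a reference level is set.

```r
# Sketch: compare serial vs forked-parallel DESeq() run time on simulated data.
# Assumes DESeq2 and BiocParallel are installed; data are simulated, not the
# GeneName.csv / colData.csv files from the question.
suppressPackageStartupMessages({
  library(DESeq2)
  library(BiocParallel)
})

set.seed(1)
# 2000 genes x 12 samples; colData has a two-level factor 'condition' (A, B)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$condition <- relevel(dds$condition, ref = "A")

# Serial baseline
t_serial <- system.time(
  dds_serial <- DESeq(dds, fitType = "local", quiet = TRUE)
)

# Parallel run with an explicit BPPARAM (4 workers chosen only for illustration)
t_par <- system.time(
  dds_par <- DESeq(dds, fitType = "local", quiet = TRUE,
                   parallel = TRUE, BPPARAM = MulticoreParam(workers = 4))
)

# Elapsed wall-clock seconds for each run
print(c(serial = t_serial[["elapsed"]], parallel = t_par[["elapsed"]]))

# Coefficient names are built as condition_<level>_vs_<reference>
print(resultsNames(dds_par))

# Call results() on the object that DESeq() returned
res <- results(dds_par, name = "condition_B_vs_A", alpha = 0.05)
summary(res)
```

On small inputs the cost of forking and splitting the work can outweigh the gain, so a parallel run that is no faster than the serial one is not by itself a sign that something is broken.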
Make sure that the basics are working first, e.g.,
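A minimal check of that kind, assuming only that BiocParallel is installed on the node where R is running: it reports how many cores R can see, runs a trivial bplapply() over two forked workers, and prints the worker process ids (distinct ids suggest the workers really were forked). The choice of 2 workers is arbitrary.

```r
library(BiocParallel)

# Cores visible to R on this node, and BiocParallel's default worker count
print(parallel::detectCores())
print(multicoreWorkers())

param <- MulticoreParam(workers = 2)

# A trivial computation; should return 1, 4, 9, 16
squares <- bplapply(1:4, function(i) i^2, BPPARAM = param)
print(unlist(squares))

# One element per worker; each worker reports its own process id
pids <- bplapply(1:2, function(i) Sys.getpid(), BPPARAM = param)
print(unlist(pids))
```

If this hangs or errors, the problem lies in the parallel setup on the server rather than in DESeq2 itself.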
If not, check out the manager.hostname and manager.port arguments to MulticoreParam (this could be tricky, since you need to find out which ports, if any, are open on the cluster).
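A sketch of passing those arguments explicitly, assuming the installed BiocParallel version accepts manager.hostname and manager.port in MulticoreParam() (check ?MulticoreParam on the server). The values "localhost" and 11999L are hypothetical placeholders; the port has to be one that is free and permitted on your cluster.

```r
library(BiocParallel)

# Hypothetical manager settings: replace with a host/port your cluster allows
param <- MulticoreParam(
  workers = 2,
  manager.hostname = "localhost",
  manager.port = 11999L
)

# Re-run the basic check with the explicit manager settings
res <- bplapply(1:4, function(i) i^2, BPPARAM = param)
print(unlist(res))
```

Once this works, the same param object can be passed to DESeq() through its BPPARAM argument, or registered with register(param).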