Search
Question: DESeq2 missing columns
0
3.9 years ago by
Kuenne, Carsten20 wrote:
Dear mailing list, I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? [...] data = read.table("counts.matrix", header=T, row.names=1, com='') col_ordering = c(1,2,3,4) rnaseqMatrix = data[,col_ordering] rnaseqMatrix = round(rnaseqMatrix) rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix( countData = rnaseqMatrix, colData = conditions, design = ~ conditions) dds = DESeq(ddsFullCountTable) res = results(dds) write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) [...] [deseq2_nha_nhc.txt] baseMean log2FoldChange lfcSE pvalue padj comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 [...] Anyway thanks for the great update! Yours hopefully, Carsten [[alternative HTML version deleted]] ADD COMMENTlink modified 3.9 years ago by Michael Love18k • written 3.9 years ago by Kuenne, Carsten20 0 3.9 years ago by Michael Love18k United States Michael Love18k wrote: hi Carsten, As DESeq2 is a more general model, the results tables don't necessarily involve only two conditions A and B, so to have the results table have consistent columns, we excluded these. Here is the code to reproduce the base means per condition: baseMeanA = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "A"]) baseMeanB = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "B"]) res = cbind(baseMeanA, baseMeanB, as.data.frame(res)) To force a header for the first column when writing out, you could add a column explicitly: res = cbind(genes=rownames(res), as.data.frame(res)) And then use row.names=FALSE when writing out. Mike On Mon, Jul 21, 2014 at 4:59 AM, Kuenne, Carsten <carsten.kuenne at="" mpi-bn.mpg.de=""> wrote: > Dear mailing list, > > > > I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. > > > > 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. > > 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? > > > > [...] > > data = read.table("counts.matrix", header=T, row.names=1, com='') > > col_ordering = c(1,2,3,4) > > rnaseqMatrix = data[,col_ordering] > > rnaseqMatrix = round(rnaseqMatrix) > > rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] > > conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) > > rownames(conditions) = colnames(rnaseqMatrix) > > ddsFullCountTable <- DESeqDataSetFromMatrix( > > countData = rnaseqMatrix, > > colData = conditions, > > design = ~ conditions) > > dds = DESeq(ddsFullCountTable) > > res = results(dds) > > write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) > > [...] > > > > [deseq2_nha_nhc.txt] > > baseMean log2FoldChange lfcSE pvalue padj > > comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 > > [...] > > > > Anyway thanks for the great update! > > > > Yours hopefully, > > Carsten > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor