DESeq2 missing columns

0

Entering edit mode

Kuenne, Carsten ▴ 20

@kuenne-carsten-6658

Last seen 9.6 years ago

Dear mailing list, I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? [...] data = read.table("counts.matrix", header=T, row.names=1, com='') col_ordering = c(1,2,3,4) rnaseqMatrix = data[,col_ordering] rnaseqMatrix = round(rnaseqMatrix) rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix( countData = rnaseqMatrix, colData = conditions, design = ~ conditions) dds = DESeq(ddsFullCountTable) res = results(dds) write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) [...] [deseq2_nha_nhc.txt] baseMean log2FoldChange lfcSE pvalue padj comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 [...] Anyway thanks for the great update! Yours hopefully, Carsten [[alternative HTML version deleted]]

PROcess DESeq PROcess DESeq • 2.3k views

ADD COMMENT • link updated 9.8 years ago by Michael Love 41k • written 9.8 years ago by Kuenne, Carsten ▴ 20

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 15 hours ago

United States

hi Carsten, As DESeq2 is a more general model, the results tables don't necessarily involve only two conditions A and B, so to have the results table have consistent columns, we excluded these. Here is the code to reproduce the base means per condition: baseMeanA = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "A"]) baseMeanB = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "B"]) res = cbind(baseMeanA, baseMeanB, as.data.frame(res)) To force a header for the first column when writing out, you could add a column explicitly: res = cbind(genes=rownames(res), as.data.frame(res)) And then use row.names=FALSE when writing out. Mike On Mon, Jul 21, 2014 at 4:59 AM, Kuenne, Carsten <carsten.kuenne at="" mpi-bn.mpg.de=""> wrote: > Dear mailing list, > > > > I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. > > > > 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. > > 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? > > > > [...] > > data = read.table("counts.matrix", header=T, row.names=1, com='') > > col_ordering = c(1,2,3,4) > > rnaseqMatrix = data[,col_ordering] > > rnaseqMatrix = round(rnaseqMatrix) > > rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] > > conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) > > rownames(conditions) = colnames(rnaseqMatrix) > > ddsFullCountTable <- DESeqDataSetFromMatrix( > > countData = rnaseqMatrix, > > colData = conditions, > > design = ~ conditions) > > dds = DESeq(ddsFullCountTable) > > res = results(dds) > > write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) > > [...] > > > > [deseq2_nha_nhc.txt] > > baseMean log2FoldChange lfcSE pvalue padj > > comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 > > [...] > > > > Anyway thanks for the great update! > > > > Yours hopefully, > > Carsten > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 9.8 years ago Michael Love 41k

Login before adding your answer.