Question

DESeq2 counts and output variables are Numbers instead of Sample Names

1

jshouse ▴ 10

@jshouse-10956

United States

When I used DESeq2 in the past, the count data, normalized count data, and other outputs retained the sample names given in the first column of the colData table.

I've used:

dds = DESeqDataSetFromMatrix(countData = countdata,
                             colData = colData,
                             design = ~ treatment)

where colData looks like:

sample.name treatment surgergy treatment.1 day
(fctr) (fctr) (fctr) (fctr) (int)
1 GRC307R.15_S21_L001_R1_001 Day1AirDEMED DEMED Air 1
2 GRC307R.16_S10_L001_R1_001 Day1AirDEMED DEMED Air 1
3 GRC307R.17_S18_L001_R1_001 Day1AirDEMED DEMED Air 1

I expected the columns of the count matrix from DESeq2, and of subsequent results, to carry the sample names (GRC307R.15_S21_L001_R1_001, etc.), but instead they are named 1 through 45 for the 45 samples.

Any ideas? Thanks for your time.

deseq2 • 920 views

ADD COMMENT • link 8.7 years ago jshouse ▴ 10

score 2 · Accepted Answer · 2016-06-22

2

Michael Love 43k

@mikelove

United States

Recently, some changes in SummarizedExperiment (an upstream package which defines the superclass that DESeqDataSet is based on) affected this behavior. In the NEWS file (which can be found on the SummarizedExperiment landing page):

"assay colnames() must agree with colData rownames()"

But anyway, it looks to me like your colData rownames here are 1, 2, 3, etc. Before you build the DESeqDataSet, what are colnames(countdata) and rownames(colData)?
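As a rough sketch of that check, using only base R: the toy `countdata` and `colData` below are made-up stand-ins for the poster's objects (shortened sample names, invented counts), not the real data.

```r
# Toy stand-ins for the poster's objects; all values are invented for illustration.
countdata <- matrix(1:6, nrow = 2,
                    dimnames = list(c("geneA", "geneB"),
                                    c("GRC307R.15", "GRC307R.16", "GRC307R.17")))
colData <- data.frame(sample.name = c("GRC307R.15", "GRC307R.16", "GRC307R.17"),
                      treatment = factor(rep("Day1AirDEMED", 3)))

colnames(countdata)   # sample names carried by the count matrix
rownames(colData)     # default row names "1" "2" "3", not sample names

# Use the sample.name column as row names so they can match the count columns.
rownames(colData) <- as.character(colData$sample.name)

# TRUE only if the names agree and are in the same order.
identical(colnames(countdata), rownames(colData))
```

If the last line is FALSE on real data, the names differ, the order differs, or both, and that is worth sorting out before calling DESeqDataSetFromMatrix.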

ADD COMMENT • link 8.7 years ago Michael Love 43k

0

I assigned row.names(colData) to be my sample.name.

I ended up with an error when I tried to define dds.

> colData<-(hash_table);head(colData)
sample treatment surgergy treatment.1 day rep group
GRC307R.2p_S24_L001_R1_001 GRC307R.2p_S24_L001_R1_001 Day1AirSHAM SHAM Air 1 1 Day1AirSHAM1
GRC307R.6_S17_L001_R1_001 GRC307R.6_S17_L001_R1_001 Day1AirSHAM SHAM Air 1 2 Day1AirSHAM2
GRC307R.7_S2_L001_R1_001 GRC307R.7_S2_L001_R1_001 Day1AirSHAM SHAM Air 1 3 Day1AirSHAM3
GRC307R.8_S15_L001_R1_001 GRC307R.8_S15_L001_R1_001 Day1AirSHAM SHAM Air 1 4 Day1AirSHAM4
GRC307R.37_S22_L001_R1_001 GRC307R.37_S22_L001_R1_001 Day1OzoneSHAM SHAM Ozone 1 1 Day1OzoneSHAM1
GRC307R.38_S12_L001_R1_001 GRC307R.38_S12_L001_R1_001 Day1OzoneSHAM SHAM Ozone 1 2 Day1OzoneSHAM2
> dds = DESeqDataSetFromMatrix(countData = countdata,
+ colData = colData,
+ design = ~ treatment)
Error in DESeqDataSetFromMatrix(countData = countdata, colData = colData, :
rownames of the colData:
GRC307R.2p_S24_L001_R1_001,GRC307R.6_S17_L001_R1_001,GRC307R.7_S2_L001_R1_001,GRC307R.8_S15_L001_R1_001,GRC307R.37_S22_L001_R1_001,GRC307R.38_S12_L001_R1_0

Do the columns of the countData have to be in the same order as rows of colData?

A second question while I have you: my counts come from Partek Flow, which assigns reads using an EM algorithm. I rounded the count matrix to the nearest integer so that I could feed integers into DESeq2. Is that still considered acceptable?

ADD REPLY • link 8.7 years ago jshouse ▴ 10

1

The columns and the rows need to be in the exact same order. This is very important! From our RNA-seq workflow:

"If you’ve counted reads with some other software, it is very important to check that the columns of the count matrix correspond to the rows of the sample information table."

From the help page for ?DESeqDataSetFromMatrix:

"Rows of colData correspond to columns of countData"
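One way to put the columns into the right order, sketched in base R. The objects below are hypothetical toy data (sample names `s1` to `s3` and the counts are invented); the idea is to index the count matrix by the row names of colData and then verify the result.

```r
# Toy data: the count columns are deliberately in a different order than colData.
countdata <- matrix(1:6, nrow = 2,
                    dimnames = list(c("geneA", "geneB"), c("s3", "s1", "s2")))
colData <- data.frame(treatment = factor(c("Air", "Air", "Ozone")),
                      row.names = c("s1", "s2", "s3"))

# Every sample must appear in both objects before reordering makes sense.
stopifnot(setequal(colnames(countdata), rownames(colData)))

# Reorder the count columns to follow the row order of colData.
countdata <- countdata[, rownames(colData), drop = FALSE]

# Fail loudly if the orders still differ.
stopifnot(identical(colnames(countdata), rownames(colData)))
```

Indexing by name rather than by position means each column keeps its own counts; only the order changes.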

Yes, we recently evaluated this, and rounded estimated counts (from an EM) can be used as input to DESeq2. What shouldn't be used is any kind of "normalized counts", meaning counts that have been divided by something or otherwise transformed.
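For illustration, rounding an estimated-count matrix in base R might look like the sketch below. The matrix `est` and its values are made up, and setting the storage mode is just one way to make the integer type explicit, not a step this thread prescribes.

```r
# Hypothetical estimated counts from an EM-based quantifier (invented values).
est <- matrix(c(10.4, 0.6, 3.7, 7.2), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Round to the nearest integer; no scaling or other transformation is applied.
countdata <- round(est)

# Store the values as integers while keeping the row and column names.
storage.mode(countdata) <- "integer"
```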

ADD REPLY • link 8.7 years ago Michael Love 43k