When I used DESeq2 in the past, the count data, normalized count data, and so on retained the sample names given in the first column of the colData table.
I've used:
dds = DESeqDataSetFromMatrix(countData = countdata,
                             colData = colData,
                             design = ~ treatment)
where colData looks like:
  sample.name                treatment    surgergy treatment.1 day
  (fctr)                     (fctr)       (fctr)   (fctr)      (int)
1 GRC307R.15_S21_L001_R1_001 Day1AirDEMED DEMED    Air         1
2 GRC307R.16_S10_L001_R1_001 Day1AirDEMED DEMED    Air         1
3 GRC307R.17_S18_L001_R1_001 Day1AirDEMED DEMED    Air         1
I expected the columns of my count matrix from DESeq2 and subsequent analyses to carry the sample names (GRC307R.15_S21_L001_R1_001, etc.), but instead they are named 1:45 for the 45 samples.
Any ideas? Thanks for your time.
I assigned row.names(colData) to be my sample.name, but I ended up with an error when I tried to define dds.
> colData<-(hash_table);head(colData)
                           sample                     treatment     surgergy treatment.1 day rep group
GRC307R.2p_S24_L001_R1_001 GRC307R.2p_S24_L001_R1_001 Day1AirSHAM   SHAM     Air         1   1   Day1AirSHAM1
GRC307R.6_S17_L001_R1_001  GRC307R.6_S17_L001_R1_001  Day1AirSHAM   SHAM     Air         1   2   Day1AirSHAM2
GRC307R.7_S2_L001_R1_001   GRC307R.7_S2_L001_R1_001   Day1AirSHAM   SHAM     Air         1   3   Day1AirSHAM3
GRC307R.8_S15_L001_R1_001  GRC307R.8_S15_L001_R1_001  Day1AirSHAM   SHAM     Air         1   4   Day1AirSHAM4
GRC307R.37_S22_L001_R1_001 GRC307R.37_S22_L001_R1_001 Day1OzoneSHAM SHAM     Ozone       1   1   Day1OzoneSHAM1
GRC307R.38_S12_L001_R1_001 GRC307R.38_S12_L001_R1_001 Day1OzoneSHAM SHAM     Ozone       1   2   Day1OzoneSHAM2
> dds = DESeqDataSetFromMatrix(countData = countdata,
+ colData = colData,
+ design = ~ treatment)
Error in DESeqDataSetFromMatrix(countData = countdata, colData = colData, :
rownames of the colData:
GRC307R.2p_S24_L001_R1_001,GRC307R.6_S17_L001_R1_001,GRC307R.7_S2_L001_R1_001,GRC307R.8_S15_L001_R1_001,GRC307R.37_S22_L001_R1_001,GRC307R.38_S12_L001_R1_0
Do the columns of the countData have to be in the same order as rows of colData?
A second question while I have you: my counts come from Partek Flow, which assigns reads using an EM (expectation-maximization) algorithm. I rounded the count matrix to the nearest integer so that I could feed integers into DESeq2. Is that still considered acceptable?
The columns of countData and the rows of colData need to be in exactly the same order. This is very important! From our RNA-seq workflow:
"If you’ve counted reads with some other software, it is very important to check that the columns of the count matrix correspond to the rows of the sample information table."
From the help page for ?DESeqDataSetFromMatrix: "Rows of colData correspond to columns of countData"
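One way to line the two tables up before building the dds is sketched below in base R. The toy countdata and colData here are made-up stand-ins for the objects in the question (the sample and gene names are hypothetical), and the sketch assumes the columns of the count matrix are already named after the samples.

```r
# Toy stand-ins for the poster's objects; all values are invented for illustration.
countdata <- matrix(1:12, nrow = 4,
                    dimnames = list(paste0("gene", 1:4),
                                    c("sampleB", "sampleA", "sampleC")))
colData <- data.frame(sample.name = c("sampleA", "sampleB", "sampleC"),
                      treatment = factor(c("Air", "Air", "Ozone")))

# Put the sample names into the row names of colData.
rownames(colData) <- as.character(colData$sample.name)

# Both tables should describe the same set of samples.
stopifnot(setequal(colnames(countdata), rownames(colData)))

# Reorder the rows of colData to follow the columns of the count matrix.
colData <- colData[colnames(countdata), , drop = FALSE]

# Same names in the same order: stop here if that is not the case.
stopifnot(identical(colnames(countdata), rownames(colData)))
```

After this check passes, the DESeqDataSetFromMatrix call from the question can be run unchanged on the reordered colData.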
Yes, we recently evaluated this and found that rounded estimated counts (from an EM) can be used as input to DESeq2. What shouldn't be used is any kind of "normalized counts", meaning counts that have been divided by something or otherwise transformed.
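As a small illustration of the rounding step, here is a base R sketch. The est_counts matrix is a hypothetical stand-in for estimated counts exported from an EM-based quantifier, with invented values, and the explicit conversion to integer storage is a tidy-up assumed here, not something the answer above requires.

```r
# Hypothetical estimated (non-integer) counts; values are invented for illustration.
est_counts <- matrix(c(10.4, 0.6, 3.7, 7.49, 120.2, 0),
                     nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("s1", "s2", "s3")))

# Round to the nearest integer; do not divide, scale, or log-transform first.
countdata <- round(est_counts)

# Store the values as integers while keeping the gene and sample names.
storage.mode(countdata) <- "integer"

# Counts should be non-negative whole numbers.
stopifnot(is.integer(countdata), all(countdata >= 0))
```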