Question

Data Structure for Read Count Analysis by DESeq2

Hamidreza Hashemi ▴ 20

@hamidreza-hashemi-23384

Last seen 4.9 years ago

United States

Hi,

I am new to R and to the DESeq2 package for RNA-seq analysis. I am trying to analyze the read counts of two cell types (M1, M2), each in biological triplicate (M11, M12, M13 and M21, M22, M23). I read the .csv files into R, but when I try to create a dds I get the following error. Could you please help me? Is something wrong with my data format?

Read_Counts <- read.csv("Read Counts.csv", header =  TRUE)
head(Read_Counts)
       ï..Gene_ID SP_18 SP_23 SP_28 SP_20 SP_25 SP_30
1 ENSG00000000003    88    45    30    70   100   151
2 ENSG00000000419   604   920   828   905   596  1047
3 ENSG00000000457   258   242   153   252   119   135
4 ENSG00000000460    77    70    51   152    76    75
5 ENSG00000000938  3074  3672  2948  5560  5434  7641
6 ENSG00000000971  4521   115    55    42     1     0


Meta_Data <- read.csv("Meta Data.csv", header = TRUE)
head(Meta_Data)
  ï..Sample_ID Condition CellType
1        SP_18      M1_1       M1
2        SP_23      M1_2       M1
3        SP_28      M1_3       M1
4        SP_20      M2_1       M2
5        SP_25      M2_2       M2
6        SP_30      M2_3       M2


dds <- DESeqDataSetFromMatrix(countData = Read_Counts, colData = Meta_Data, design = ~ CellType)
Error in DESeqDataSetFromMatrix(countData = Read_Counts, colData = Meta_Data,  : 
  ncol(countData) == nrow(colData) is not TRUE.

deseq2 software error • 2.4k views

ADD COMMENT • link updated 5.8 years ago by swbarnes2 ★ 1.4k • written 5.8 years ago by Hamidreza Hashemi ▴ 20

score 0 · Answer 1 · 2020-04-20

Kevin Blighe ★ 4.0k

@kevin

Last seen 4 weeks ago

The Cave, 181 Longwood Avenue, Boston, …

Hi, for your specific data, essentially, the following conditions should be true before you can run DESeqDataSetFromMatrix():

ncol(Read_Counts) == nrow(Meta_Data)
all(colnames(Read_Counts) == rownames(Meta_Data))

To help you, it looks like you need to do the following:

rownames(Read_Counts) <- Read_Counts[,1]
Read_Counts <- data.matrix(Read_Counts[,-1])

rownames(Meta_Data) <- Meta_Data[,1]

If you could please review the input and output of each of these steps in order to understand what is happening, that would be great for learning purposes.
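As a minimal sketch of those steps end to end, the block below rebuilds the first two genes of the count table and the metadata from the question as small in-memory data frames (so no files are needed), moves the identifier columns into the row names, and checks both conditions before building the dataset. It assumes the headers were read cleanly as `Gene_ID` and `Sample_ID`, and the DESeq2 call only runs if the package is installed.

```r
# Toy versions of the two tables from the question (first two genes only).
Read_Counts <- data.frame(
  Gene_ID = c("ENSG00000000003", "ENSG00000000419"),
  SP_18 = c(88L, 604L), SP_23 = c(45L, 920L), SP_28 = c(30L, 828L),
  SP_20 = c(70L, 905L), SP_25 = c(100L, 596L), SP_30 = c(151L, 1047L)
)
Meta_Data <- data.frame(
  Sample_ID = c("SP_18", "SP_23", "SP_28", "SP_20", "SP_25", "SP_30"),
  Condition = c("M1_1", "M1_2", "M1_3", "M2_1", "M2_2", "M2_3"),
  CellType  = factor(c("M1", "M1", "M1", "M2", "M2", "M2"))
)

# Move the gene IDs into the row names and keep only the count columns.
rownames(Read_Counts) <- Read_Counts[, 1]
Read_Counts <- data.matrix(Read_Counts[, -1])

# Move the sample IDs into the row names of the metadata.
rownames(Meta_Data) <- Meta_Data[, 1]

# Both conditions must hold before calling DESeqDataSetFromMatrix().
stopifnot(ncol(Read_Counts) == nrow(Meta_Data))
stopifnot(all(colnames(Read_Counts) == rownames(Meta_Data)))

# Build the dataset only if DESeq2 is available in this session.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = Read_Counts, colData = Meta_Data, design = ~ CellType
  )
}
```

Printing `Read_Counts` and `Meta_Data` before and after each assignment shows what changes: the identifier column disappears from the columns and reappears as row names.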

Kevin

ADD COMMENT • link 5.8 years ago Kevin Blighe ★ 4.0k

Thanks a lot, Kevin. I just learn as I go, bits from here and there. I just removed the column header for the gene symbols and now it's working. I appreciate your advice and will keep it in my script.

ADD REPLY • link 5.8 years ago Hamidreza Hashemi ▴ 20

score 0 · Answer 2 · 2020-04-20

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 11 days ago

San Diego

When you have gene names as a column, the software thinks it's just a weird-looking sample column.
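A small sketch of that mismatch, using a hypothetical two-gene stand-in for the table in the question: while the gene IDs sit in an ordinary column, the count table has seven columns against six samples, which is the check that fails in the error message.

```r
# Hypothetical two-gene stand-in for the count table, gene IDs as a normal column.
counts_with_id_column <- data.frame(
  Gene_ID = c("ENSG00000000003", "ENSG00000000419"),
  SP_18 = c(88L, 604L), SP_23 = c(45L, 920L), SP_28 = c(30L, 828L),
  SP_20 = c(70L, 905L), SP_25 = c(100L, 596L), SP_30 = c(151L, 1047L)
)
sample_ids <- c("SP_18", "SP_23", "SP_28", "SP_20", "SP_25", "SP_30")

# The gene ID column is counted as if it were a seventh sample.
n_count_columns <- ncol(counts_with_id_column)   # 7
n_samples <- length(sample_ids)                  # 6
n_count_columns == n_samples                     # FALSE

# setdiff() names the column that has no matching sample.
extra_columns <- setdiff(colnames(counts_with_id_column), sample_ids)
```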

If you look at the details of read.table, you can see that it's expecting the row name column to not have a name.

So reread how read.table works (it works just like read.csv), and import your data so that the gene names are rownames.
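One way to do that import, sketched here with inline CSV text standing in for the real file, is the `row.names = 1` argument of `read.csv()`, which uses the first column as row names. This is one option among several, not necessarily what was done in this thread.

```r
# Inline CSV text standing in for "Read Counts.csv" (first two genes only).
csv_text <- "Gene_ID,SP_18,SP_23,SP_28,SP_20,SP_25,SP_30
ENSG00000000003,88,45,30,70,100,151
ENSG00000000419,604,920,828,905,596,1047"

# row.names = 1 tells read.csv() to use the first column as row names
# instead of keeping it as a data column.
Read_Counts <- read.csv(text = csv_text, header = TRUE, row.names = 1)

dim(Read_Counts)        # 2 rows, 6 sample columns
rownames(Read_Counts)   # the Ensembl gene IDs
```

For the real file, the same argument would go into the original call, e.g. `read.csv("Read Counts.csv", header = TRUE, row.names = 1)`. A side effect is that the header of the first column, which shows up as `ï..Gene_ID` in the question (most likely because of a byte-order mark at the start of the file), is no longer used as a column name.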

ADD COMMENT • link 5.8 years ago swbarnes2 ★ 1.4k

Thank you so much. I just removed the column header for the gene symbol column and it worked.

ADD REPLY • link 5.8 years ago Hamidreza Hashemi ▴ 20