Data Structure for Read Count Analysis by DESeq2
@hamidreza-hashemi-23384

Hi,

I am new to R and to the DESeq2 package for RNA-seq analysis. I am trying to analyze the read counts of two cell types (M1, M2), each with three biological replicates (M11, M12, M13 and M21, M22, M23). I read the files into R as .csv, but when I try to create a dds object I get the error below. Could you please help me? Is something wrong with my data format?

Read_Counts <- read.csv("Read Counts.csv", header =  TRUE)
head(Read_Counts)
       ï..Gene_ID SP_18 SP_23 SP_28 SP_20 SP_25 SP_30
1 ENSG00000000003    88    45    30    70   100   151
2 ENSG00000000419   604   920   828   905   596  1047
3 ENSG00000000457   258   242   153   252   119   135
4 ENSG00000000460    77    70    51   152    76    75
5 ENSG00000000938  3074  3672  2948  5560  5434  7641
6 ENSG00000000971  4521   115    55    42     1     0


Meta_Data <- read.csv("Meta Data.csv", header = TRUE)
head(Meta_Data)
  ï..Sample_ID Condition CellType
1        SP_18      M1_1       M1
2        SP_23      M1_2       M1
3        SP_28      M1_3       M1
4        SP_20      M2_1       M2
5        SP_25      M2_2       M2
6        SP_30      M2_3       M2


dds <- DESeqDataSetFromMatrix(countData = Read_Counts, colData = Meta_Data, design = ~ CellType)
Error in DESeqDataSetFromMatrix(countData = Read_Counts, colData = Meta_Data,  : 
  ncol(countData) == nrow(colData) is not TRUE.

Kevin Blighe

Hi, for your data, both of the following conditions need to be true before you can run DESeqDataSetFromMatrix():

ncol(Read_Counts) == nrow(Meta_Data)
all(colnames(Read_Counts) == rownames(Meta_Data))

To help you, it looks like you need to do the following:

rownames(Read_Counts) <- Read_Counts[,1]
Read_Counts <- data.matrix(Read_Counts[,-1])

rownames(Meta_Data) <- Meta_Data[,1]

If you could please review the input and output of each of these steps in order to understand what is happening, that would be great for learning purposes.
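As a minimal sketch of those steps, the block below rebuilds small copies of the two tables from the question (first three genes only, with plain `Gene_ID` and `Sample_ID` headers, which is an assumption about what the files are meant to contain) and checks both conditions before and after the fix. The variable `before_ok` is introduced here purely for illustration, and the final DESeq2 call only runs if the package happens to be installed.

```r
# Toy copies of the tables from the question (first three genes only).
Read_Counts <- read.csv(text = "Gene_ID,SP_18,SP_23,SP_28,SP_20,SP_25,SP_30
ENSG00000000003,88,45,30,70,100,151
ENSG00000000419,604,920,828,905,596,1047
ENSG00000000457,258,242,153,252,119,135", header = TRUE)

Meta_Data <- read.csv(text = "Sample_ID,Condition,CellType
SP_18,M1_1,M1
SP_23,M1_2,M1
SP_28,M1_3,M1
SP_20,M2_1,M2
SP_25,M2_2,M2
SP_30,M2_3,M2", header = TRUE)

# Before the fix: the gene ID column is counted as a seventh "sample",
# so 7 columns are compared against 6 metadata rows.
before_ok <- ncol(Read_Counts) == nrow(Meta_Data)
print(before_ok)  # FALSE

# Move the gene IDs into the row names and keep only the count columns.
rownames(Read_Counts) <- Read_Counts[, 1]
Read_Counts <- data.matrix(Read_Counts[, -1])

# Use the sample IDs as row names of the metadata.
rownames(Meta_Data) <- Meta_Data[, 1]

# After the fix: both conditions hold, otherwise stopifnot() raises an error.
stopifnot(
  ncol(Read_Counts) == nrow(Meta_Data),
  all(colnames(Read_Counts) == rownames(Meta_Data))
)

# Optional: build the dataset only if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  Meta_Data$CellType <- factor(Meta_Data$CellType)
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = Read_Counts,
                                        colData = Meta_Data,
                                        design = ~ CellType)
}
```

Printing `dim(Read_Counts)` and `head(Read_Counts)` after each assignment is a quick way to see what each step changes.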

Kevin

Thanks a lot, Kevin. I just learn as I go, bits from here and there. I just removed the column header for the gene symbols and now it's working. I appreciate your advice and will keep it in my script.

swbarnes2

When you have gene names as a column, the software thinks it's just a weird-looking sample column.

If you look at the documentation for read.table, you will see that it expects the row-name column to have no header.

So reread how read.table works (read.csv is a wrapper around it with different defaults), and import your data so that the gene names are row names.
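A small sketch of two ways to get gene names in as row names, using inline text in place of the real file (the two-sample table and the names `csv_text`, `no_name_text`, `counts_a` and `counts_b` are made up for illustration):

```r
# Option 1: keep the header for the gene column and tell read.csv
# that the first column holds the row names.
csv_text <- "Gene_ID,SP_18,SP_23
ENSG00000000003,88,45
ENSG00000000419,604,920"
counts_a <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Option 2: leave the gene column without a header. The header line then
# has one fewer field than the data lines, and read.table (and therefore
# read.csv) uses the first column as row names.
no_name_text <- "SP_18,SP_23
ENSG00000000003,88,45
ENSG00000000419,604,920"
counts_b <- read.csv(text = no_name_text, header = TRUE)

# Either way, only the sample columns remain as columns.
print(colnames(counts_a))
print(rownames(counts_b))

# Assumption: the "ï.." prefix on the first column name in the question
# is likely a UTF-8 byte-order mark at the start of the file. When reading
# from a file, fileEncoding = "UTF-8-BOM" asks R to strip it, e.g.
# read.csv("Read Counts.csv", row.names = 1, fileEncoding = "UTF-8-BOM")
```

With the real file, Option 1 avoids editing the CSV by hand, which is convenient if the file is regenerated often.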

Thank you so much. I just removed the column header for the gene symbol column and it worked.
