Question

DESeq2: Starting from count tables

0

Entering edit mode

jjinhyoungkim ▴ 20

@jjinhyoungkim-8476

Last seen 10.2 years ago

Canada

Hello,

I am a novice for R and bioinfomatics. I am trying to run DESeq2 with my raw count table. I have been trying to follow the beginner's guide for the DESeq2 package, but it is still hard to understand because my experimental condition is different from the example. I tried to run with a kind help of poinsoAlien, but encounter the error. Would you please help me to figure it out? Thank you in advance for your help.

Here is my experiemtal design and the code I tried.

Treatment: Control vs Treated
Animal group: each treatment has 4 genotypes in duplicate.

Control (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

Treated (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).

Model: Treatment, Animal group, Aniaml * Treatment (interaction of animal and treatment).

I want to run DESeq2 with 2-way ANOVA-like frame work (2 treatment X 4 genotypes).

Code

> sampleInfo <- read.csv("~/Desktop/KiPBSvsPIC.csv")
> head(sampleInfo)
               ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972      223736       97095      182659      161499      288360
2 unigene22407675      197279      218530      241506      194318      189604
3 unigene22402019      102051       50008       68617       60005       98462
4 unigene22387256       82289       93196       88090       69212       67281
5 unigene22410725       26403       51824       35820       36544       21938
6 unigene22400817       45626       20860       28781       24550       43365
Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1      280347      160611      239753              162563               51277
2      158650      184416      173256              173624              185385
3      138085       39193       79512               68575               28035
4       55019       62363       71076               54943               58875
5       22089       33376       21841               31780               27587
6       55484       15369       30763               26308               11168
Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1              286391               92844              243987
2              161664              154975              187180
3               93280               30827               88396
4               58179               54870               82084
5               32564               25289               27143
6               40633               13106               34436
Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1              314833              120738              105184
2              149823              173155              163640
3              136919               33087               27698
4               55364               99173               80765
5               18275               33866               26191
6               54273               13731               10517
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> coldata$treatment = factor(x = coldata$treatment,levels = c('treated','control'))
> coldata
                    group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in as.matrix(countData) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error: object 'countMatrix' not found
Calls: DESeqDataSetFromMatrix -> as.matrix

deseq2 • 11k views

ADD COMMENT • link updated 10.4 years ago by Michael Love 43k • written 10.4 years ago by jjinhyoungkim ▴ 20

score 0 · Answer 1 · 2015-07-25

0

Entering edit mode

arfranco ▴ 130

@arfranco-8341

Last seen 19 months ago

European Union

I see a putative problem, In DESeq2 gene names (the ID column) have to be placed as rownames. Run this code

rownames(sampleInfo) <- sampleInfo[ , 1]

sampleInfo <- sampleInfo[,-1]

ADD COMMENT • link 10.4 years ago arfranco ▴ 130

score 0 · Answer 2 · 2015-07-25

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

The error message is telling you quite literally the problem:

"Error: object 'countMatrix' not found"

this means that a variable 'countMatrix' is not defined in the environment. You can see what is defined with:

ls()

You have to tell the DESeqDataSetFromMatrix() the name of a matrix which is already defined and which contains the counts.

It looks like the count information is in 'sampleInfo' if you remove the columns which are not counts.

ADD COMMENT • link 10.4 years ago Michael Love 43k

0

Entering edit mode

Thank you guy for your help.

ADD REPLY • link 10.4 years ago jjinhyoungkim ▴ 20

0

Entering edit mode

Sorry again, I am in trouble with another error below.

> library(DESeq2)
> library("DESeq2")
> countMatrix <- read.table ("~/Desktop/KiPBSvsPIC.txt", header=TRUE)
> head (countMatrix)
               ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972      223736       97095      182659      161499      288360
2 unigene22407675      197279      218530      241506      194318      189604
3 unigene22402019      102051       50008       68617       60005       98462
4 unigene22387256       82289       93196       88090       69212       67281
5 unigene22410725       26403       51824       35820       36544       21938
6 unigene22400817       45626       20860       28781       24550       43365
Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1      280347      160611      239753              162563               51277
2      158650      184416      173256              173624              185385
3      138085       39193       79512               68575               28035
4       55019       62363       71076               54943               58875
5       22089       33376       21841               31780               27587
6       55484       15369       30763               26308               11168
Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1              286391               92844              243987
2              161664              154975              187180
3               93280               30827               88396
4               58179               54870               82084
5               32564               25289               27143
6               40633               13106               34436
Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1              314833              120738              105184
2              149823              173155              163640
3              136919               33087               27698
4               55364               99173               80765
5               18275               33866               26191
6               54273               13731               10517
> #form coldata
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> #make factors for treatment column.
> coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))
> coldata
                    group treatment
genotype1-1           gt1   control
genotype1-2           gt1   control
genotype2-1           gt2   control
genotype2-2           gt2   control
genotype3-1           gt3   control
genotype3-2           gt3   control
genotype4-1           gt4   control
genotype4-2           gt4   control
genotype1-1_treated   gt1   treated
genotype1-2_treated   gt1   treated
genotype2-1_treated   gt2   treated
genotype2-2_treated   gt2   treated
genotype3-1_treated   gt3   treated
genotype3-2_treated   gt3   treated
genotype4-1_treated   gt4   treated
genotype4-2_treated   gt4   treated
> #construct DESeq object from input matrix (countdata)
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
longer object length is not a multiple of shorter object length
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject

ADD REPLY • link 10.4 years ago jjinhyoungkim ▴ 20

0

Entering edit mode

Look at the error message: 'colData' nrow differs from 'assays' ncol

This could admittedly be written a bit more clearly, but it means: The number of rows ("nrow") of colData differs from the number of columns ("ncol") of the assays matrix (i.e., "countMatrix").

This is because the countMatrix contains one more column, namely the gene IDs in the first column. If you use

countMatrix <- read.table ("~/Desktop/KiPBSvsPIC.txt", header=TRUE, row.names=1)

the first column in the file (i.e., the gene IDs) will be placed into the row names slot rather than into a separate column.

ADD REPLY • link 10.4 years ago Simon Anders ★ 3.8k

0

Entering edit mode

hi,

If you are just starting R, I have a couple of recommendations:

1) you should pay very careful attention to the text of the messages which are printed, and try to figure out what is wrong on your own through exploration of the variables you have defined.

Some very quick notes on how to explore your workspace are here:

http://www.statmethods.net/interface/workspace.html

http://www.statmethods.net/input/contents.html

2) I'd recommend that you take a free R course. Some good ones are:

http://swirlstats.com/

https://www.coursera.org/course/rprog

Some people can jump into R without any prior experience, usually because they have experience with other programming languages like python or Matlab. But investing a little time to learn the very basics of R will pay off quickly, and you can avoid mistakes.

While jumping straight into a Bioconductor package with no R experience is technically possible, it's just going to be much easier if you spend a little time learning the basics first.