Question: DESeq2 Input format
2.6 years ago by
uzer0
uzer0 wrote:

Hi all,

I have extracted raw count data from a raw reads file, and my data set is as follows: the first column is the gene name, and the next four columns are counts for each sample, the first two being control data and the next two being the experimental data of interest. It is very simple and looks like the example below. Please note that I have already cleaned the data and accounted for feature overlap and intersection.

gene_symbol_1 1 2 3 4

gene_symbol_2 2 3 4 5

gene_symbol_3 0 11 2 7

.....

The parameters for DESeqDataSetFromMatrix() are as such:

countData := can just be the raw counts

but I am confused about how to specify the arguments colData and design.

How can I encode the colData matrix? Once colData is properly encoded, how do I specify design? I have looked at the following sources, but they are not very clear in my context, because I do not have a "summarized experiment" object:

https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf

https://www.bioconductor.org/help/course-materials/2015/LearnBioconductorFeb2015/B02.1.1_RNASeqLab.html#construct

modified 2.6 years ago by Michael Love23k • written 2.6 years ago by uzer0
2.6 years ago by
Michael Love23k
United States
Michael Love23k wrote:

colData is a data.frame or DataFrame which contains the information about the columns of the count matrix, i.e. the samples, with one row per sample in the same order as the count columns. You can read more about these arguments by typing in the R console:

?DESeqDataSetFromMatrix

For a simple two-vs-two comparison you can do:

colData <- data.frame(condition=factor(c("C","C","T","T")))
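To make the link between colData and the count matrix concrete, here is a minimal sketch using the three example genes from the question. The sample names (ctrl1, ctrl2, trt1, trt2) are hypothetical and only there for illustration; the point is that each row of colData describes one column of the count matrix, in the same order.

```r
# Toy count matrix built from the three example genes in the question.
# The sample names are hypothetical placeholders.
counts <- matrix(
  c(1,  2, 3, 4,
    2,  3, 4, 5,
    0, 11, 2, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("gene_symbol_1", "gene_symbol_2", "gene_symbol_3"),
    c("ctrl1", "ctrl2", "trt1", "trt2")
  )
)
storage.mode(counts) <- "integer"  # raw counts should be integers

# One row per sample, in the same order as the columns of 'counts'.
colData <- data.frame(
  condition = factor(c("C", "C", "T", "T")),
  row.names = colnames(counts)
)

# Sanity check: sample order in colData matches the count matrix columns.
stopifnot(identical(rownames(colData), colnames(counts)))

# Factor levels are sorted alphabetically by default, so "C" comes first
# and acts as the reference level.
levels(colData$condition)
```

If your control label does not sort first alphabetically, base R's relevel() can be used to set the reference level explicitly.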

The design is an R formula which tells DESeq2 how you want to analyze the data. If you look at the DESeq2 help pages or vignette, you will see that the design for such a comparison is simply: ~ condition.
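Putting the three arguments together might look roughly like the sketch below. It reuses the toy counts from the question with hypothetical sample names, and it only constructs the object: three genes are far too few to fit a real model, so the actual analysis steps described in the vignette are left out. The requireNamespace() guard is just there so the snippet does not fail if DESeq2 is not installed.

```r
# Toy integer count matrix (three example genes from the question).
# Sample names are hypothetical placeholders.
counts <- matrix(
  as.integer(c(1,  2, 3, 4,
               2,  3, 4, 5,
               0, 11, 2, 7)),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("gene_symbol_1", "gene_symbol_2", "gene_symbol_3"),
    c("ctrl1", "ctrl2", "trt1", "trt2")
  )
)

# Sample table: one row per column of 'counts', in the same order.
colData <- data.frame(
  condition = factor(c("C", "C", "T", "T")),
  row.names = colnames(counts)
)

# The design refers to a column of colData by name.
design_formula <- ~ condition

# Build the DESeqDataSet only if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = colData,
    design    = design_formula
  )
  print(dds)
}
```

Note that the variable named in the design (here condition) has to exist as a column in colData, which is why the two arguments are usually built together.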