Making contrasts for multiple treatment groups using DESeq2
1
0
Entering edit mode
Manav • 0
@6c392f63
Last seen 17 months ago
United States

Hi,

I am running an RNA Seq analysis for differential gene expressions for the data described as follows with biological replicates: 3 Control, 3 Treatment a, 4 Treatment b and 3 Treatment c.

I need to make the following comparisons: Treatment a vs Control, Treatment b vs Control, Treatment c vs Control and Treatment b vs Treatment c. I am confused as to where should I write the contrasts? Can you please explain the difference between factor levels and contrasts?

Thank you for your help.

# load libraries
library(DESeq2)

# Step 1: preparing the count data

  # read in counts data
data <- read.csv('gene_count_matrix.csv', header = T, row.names = 1L) # Don't forget to put row names as ENSEMBL IDs, else code will not run
head(data)
##                    Control_1 Control_2 Control_3 Treatment a_1
## ENSMUSG00000028180           397           372           433       442
## ENSMUSG00000028182             39             21             59         31
## ENSMUSG00000028185              0              0              0          0
## ENSMUSG00000028184           884           991           735       903
## ENSMUSG00000028187           11           54           11       146
## ENSMUSG00000028186              0              0              0          3
##                    Treatment a_2 Treatment a_3 Treatment b_1 Treatment b_2
## ENSMUSG00000028180       427       473           397           318
## ENSMUSG00000028182         9         6             6             4
## ENSMUSG00000028185          0          0              0              0
## ENSMUSG00000028184       814       771          110          154
## ENSMUSG00000028187       142       173           13           22
## ENSMUSG00000028186          0          0              8              0
##                   Treatment b_3 Treatment b_4 Treatment c_1 Treatment c_2
## ENSMUSG00000028180           35           31       93       44
## ENSMUSG00000028182             38             24         40         42
## ENSMUSG00000028185              0              0          0          0
## ENSMUSG00000028184           926           857      1018      1960
## ENSMUSG00000028187           152           196       179       160
## ENSMUSG00000028186              0              0          4          4
##                   Treatment c_3
## ENSMUSG00000028180       3453
## ENSMUSG00000028182         38
## ENSMUSG00000028185          0
## ENSMUSG00000028184       9514
## ENSMUSG00000028187       1663
## ENSMUSG00000028186          0

  # read in sample info
pheno <- read.csv('pheno_sheet.csv', header = T, row.names = 1L)

  # making sure the row names in colData matches to column names in counts_data
all(colnames(data) %in% rownames(pheno))
## [1] TRUE
  # are they in the same order?
all(colnames(data) == rownames(pheno))
## [1] TRUE


# Step 2: construct a DESeqDataSet object

dds <- DESeqDataSetFromMatrix(countData = data,
                              colData = pheno,
                              design = ~ Sample)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
dds # n = 55,401
## class: DESeqDataSet 
## dim: 55401 13 
## metadata(1): version
## assays(1): counts
## rownames(55401): ENSMUSG00000028180 ENSMUSG00000028182 ...
##   ENSMUSG00000036958 ENSMUSG00000042678
## rowData names(0):
## colnames(13): Control_1 Control_2 ... Treatment c_2 Treatment c_3
## colData names(1): Sample
  # pre-filtering: removing rows with low gene counts
  # This is recommended step as it helps reduce the size of the dds object and increase speed of computation

  # keeping rows that have at least 10 reads total
keep <- rowSums(counts(dds)) >= 10  # >=10 because recommended in the vignette
dds <- dds[keep,]
dds # n= 24,038
## class: DESeqDataSet 
## dim: 24038 13 
## metadata(1): version
## assays(1): counts
## rownames(24038): ENSMUSG00000028180 ENSMUSG00000028182 ...
##   ENSMUSG00000036959 ENSMUSG00000042678
## rowData names(0):
## colnames(13): Control_1 Control_2 ... Treatment c_2 Treatment c_3
## colData names(1): Sample

# set the factor level
dds$Sample<- relevel(dds$Sample, ref = "Control") # If the reference level is not mentioned, then it will automatically choose the reference alphabetically
dds$Sample
##  [1] Control Control Control Treatment a     Treatment a     Treatment a     Treatment b Treatment b Treatment b
## [10] Treatment b Treatment c     Treatment c     Treatment c    
## Levels: Control Treatment a Treatment b Treatment c


# Step 3: Run DESeq
dds <- DESeq(dds)
## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## final dispersion estimates
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## fitting model and testing

res_Treatment a <- results(dds, contrast = c("Sample", "Treatment a", "Control"))
res_Treatment a
## log2 fold change (MLE): Sample Treatment a vs Control 
## Wald test p-value: Sample Treatment a vs Control 
## DataFrame with 24038 rows and 6 columns
##                       baseMean log2FoldChange     lfcSE       stat      pvalue
##                      <numeric>      <numeric> <numeric>  <numeric>   <numeric>
## ENSMUSG00000028180  3862.6757     -0.057747 0.070335 -0.756516  0.4286236
## ENSMUSG00000028182    41.9097     -0.017546 0.488878 -0.035134  0.9708700
## ENSMUSG00000028184  934.53415      0.213321 0.04522  2.638894  0.0081781
## ENSMUSG00000028187  163.88391      0.178966 0.095555  1.526086  0.1618069
## ENSMUSG00000028186     1.4051      3.8020443 3.816178  0.961816  0.3116193
## ...                        ...            ...       ...        ...         ...
## ENSMUSG00000106108    0.8605       0.000000 4.620361   0.00000 1.00000e+00
## ENSMUSG00000042675 5026.25958      -0.33392 0.069803  -5.128677 2.9785e-07
## ENSMUSG00000083767   50.466359       0.1324 0.435256   0.9007 7.3404e-01
## ENSMUSG00000036959 126.95391      -0.55607 0.126976  -4.27439 2.6366e-05
## ENSMUSG00000042678    3.27304       5.45275 3.965736   1.75693 1.8917e-01
##                           padj
##                      <numeric>
## ENSMUSG00000028180   0.647043
## ENSMUSG00000028182   0.985538
## ENSMUSG00000028184   0.03958
## ENSMUSG00000028187   0.3520263
## ENSMUSG00000028186          NA
## ...                        ...
## ENSMUSG00000106108          NA
## ENSMUSG00000042675 8.26780-06
## ENSMUSG00000083767 8.5413e-01
## ENSMUSG00000036959 3.4132e-04
## ENSMUSG00000042678          NA

summary(res_Treatment a)
## 
## out of 24038 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 2002, 8.3%
## LFC < 0 (down)     : 2421, 10%
## outliers [1]       : 17, 0.071%
## low counts [2]     : 9315, 39%
## (mean count < 25)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results


# QUESTIONS:

# I SET OTHER CONTRASTS FOR COMPARISIONS AT THE SAME STEP WHERE I GENERATED Treatment a VS CONTROL?
# I use these contrasts to generate MA plots, differentially expressed genes 
and other data visualizations individually for each contrast?
DESeq2 • 3.8k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

I SET OTHER CONTRASTS FOR COMPARISIONS AT THE SAME STEP WHERE I GENERATED Treatment a VS CONTROL?

Yes

I use these contrasts to generate MA plots, differentially expressed genes and other data visualizations individually for each contrast?

Yes

ADD COMMENT
0
Entering edit mode

Hi Michael, Thank you for the input.

ADD REPLY
0
Entering edit mode

Dear Michael Love . Some doubts on this question

1) Correct that as long as the contrasts are specified there's no need to set the factor level

#### so this step he has done is not needed 
dds$Sample<- relevel(dds$Sample, ref = "Control") 

2) If we were talking about multiple treatments and multiple time points, the way to go would be the same ?

e.g. sampleTable

sample condition
sample1 Time_1_Treatment_a
sample2 Time_1_Treatment_a
sample3 Time_1_Treatment_a
sample4 Control_Time_1
sample5 Control_Time_1
sample6 Control_Time_1
sample7 Time_2_Treatment_a
sample8 Time_2_Treatment_a
sample9 Time_2_Treatment_a
sample10 Control_Time_2
sample11 Control_Time_2
sample12 Control_Time_2

ddsMat <- DESeqDataSetFromMatrix(countData = txi, colData = sampleTable, design = ~condition)
dds <- DESeq(ddsMat)
res_Time_1_Treatment a <- results(dds, contrast = c("condition", "Time_1_Treatment_a", "Control_Time_1"), alpha = 0.05)
res_Time_2_Treatment a <- results(dds, contrast = c("condition", "Time_2_Treatment_a", "Control_Time_2"), alpha = 0.05)

3) is that important to filter genes like in the step described below? In the vignette it's written that this step can be skipped before running Deseq functions.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Thanks in advance

ADD REPLY
1
Entering edit mode

1)

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

2)

This is one approach.

3)

It can be skipped but there's no harm in removing low count genes, we usually use something like:

keep <- rowSums(counts(dds) >= 10) >= X

where X is the smallest group size in the experiment.

ADD REPLY

Login before adding your answer.

Traffic: 578 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6