Question

Making contrasts for multiple treatment groups using DESeq2

0

Entering edit mode

Manav • 0

@6c392f63

Last seen 17 months ago

United States

Hi,

I am running an RNA Seq analysis for differential gene expressions for the data described as follows with biological replicates: 3 Control, 3 Treatment a, 4 Treatment b and 3 Treatment c.

I need to make the following comparisons: Treatment a vs Control, Treatment b vs Control, Treatment c vs Control and Treatment b vs Treatment c. I am confused as to where should I write the contrasts? Can you please explain the difference between factor levels and contrasts?

Thank you for your help.

# load libraries
library(DESeq2)

# Step 1: preparing the count data

  # read in counts data
data <- read.csv('gene_count_matrix.csv', header = T, row.names = 1L) # Don't forget to put row names as ENSEMBL IDs, else code will not run
head(data)
##                    Control_1 Control_2 Control_3 Treatment a_1
## ENSMUSG00000028180           397           372           433       442
## ENSMUSG00000028182             39             21             59         31
## ENSMUSG00000028185              0              0              0          0
## ENSMUSG00000028184           884           991           735       903
## ENSMUSG00000028187           11           54           11       146
## ENSMUSG00000028186              0              0              0          3
##                    Treatment a_2 Treatment a_3 Treatment b_1 Treatment b_2
## ENSMUSG00000028180       427       473           397           318
## ENSMUSG00000028182         9         6             6             4
## ENSMUSG00000028185          0          0              0              0
## ENSMUSG00000028184       814       771          110          154
## ENSMUSG00000028187       142       173           13           22
## ENSMUSG00000028186          0          0              8              0
##                   Treatment b_3 Treatment b_4 Treatment c_1 Treatment c_2
## ENSMUSG00000028180           35           31       93       44
## ENSMUSG00000028182             38             24         40         42
## ENSMUSG00000028185              0              0          0          0
## ENSMUSG00000028184           926           857      1018      1960
## ENSMUSG00000028187           152           196       179       160
## ENSMUSG00000028186              0              0          4          4
##                   Treatment c_3
## ENSMUSG00000028180       3453
## ENSMUSG00000028182         38
## ENSMUSG00000028185          0
## ENSMUSG00000028184       9514
## ENSMUSG00000028187       1663
## ENSMUSG00000028186          0

  # read in sample info
pheno <- read.csv('pheno_sheet.csv', header = T, row.names = 1L)

  # making sure the row names in colData matches to column names in counts_data
all(colnames(data) %in% rownames(pheno))
## [1] TRUE
  # are they in the same order?
all(colnames(data) == rownames(pheno))
## [1] TRUE


# Step 2: construct a DESeqDataSet object

dds <- DESeqDataSetFromMatrix(countData = data,
                              colData = pheno,
                              design = ~ Sample)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
dds # n = 55,401
## class: DESeqDataSet 
## dim: 55401 13 
## metadata(1): version
## assays(1): counts
## rownames(55401): ENSMUSG00000028180 ENSMUSG00000028182 ...
##   ENSMUSG00000036958 ENSMUSG00000042678
## rowData names(0):
## colnames(13): Control_1 Control_2 ... Treatment c_2 Treatment c_3
## colData names(1): Sample
  # pre-filtering: removing rows with low gene counts
  # This is recommended step as it helps reduce the size of the dds object and increase speed of computation

  # keeping rows that have at least 10 reads total
keep <- rowSums(counts(dds)) >= 10  # >=10 because recommended in the vignette
dds <- dds[keep,]
dds # n= 24,038
## class: DESeqDataSet 
## dim: 24038 13 
## metadata(1): version
## assays(1): counts
## rownames(24038): ENSMUSG00000028180 ENSMUSG00000028182 ...
##   ENSMUSG00000036959 ENSMUSG00000042678
## rowData names(0):
## colnames(13): Control_1 Control_2 ... Treatment c_2 Treatment c_3
## colData names(1): Sample

# set the factor level
dds$Sample<- relevel(dds$Sample, ref = "Control") # If the reference level is not mentioned, then it will automatically choose the reference alphabetically
dds$Sample
##  [1] Control Control Control Treatment a     Treatment a     Treatment a     Treatment b Treatment b Treatment b
## [10] Treatment b Treatment c     Treatment c     Treatment c    
## Levels: Control Treatment a Treatment b Treatment c


# Step 3: Run DESeq
dds <- DESeq(dds)
## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## final dispersion estimates
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
## fitting model and testing

res_Treatment a <- results(dds, contrast = c("Sample", "Treatment a", "Control"))
res_Treatment a
## log2 fold change (MLE): Sample Treatment a vs Control 
## Wald test p-value: Sample Treatment a vs Control 
## DataFrame with 24038 rows and 6 columns
##                       baseMean log2FoldChange     lfcSE       stat      pvalue
##                      <numeric>      <numeric> <numeric>  <numeric>   <numeric>
## ENSMUSG00000028180  3862.6757     -0.057747 0.070335 -0.756516  0.4286236
## ENSMUSG00000028182    41.9097     -0.017546 0.488878 -0.035134  0.9708700
## ENSMUSG00000028184  934.53415      0.213321 0.04522  2.638894  0.0081781
## ENSMUSG00000028187  163.88391      0.178966 0.095555  1.526086  0.1618069
## ENSMUSG00000028186     1.4051      3.8020443 3.816178  0.961816  0.3116193
## ...                        ...            ...       ...        ...         ...
## ENSMUSG00000106108    0.8605       0.000000 4.620361   0.00000 1.00000e+00
## ENSMUSG00000042675 5026.25958      -0.33392 0.069803  -5.128677 2.9785e-07
## ENSMUSG00000083767   50.466359       0.1324 0.435256   0.9007 7.3404e-01
## ENSMUSG00000036959 126.95391      -0.55607 0.126976  -4.27439 2.6366e-05
## ENSMUSG00000042678    3.27304       5.45275 3.965736   1.75693 1.8917e-01
##                           padj
##                      <numeric>
## ENSMUSG00000028180   0.647043
## ENSMUSG00000028182   0.985538
## ENSMUSG00000028184   0.03958
## ENSMUSG00000028187   0.3520263
## ENSMUSG00000028186          NA
## ...                        ...
## ENSMUSG00000106108          NA
## ENSMUSG00000042675 8.26780-06
## ENSMUSG00000083767 8.5413e-01
## ENSMUSG00000036959 3.4132e-04
## ENSMUSG00000042678          NA

summary(res_Treatment a)
## 
## out of 24038 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 2002, 8.3%
## LFC < 0 (down)     : 2421, 10%
## outliers [1]       : 17, 0.071%
## low counts [2]     : 9315, 39%
## (mean count < 25)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results


# QUESTIONS:

# I SET OTHER CONTRASTS FOR COMPARISIONS AT THE SAME STEP WHERE I GENERATED Treatment a VS CONTROL?
# I use these contrasts to generate MA plots, differentially expressed genes 
and other data visualizations individually for each contrast?

DESeq2 • 3.8k views

ADD COMMENT • link updated 19 months ago by Michael Love 41k • written 2.1 years ago by Manav • 0

score 1 · Answer 1 · 2022-03-18

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 4 hours ago

United States

I SET OTHER CONTRASTS FOR COMPARISIONS AT THE SAME STEP WHERE I GENERATED Treatment a VS CONTROL?

Yes

I use these contrasts to generate MA plots, differentially expressed genes and other data visualizations individually for each contrast?

Yes

ADD COMMENT • link 2.1 years ago Michael Love 41k

0

Entering edit mode

Hi Michael, Thank you for the input.

ADD REPLY • link 2.1 years ago Manav • 0

0

Entering edit mode

Dear Michael Love . Some doubts on this question

1) Correct that as long as the contrasts are specified there's no need to set the factor level

#### so this step he has done is not needed 
dds$Sample<- relevel(dds$Sample, ref = "Control")

2) If we were talking about multiple treatments and multiple time points, the way to go would be the same ?

e.g. sampleTable

sample condition
sample1 Time_1_Treatment_a
sample2 Time_1_Treatment_a
sample3 Time_1_Treatment_a
sample4 Control_Time_1
sample5 Control_Time_1
sample6 Control_Time_1
sample7 Time_2_Treatment_a
sample8 Time_2_Treatment_a
sample9 Time_2_Treatment_a
sample10 Control_Time_2
sample11 Control_Time_2
sample12 Control_Time_2

ddsMat <- DESeqDataSetFromMatrix(countData = txi, colData = sampleTable, design = ~condition)
dds <- DESeq(ddsMat)
res_Time_1_Treatment a <- results(dds, contrast = c("condition", "Time_1_Treatment_a", "Control_Time_1"), alpha = 0.05)
res_Time_2_Treatment a <- results(dds, contrast = c("condition", "Time_2_Treatment_a", "Control_Time_2"), alpha = 0.05)

3) is that important to filter genes like in the step described below? In the vignette it's written that this step can be skipped before running Deseq functions.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Thanks in advance

ADD REPLY • link 20 months ago lessismore ▴ 20

1

Entering edit mode

1)

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

2)

This is one approach.

3)

It can be skipped but there's no harm in removing low count genes, we usually use something like:

keep <- rowSums(counts(dds) >= 10) >= X

where X is the smallest group size in the experiment.

ADD REPLY • link 19 months ago Michael Love 41k