Accounting for technical replicates in DESeq2
2
0
Entering edit mode
sara.p • 0
@3b827d37
Last seen 9 months ago
Sweden

Hi,

I'm a fairly new DESeq2 user and was hoping to get some help with how to correctly account for technical replicates in my dataset. Here is some example data:

suppressPackageStartupMessages(library("DESeq2"))

coldata <- data.frame(
  'group' = factor(c(rep('group1', 4), rep('group2', 4))),
  'sample' = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  'tech_rep' = factor(rep(c('rep1', 'rep2'), 4)), 
  row.names = paste0(rep('samp', 8), 1:8)
)

coldata
       group sample tech_rep
samp1 group1      1     rep1
samp2 group1      1     rep2
samp3 group1      2     rep1
samp4 group1      2     rep2
samp5 group2      3     rep1
samp6 group2      3     rep2
samp7 group2      4     rep1
samp8 group2      4     rep2

cts <- data.frame(matrix(round(runif(n=8*50, min=0, max=20)), nrow=50))
colnames(cts) <- rownames(coldata)

We are interested in DEGs between groups 1 and 2, and we would like to account for the fact that each sample has two technical replicates. Initially, I thought the right way to do this was:

dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~tech_rep + group
)

which does not result in any error message, but then I realized that this design might suggest that all the "rep1" are from the same batch, and all "rep2" from another batch, which isn't true. I instead tried

dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~sample + tech_rep + group
)

converting counts to integer mode
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

but this doesn't work. I have read the vignette section "Group-specific condition effects, individuals nested within groups" and tried to apply it to my own data.

#make a simple object in which to replace design matrix later
dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~group
)

dds$samp.n <- factor(rep(rep(1:2,each=2),2))

dds@design <- ~group + group:samp.n + group:tech_rep

dds <- DESeq(dds)
results(dds)

log2 fold change (MLE): groupgroup2.tech reprep2 
Wald test p-value: groupgroup2.tech reprep2 
DataFrame with 50 rows and 6 columns
     baseMean log2FoldChange     lfcSE      stat     pvalue      padj
    <numeric>      <numeric> <numeric> <numeric>  <numeric> <numeric>
1     8.64649       0.114671   1.01386  0.113103 0.90994904  0.968031
2    11.35877      -0.612314   0.98719 -0.620259 0.53508701  0.968031
3     9.09778       0.184718   0.98820  0.186923 0.85172066  0.968031
4    11.67832      -0.497979   0.95027 -0.524040 0.60025100  0.968031
5     7.12506      -5.162355   1.95858 -2.635770 0.00839465  0.419610
...       ...            ...       ...       ...        ...       ...
46    5.66848       0.657364   1.65819  0.396434   0.691785  0.968031
47   10.78412       0.627660   1.18219  0.530930   0.595467  0.968031
48    6.25440       0.870858   1.62538  0.535787   0.592106  0.968031
49   10.59046       0.369202   1.10770  0.333305   0.738904  0.968031
50    6.28831       1.329147   1.50087  0.885587   0.375840  0.942738

Is this correct? When I try it on my actual data, I don't seem to get any padj < 0.05, whereas I get a number of genes if I simply ignore the technical replicates and treat all samples as biological replicates.

I would be very grateful for any help with this.

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 13.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
 [5] matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4         IRanges_2.30.1             
 [9] S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] KEGGREST_1.36.3        genefilter_1.78.0      locfit_1.5-9.6         tidyselect_1.2.0       splines_4.2.0         
 [6] lattice_0.20-45        generics_0.1.3         colorspace_2.0-3       vctrs_0.5.1            utf8_1.2.2            
[11] blob_1.2.3             XML_3.99-0.11          survival_3.3-1         rlang_1.0.6            pillar_1.8.1          
[16] glue_1.6.2             DBI_1.1.3              BiocParallel_1.30.4    bit64_4.0.5            RColorBrewer_1.1-3    
[21] GenomeInfoDbData_1.2.8 lifecycle_1.0.3        zlibbioc_1.42.0        Biostrings_2.64.1      munsell_0.5.0         
[26] gtable_0.3.1           codetools_0.2-18       memoise_2.0.1          geneplotter_1.74.0     fastmap_1.1.0         
[31] parallel_4.2.0         fansi_1.0.3            AnnotationDbi_1.58.0   Rcpp_1.0.9             xtable_1.8-4          
[36] scales_1.2.1           cachem_1.0.6           DelayedArray_0.22.0    annotate_1.74.0        XVector_0.36.0        
[41] bit_4.0.4              ggplot2_3.4.0          png_0.1-7              dplyr_1.0.10           grid_4.2.0            
[46] cli_3.4.1              tools_4.2.0            bitops_1.0-7           magrittr_2.0.3         RCurl_1.98-1.9        
[51] RSQLite_2.2.18         tibble_3.1.8           pkgconfig_2.0.3        crayon_1.5.2           Matrix_1.5-1          
[56] assertthat_0.2.1       httr_1.4.4             rstudioapi_0.14        R6_2.5.1               compiler_4.2.0   
DESeq2 • 493 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

Technical replicates (additional sequencing of a prepared library) are typically well approximated by a Poisson, and so we add them with a dedicated function collapseReplicates.

ADD COMMENT
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 16 hours ago
San Diego

if I simply ignore the technical replicates and treat all samples as biological replicates.

But they aren't biological replicates.

To emphasize, you shouldn't have technical replicates in your setup, the counts must be combined at some point. DESeq2 can do it if you haven't done it at an earlier stage, like catting fastqs or merging bams or adding gene counts.

You have 4 samples. A design of "Group" all you can do.

ADD COMMENT

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6