DESeq2 design with unbalanced data and multiple factors

Dear all,

I am running a differential gene expression analysis (DGEA) with DESeq2 on data imported with tximport. My dataset is unbalanced, as shown in the metadata below. With these counts and metadata I would like to answer four questions:

  1. I would like to find the differentially expressed genes between the different Lines (e.g. Line "C" vs. Line "15", and all other combinations) while controlling for differences in Person_ID and Location.
  2. I would like to find the differentially expressed genes between the different Locations (e.g. Location "A" vs. Location "R", and all other combinations) while controlling for differences in Person_ID and Line.
  3. I would like to find the differentially expressed genes between two different Lines in the same Location, e.g. within Location A: what is the difference between Line 15 and Line 20?
  4. I would like to find the differentially expressed genes between two different Locations in the same Line, e.g. within Line 20: what is the difference between Location A and Location D?

Here is the code that I use to build the DESeq2 object:

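library(stringr)  # str_split_i()
library(DESeq2)   # DESeqDataSetFromTximport(), counts()

# `dir` and `samples` are assumed to be defined elsewhere (not shown)
files <- file.path(dir, samples, "t_data.ctab")
names(files) <- substr(str_split_i(files, "/", 6), 1, 10)

# Build the transcript-to-gene map from the first StringTie table
tmp <- read.csv(files[1], sep = "\t")
tx2gene <- tmp[, c("t_name", "gene_name")]

txi <- tximport::tximport(files, type = "stringtie", tx2gene = tx2gene)

samples_meta <- read.csv(file = "/path/metadata.csv", sep = ",", header = TRUE)

# Reorder the count columns to match the metadata rows
txi$counts <- txi$counts[, rownames(samples_meta)]
dds <- DESeqDataSetFromTximport(txi, colData = samples_meta, design = ~ Line)  # which design do I have to use?

dds$Line <- relevel(dds$Line, ref = "C")  # do I need a reference level?

# Keep genes with at least 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]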

Metadata of the object:

     Sample_ID Person_ID Line Location
4  SA18082157    KK1451   15        A
7  SA18083382   KK1473N   15        S
12 SA18083387   KK1450N   15        R
14 SA18083360   KK1480N   15        S
19 SA18083365   KK1551N   15        R
25 SA18083368   KK1471N   15        A
27 SA18086317   KK1443N   15        D
32 SA18051387   KK1868N   15        D
38 SA18051384   KK1865N   15        S
41 SA18051386   KK1601N   15        A
18 SA18083364   KK1551N   18        R
23 SA18083366   KK1471N   18        A
33 SA18088686   KK1671N   18        A
36 SA18088660   KK1434N   18        R
2  SA18082155    KK1451   20        A
6  SA18083381   KK1473N   20        S
10 SA18083386   KK1450N   20        R
16 SA18083344   KK1480N   20        S
17 SA18083363   KK1551N   20        R
24 SA18082300   KK1471N   20        A
26 SA18086318   KK1443N   20        D
29 SA18086315   KK1374N   20        D
31 SA18051386   KK1868N   20        D
34 SA18086313   KK1671N   20        A
37 SA18051383   KK1865N   20        S
40 SA18051388   KK1601N   20        A
3  SA18083341    KK1451    C        A
8  SA18083346   KK1473N    C        S
11 SA18083343   KK1450N    C        R
15 SA18083345   KK1480N    C        S
20 SA18083342   KK1551N    C        R
1  SA18082156    KK1451    I        A
5  SA18083383   KK1473N    I        S
13 SA18083386   KK1480N    I        S
22 SA18083367   KK1471N    I        A
30 SA18086316   KK1374N    I        D
35 SA18086318   KK1671N    I        A
39 SA18051385   KK1865N    I        S
42 SA18051360   KK1601N    I        A
9  SA18083385   KK1450N    M        R
21 SA18082301   KK1471N    M        A
28 SA18088662   KK1443N    M        D
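
To make the imbalance explicit, the Line and Location columns of the table above can be cross-tabulated. The following is only a sketch in base R: the two columns are typed in by hand from the table, and the combined `group` factor is one possible way (an assumption, not a settled design) to name each Line-by-Location cell for within-cell comparisons such as questions 3 and 4.

```r
# Line and Location copied by hand from the metadata table, in the same row order.
meta <- data.frame(
  Line = rep(c("15", "18", "20", "C", "I", "M"), times = c(10, 4, 12, 5, 8, 3)),
  Location = c(
    "A", "S", "R", "S", "R", "A", "D", "D", "S", "A",            # Line 15
    "R", "A", "A", "R",                                          # Line 18
    "A", "S", "R", "S", "R", "A", "D", "D", "D", "A", "S", "A",  # Line 20
    "A", "S", "R", "S", "R",                                     # Line C
    "A", "S", "S", "A", "D", "A", "S", "A",                      # Line I
    "R", "A", "D"                                                # Line M
  )
)

# Number of samples in each Line x Location cell; zeros are missing combinations.
cells <- table(Line = meta$Line, Location = meta$Location)
print(cells)

# One factor level per observed cell, e.g. "20_A" for Line 20 in Location A.
meta$group <- factor(paste(meta$Line, meta$Location, sep = "_"))
print(levels(meta$group))
```

Cells with zero samples cannot be compared at all, and cells with a single sample carry very little information on their own.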

What is the best design to answer all these questions? Do I need a different design for each question?

Once the design is set, how do I extract the right contrasts?
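
A candidate design can be checked for estimability before any model is fitted, because in the metadata above each Person_ID appears in only one Location. The sketch below uses base R only, on a hand-copied subset of four persons from the table; the helper name `is_full_rank` is hypothetical and is not part of DESeq2.

```r
# Subset of the metadata above: four persons, each sampled in a single Location.
meta <- data.frame(
  Person_ID = rep(c("KK1451", "KK1473N", "KK1450N", "KK1443N"), each = 3),
  Line      = c("15", "20", "C",  "15", "20", "C",  "15", "20", "C",  "15", "20", "M"),
  Location  = rep(c("A", "S", "R", "D"), each = 3),
  stringsAsFactors = TRUE
)

# Hypothetical helper: TRUE when every coefficient of the design can be estimated,
# i.e. the model matrix has as many independent columns as it has columns.
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

# Person and Line together: each person contributes several Lines.
full_person_line <- is_full_rank(~ Person_ID + Line, meta)

# Adding Location on top of Person_ID: Location is fully determined by the person.
full_with_location <- is_full_rank(~ Person_ID + Location + Line, meta)

print(c(person_line = full_person_line, with_location = full_with_location))
```

If the second check comes back FALSE on the full metadata as well, Person_ID and Location cannot both enter the same design as plain additive terms, which bears directly on questions 1 and 2.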

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/ 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.3                 purrr_1.0.2                
 [5] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                tidyverse_1.3.1            
 [9] ggplot2_3.4.4               DESeq2_1.40.2               SummarizedExperiment_1.30.2 Biobase_2.60.0             
[13] MatrixGenerics_1.12.3       matrixStats_1.0.0           GenomicRanges_1.52.1        GenomeInfoDb_1.36.4        
[17] IRanges_2.34.1              S4Vectors_0.38.2            BiocGenerics_0.46.0         tximport_1.28.0            

loaded via a namespace (and not attached):
 [1] gtable_0.3.4            lattice_0.22-5          tzdb_0.4.0              vctrs_0.6.4             tools_4.3.1            
 [6] bitops_1.0-7            generics_0.1.3          parallel_4.3.1          fansi_1.0.5             pkgconfig_2.0.3        
[11] Matrix_1.6-1.1          dbplyr_2.3.4            readxl_1.4.3            lifecycle_1.0.3         GenomeInfoDbData_1.2.10
[16] compiler_4.3.1          munsell_0.5.0           codetools_0.2-19        RCurl_1.98-1.12         pillar_1.9.0           
[21] crayon_1.5.2            BiocParallel_1.34.2     DelayedArray_0.26.7     abind_1.4-5             rvest_1.0.3            
[26] tidyselect_1.2.0        locfit_1.5-9.8          stringi_1.7.12          grid_4.3.1              colorspace_2.1-0       
[31] cli_3.6.1               magrittr_2.0.3          S4Arrays_1.0.6          utf8_1.2.3              broom_1.0.5            
[36] withr_2.5.1             scales_1.2.1            backports_1.4.1         timechange_0.2.0        lubridate_1.9.3        
[41] modelr_0.1.11           XVector_0.40.0          httr_1.4.7              cellranger_1.1.0        hms_1.1.3              
[46] haven_2.5.3             rlang_1.1.1             Rcpp_1.0.11             glue_1.6.2              DBI_1.1.3              
[51] xml2_1.3.5              reprex_2.0.2            rstudioapi_0.15.0       jsonlite_1.8.7          R6_2.5.1               
[56] fs_1.6.3                zlibbioc_1.46.0
ATpoint ★ 3.6k

Please post this elsewhere. The DESeq2 developer has stated here many times that the support site is for technical issues and bugs with the packages, not for consultation on experimental designs. For that, you might collaborate with local folks familiar with linear models.

