Dear all,
I am doing a DGEA with DESeq2 and data imported with tximport. I have an unbalanced dataset as reported below. With the counts and the metadata that I have I would like to answer different questions.
- I would like to look at the differential expressed genes between the different Lines (e.g. Line "C" vs. Line "15" and all the combinations) while controlling for differences in Person_ID and Location
- I would like to look at the differential expressed genes between the different Locations (e.g. Location "A" vs. Location "R" and all the other combinations) while controlling for differences in Person_ID and Line 
- I would like to look at the differential expressed genes between two different Lines in the same Location, e.g. considering Location A: which is the difference between Line 15 and Line 20 ? 
- I would like to look at the differential expressed genes between two different Locations in the same Line, e.g. considering Line 20: which is the difference between Location A and Location D ?
Here is the code that I use to build the DESeq2 object:
dir<-"path/quantif"
setwd(dir)
samples<-list.files(dir)
files <-file.path(dir,samples,"t_data.ctab")
names(files)<-substr(str_split_i(files, "/", 6), 1, 10)
all(file.exists(files))
tmp <- read.csv(files[1], sep="\t")
head(tmp)
tx2gene <- tmp[, c("t_name", "gene_name")]
head(tx2gene)
txi<-tximport::tximport(files, type = "stringtie", tx2gene = tx2gene)
head(txi$counts)
samples_meta<-read.csv(file="/path/metadata.csv", sep="," , header = TRUE)
txi$counts <- txi$counts[, rownames(samples_meta)]
dds <-DESeqDataSetFromTximport(txi, colData = samples_meta,  design = ~ Line)    # which design do I have to use?? 
dds$Line <- relevel(dds$Line, ref = "C") # do I need a reference level? 
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds<-DESeq(dds)
Metadata of the object:
     Sample_ID Person_ID Line Location
4  SA18082157    KK1451   15        A
7  SA18083382   KK1473N   15        S
12 SA18083387   KK1450N   15        R
14 SA18083360   KK1480N   15        S
19 SA18083365   KK1551N   15        R
25 SA18083368   KK1471N   15        A
27 SA18086317   KK1443N   15        D
32 SA18051387   KK1868N   15        D
38 SA18051384   KK1865N   15        S
41 SA18051386   KK1601N   15        A
18 SA18083364   KK1551N   18        R
23 SA18083366   KK1471N   18        A
33 SA18088686   KK1671N   18        A
36 SA18088660   KK1434N   18        R
2  SA18082155    KK1451   20        A
6  SA18083381   KK1473N   20        S
10 SA18083386   KK1450N   20        R
16 SA18083344   KK1480N   20        S
17 SA18083363   KK1551N   20        R
24 SA18082300   KK1471N   20        A
26 SA18086318   KK1443N   20        D
29 SA18086315   KK1374N   20        D
31 SA18051386   KK1868N   20        D
34 SA18086313   KK1671N   20        A
37 SA18051383   KK1865N   20        S
40 SA18051388   KK1601N   20        A
3  SA18083341    KK1451    C        A
8  SA18083346   KK1473N    C        S
11 SA18083343   KK1450N    C        R
15 SA18083345   KK1480N    C        S
20 SA18083342   KK1551N    C        R
1  SA18082156    KK1451    I        A
5  SA18083383   KK1473N    I        S
13 SA18083386   KK1480N    I        S
22 SA18083367   KK1471N    I        A
30 SA18086316   KK1374N    I        D
35 SA18086318   KK1671N    I        A
39 SA18051385   KK1865N    I        S
42 SA18051360   KK1601N    I        A
9  SA18083385   KK1450N    M        R
21 SA18082301   KK1471N    M        A
28 SA18088662   KK1443N    M        D
Which is the best design to answer all these questions? Do I need different designs for all the questions?
Once the design is set, how can I extract the right contrast?
sessionInfo( )
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.3                 purrr_1.0.2                
 [5] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                tidyverse_1.3.1            
 [9] ggplot2_3.4.4               DESeq2_1.40.2               SummarizedExperiment_1.30.2 Biobase_2.60.0             
[13] MatrixGenerics_1.12.3       matrixStats_1.0.0           GenomicRanges_1.52.1        GenomeInfoDb_1.36.4        
[17] IRanges_2.34.1              S4Vectors_0.38.2            BiocGenerics_0.46.0         tximport_1.28.0            
loaded via a namespace (and not attached):
 [1] gtable_0.3.4            lattice_0.22-5          tzdb_0.4.0              vctrs_0.6.4             tools_4.3.1            
 [6] bitops_1.0-7            generics_0.1.3          parallel_4.3.1          fansi_1.0.5             pkgconfig_2.0.3        
[11] Matrix_1.6-1.1          dbplyr_2.3.4            readxl_1.4.3            lifecycle_1.0.3         GenomeInfoDbData_1.2.10
[16] compiler_4.3.1          munsell_0.5.0           codetools_0.2-19        RCurl_1.98-1.12         pillar_1.9.0           
[21] crayon_1.5.2            BiocParallel_1.34.2     DelayedArray_0.26.7     abind_1.4-5             rvest_1.0.3            
[26] tidyselect_1.2.0        locfit_1.5-9.8          stringi_1.7.12          grid_4.3.1              colorspace_2.1-0       
[31] cli_3.6.1               magrittr_2.0.3          S4Arrays_1.0.6          utf8_1.2.3              broom_1.0.5            
[36] withr_2.5.1             scales_1.2.1            backports_1.4.1         timechange_0.2.0        lubridate_1.9.3        
[41] modelr_0.1.11           XVector_0.40.0          httr_1.4.7              cellranger_1.1.0        hms_1.1.3              
[46] haven_2.5.3             rlang_1.1.1             Rcpp_1.0.11             glue_1.6.2              DBI_1.1.3              
[51] xml2_1.3.5              reprex_2.0.2            rstudioapi_0.15.0       jsonlite_1.8.7          R6_2.5.1               
[56] fs_1.6.3                zlibbioc_1.46.0
