Dear Bioconductor community,
I have time-course Nanostring data and despite having looked at the vignette and several examples I can't figure out how to make the proper design or if I can use Deseq2 at all for my problem. I have 2 groups: Con and Bio, that had blood drawn in 4-time points (0h, 0.5h, 3h, 8h). The problem here is that the Con group has only 2 subjects, whereas the Bio has 3. So every time I try to make the matrix design I get "Not a Full Rank". Looking at the vignette, this is happening due to the difference in samples. I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design. Is there a way to run this in Deseq2?
countData= read.table("Counts.txt", header=T,row.names=1)
colData= read.table ("Groups.txt", header=T, row.names = 1, stringsAsFactors = TRUE)
as.data.frame(colData)
                        Groups Time Ind Ind.n
X20201013_30102498530821.01_B1_01     bio   0h   1     1
X20201013_30102498530821.01_B10_04    bio   8h   1     1
X20201013_30102498530821.01_B4_02     bio 0.5h   1     1
X20201013_30102498530821.01_B7_03     bio   3h   1     1
X20201013_30102498530821.01_C3_07     con   0h   4     2
X20201013_30102498530821.01_C5_08     con 0.5h   4     2
X20201013_30102498530821.01_C7_09     con   3h   4     2
X20201013_30102498530821.01_C9_10     con   8h   4     2
X20201014_30102498540821.01_B11_04    bio   8h   2     1
X20201014_30102498540821.01_B2_01     bio   0h   2     1
X20201014_30102498540821.01_B5_02     bio 0.5h   2     1
X20201014_30102498540821.01_B8_03     bio   3h   2     1
X20201014_30102498540821.01_C10_10    con   8h   5     2
X20201014_30102498540821.01_C4_07     con   0h   5     2
X20201014_30102498540821.01_C6_08     con 0.5h   5     2
X20201014_30102498540821.01_C8_09     con   3h   5     2
X20201016_30102498550821.01_B12_04    bio   8h   3     1
X20201016_30102498550821.01_B3_01     bio   0h   3     1
X20201016_30102498550821.01_B6_02     bio 0.5h   3     1
X20201016_30102498550821.01_B9_03     bio   3h   3     1
m1<-model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, colData)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
m0<-model.matrix(~ Groups + Time + Groups:Ind.n, colData)
dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData, design= m1)
dds <- DESeq(dds, full= m1, reduced= m0, test="LRT")
> sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] DESeq2_1.28.1               SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.57.0         
 [5] Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
 [9] S4Vectors_0.26.1            BiocGenerics_0.34.0        
loaded via a namespace (and not attached):
 [1] genefilter_1.70.0      tidyselect_1.1.0       locfit_1.5-9.4         purrr_0.3.4            splines_4.0.2         
 [6] lattice_0.20-41        colorspace_1.4-1       vctrs_0.3.4            generics_0.1.0         blob_1.2.1            
[11] survival_3.2-7         XML_3.99-0.5           rlang_0.4.7            pillar_1.4.6           glue_1.4.2            
[16] DBI_1.1.0              BiocParallel_1.22.0    bit64_4.0.5            RColorBrewer_1.1-2     GenomeInfoDbData_1.2.3
[21] lifecycle_0.2.0        zlibbioc_1.34.0        munsell_0.5.0          gtable_0.3.0           memoise_1.1.0         
[26] geneplotter_1.66.0     AnnotationDbi_1.50.3   Rcpp_1.0.5             xtable_1.8-4           scales_1.1.1          
[31] annotate_1.66.0        XVector_0.28.0         bit_4.0.4              ggplot2_3.3.2          digest_0.6.25         
[36] dplyr_1.0.2            grid_4.0.2             tools_4.0.2            bitops_1.0-6           magrittr_1.5          
[41] RCurl_1.98-1.2         RSQLite_2.2.0          tibble_3.0.3           crayon_1.3.4           pkgconfig_2.0.3       
[46] ellipsis_0.3.1         Matrix_1.2-18          rstudioapi_0.11        R6_2.5.0               compiler_4.0.2
                    
                
                
Thank you, Michael, but I wasn't clear. Sorry about that. Doing the procedure described above I still get the "Not Full Rank" message. I don't know what I'm doing wrong, could you help me to clarify that?
Looks like there are two errors. First, if I understand what you are trying to do, your model matrix is wrong, because your Ind.n column is wrong. It should probably look like this:
And it's still 'wrong' inasmuch as the reference level is 0.5 hours rather than 0 hours, but I leave that to the OP to fix.
Thank you so much, James. I see what you mean about the Ind.n in the matrix and that was exactly my problem. I appreciate it.
I’d suggest to collaborate with a statistician on choosing the right design and construction of the model matrix that can be estimated with your data.