Question

Time Series: Individuals nested within groups and levels without samples

0

Entering edit mode

ijvechetti ▴ 10

@ijvechetti-20701

Last seen 3.1 years ago

United States

Dear Bioconductor community,

I have time-course Nanostring data and despite having looked at the vignette and several examples I can't figure out how to make the proper design or if I can use Deseq2 at all for my problem. I have 2 groups: Con and Bio, that had blood drawn in 4-time points (0h, 0.5h, 3h, 8h). The problem here is that the Con group has only 2 subjects, whereas the Bio has 3. So every time I try to make the matrix design I get "Not a Full Rank". Looking at the vignette, this is happening due to the difference in samples. I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design. Is there a way to run this in Deseq2?

countData= read.table("Counts.txt", header=T,row.names=1)
colData= read.table ("Groups.txt", header=T, row.names = 1, stringsAsFactors = TRUE)
as.data.frame(colData)

                        Groups Time Ind Ind.n
X20201013_30102498530821.01_B1_01     bio   0h   1     1
X20201013_30102498530821.01_B10_04    bio   8h   1     1
X20201013_30102498530821.01_B4_02     bio 0.5h   1     1
X20201013_30102498530821.01_B7_03     bio   3h   1     1
X20201013_30102498530821.01_C3_07     con   0h   4     2
X20201013_30102498530821.01_C5_08     con 0.5h   4     2
X20201013_30102498530821.01_C7_09     con   3h   4     2
X20201013_30102498530821.01_C9_10     con   8h   4     2
X20201014_30102498540821.01_B11_04    bio   8h   2     1
X20201014_30102498540821.01_B2_01     bio   0h   2     1
X20201014_30102498540821.01_B5_02     bio 0.5h   2     1
X20201014_30102498540821.01_B8_03     bio   3h   2     1
X20201014_30102498540821.01_C10_10    con   8h   5     2
X20201014_30102498540821.01_C4_07     con   0h   5     2
X20201014_30102498540821.01_C6_08     con 0.5h   5     2
X20201014_30102498540821.01_C8_09     con   3h   5     2
X20201016_30102498550821.01_B12_04    bio   8h   3     1
X20201016_30102498550821.01_B3_01     bio   0h   3     1
X20201016_30102498550821.01_B6_02     bio 0.5h   3     1
X20201016_30102498550821.01_B9_03     bio   3h   3     1

m1<-model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, colData)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)

m0<-model.matrix(~ Groups + Time + Groups:Ind.n, colData)
dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData, design= m1)
dds <- DESeq(dds, full= m1, reduced= m0, test="LRT")

> sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.28.1               SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.57.0         
 [5] Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
 [9] S4Vectors_0.26.1            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] genefilter_1.70.0      tidyselect_1.1.0       locfit_1.5-9.4         purrr_0.3.4            splines_4.0.2         
 [6] lattice_0.20-41        colorspace_1.4-1       vctrs_0.3.4            generics_0.1.0         blob_1.2.1            
[11] survival_3.2-7         XML_3.99-0.5           rlang_0.4.7            pillar_1.4.6           glue_1.4.2            
[16] DBI_1.1.0              BiocParallel_1.22.0    bit64_4.0.5            RColorBrewer_1.1-2     GenomeInfoDbData_1.2.3
[21] lifecycle_0.2.0        zlibbioc_1.34.0        munsell_0.5.0          gtable_0.3.0           memoise_1.1.0         
[26] geneplotter_1.66.0     AnnotationDbi_1.50.3   Rcpp_1.0.5             xtable_1.8-4           scales_1.1.1          
[31] annotate_1.66.0        XVector_0.28.0         bit_4.0.4              ggplot2_3.3.2          digest_0.6.25         
[36] dplyr_1.0.2            grid_4.0.2             tools_4.0.2            bitops_1.0-6           magrittr_1.5          
[41] RCurl_1.98-1.2         RSQLite_2.2.0          tibble_3.0.3           crayon_1.3.4           pkgconfig_2.0.3       
[46] ellipsis_0.3.1         Matrix_1.2-18          rstudioapi_0.11        R6_2.5.0               compiler_4.0.2

DESeq2 • 1.3k views

ADD COMMENT • link 5.2 years ago ijvechetti ▴ 10

score 0 · Answer 1 · 2020-11-10

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 19 hours ago

United States

"I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design."

Determining the statistical design is up to the analyst and you may want to consult with a statistician to best design and interpret the results.

It looks like you've followed the recommendation in the vignette.

ADD COMMENT • link 5.2 years ago Michael Love 43k

0

Entering edit mode

Thank you, Michael, but I wasn't clear. Sorry about that. Doing the procedure described above I still get the "Not Full Rank" message. I don't know what I'm doing wrong, could you help me to clarify that?

ADD REPLY • link 5.2 years ago ijvechetti ▴ 10

1

Entering edit mode

Looks like there are two errors. First, if I understand what you are trying to do, your model matrix is wrong, because your Ind.n column is wrong. It should probably look like this:

> d.f <- data.frame(Groups = factor(rep(c("bio","con"), each = 4, times = 3))[1:20], 
                    Time = factor(rep(c("0h","8h","0.5h","3h"), times = 5)), 
                    Ind.n = factor(rep(1:3, each = 8))[1:20])
> d.f
   Groups Time Ind.n
1     bio   0h     1
2     bio   8h     1
3     bio 0.5h     1
4     bio   3h     1
5     con   0h     1
6     con   8h     1
7     con 0.5h     1
8     con   3h     1
9     bio   0h     2
10    bio   8h     2
11    bio 0.5h     2
12    bio   3h     2
13    con   0h     2
14    con   8h     2
15    con 0.5h     2
16    con   3h     2
17    bio   0h     3
18    bio   8h     3
19    bio 0.5h     3
20    bio   3h     3

## see how there is a 1 and a 2 for the con subjects, and a 1, 2, and a 3 for bio?
> m1 <- model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, d.f)
## but still wrong
> library(limma)
> is.fullrank(m1)
[1] FALSE
## this is because there is a column in the design matrix that's all zeros, which is no bueno
> m1 <- m1[,colSums(m1) > 0]
> is.fullrank(m1)
[1] TRUE

ADD REPLY • link 5.2 years ago James W. MacDonald 68k

0

Entering edit mode

And it's still 'wrong' inasmuch as the reference level is 0.5 hours rather than 0 hours, but I leave that to the OP to fix.

ADD REPLY • link 5.2 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you so much, James. I see what you mean about the Ind.n in the matrix and that was exactly my problem. I appreciate it.

ADD REPLY • link 5.2 years ago ijvechetti ▴ 10

0

Entering edit mode

I’d suggest to collaborate with a statistician on choosing the right design and construction of the model matrix that can be estimated with your data.

ADD REPLY • link 5.2 years ago Michael Love 43k