Time Series: Individuals nested within groups and levels without samples
1
0
Entering edit mode
ijvechetti ▴ 10
@ijvechetti-20701
Last seen 16 months ago
United States

Dear Bioconductor community,

I have time-course Nanostring data and despite having looked at the vignette and several examples I can't figure out how to make the proper design or if I can use Deseq2 at all for my problem. I have 2 groups: Con and Bio, that had blood drawn in 4-time points (0h, 0.5h, 3h, 8h). The problem here is that the Con group has only 2 subjects, whereas the Bio has 3. So every time I try to make the matrix design I get "Not a Full Rank". Looking at the vignette, this is happening due to the difference in samples. I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design. Is there a way to run this in Deseq2?

countData= read.table("Counts.txt", header=T,row.names=1)
colData= read.table ("Groups.txt", header=T, row.names = 1, stringsAsFactors = TRUE)
as.data.frame(colData)

                        Groups Time Ind Ind.n
X20201013_30102498530821.01_B1_01     bio   0h   1     1
X20201013_30102498530821.01_B10_04    bio   8h   1     1
X20201013_30102498530821.01_B4_02     bio 0.5h   1     1
X20201013_30102498530821.01_B7_03     bio   3h   1     1
X20201013_30102498530821.01_C3_07     con   0h   4     2
X20201013_30102498530821.01_C5_08     con 0.5h   4     2
X20201013_30102498530821.01_C7_09     con   3h   4     2
X20201013_30102498530821.01_C9_10     con   8h   4     2
X20201014_30102498540821.01_B11_04    bio   8h   2     1
X20201014_30102498540821.01_B2_01     bio   0h   2     1
X20201014_30102498540821.01_B5_02     bio 0.5h   2     1
X20201014_30102498540821.01_B8_03     bio   3h   2     1
X20201014_30102498540821.01_C10_10    con   8h   5     2
X20201014_30102498540821.01_C4_07     con   0h   5     2
X20201014_30102498540821.01_C6_08     con 0.5h   5     2
X20201014_30102498540821.01_C8_09     con   3h   5     2
X20201016_30102498550821.01_B12_04    bio   8h   3     1
X20201016_30102498550821.01_B3_01     bio   0h   3     1
X20201016_30102498550821.01_B6_02     bio 0.5h   3     1
X20201016_30102498550821.01_B9_03     bio   3h   3     1

m1<-model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, colData)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)

m0<-model.matrix(~ Groups + Time + Groups:Ind.n, colData)
dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData, design= m1)
dds <- DESeq(dds, full= m1, reduced= m0, test="LRT")

> sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.28.1               SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.57.0         
 [5] Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
 [9] S4Vectors_0.26.1            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] genefilter_1.70.0      tidyselect_1.1.0       locfit_1.5-9.4         purrr_0.3.4            splines_4.0.2         
 [6] lattice_0.20-41        colorspace_1.4-1       vctrs_0.3.4            generics_0.1.0         blob_1.2.1            
[11] survival_3.2-7         XML_3.99-0.5           rlang_0.4.7            pillar_1.4.6           glue_1.4.2            
[16] DBI_1.1.0              BiocParallel_1.22.0    bit64_4.0.5            RColorBrewer_1.1-2     GenomeInfoDbData_1.2.3
[21] lifecycle_0.2.0        zlibbioc_1.34.0        munsell_0.5.0          gtable_0.3.0           memoise_1.1.0         
[26] geneplotter_1.66.0     AnnotationDbi_1.50.3   Rcpp_1.0.5             xtable_1.8-4           scales_1.1.1          
[31] annotate_1.66.0        XVector_0.28.0         bit_4.0.4              ggplot2_3.3.2          digest_0.6.25         
[36] dplyr_1.0.2            grid_4.0.2             tools_4.0.2            bitops_1.0-6           magrittr_1.5          
[41] RCurl_1.98-1.2         RSQLite_2.2.0          tibble_3.0.3           crayon_1.3.4           pkgconfig_2.0.3       
[46] ellipsis_0.3.1         Matrix_1.2-18          rstudioapi_0.11        R6_2.5.0               compiler_4.0.2
DESeq2 • 712 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

"I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design."

Determining the statistical design is up to the analyst and you may want to consult with a statistician to best design and interpret the results.

It looks like you've followed the recommendation in the vignette.

ADD COMMENT
0
Entering edit mode

Thank you, Michael, but I wasn't clear. Sorry about that. Doing the procedure described above I still get the "Not Full Rank" message. I don't know what I'm doing wrong, could you help me to clarify that?

ADD REPLY
1
Entering edit mode

Looks like there are two errors. First, if I understand what you are trying to do, your model matrix is wrong, because your Ind.n column is wrong. It should probably look like this:

> d.f <- data.frame(Groups = factor(rep(c("bio","con"), each = 4, times = 3))[1:20], 
                    Time = factor(rep(c("0h","8h","0.5h","3h"), times = 5)), 
                    Ind.n = factor(rep(1:3, each = 8))[1:20])
> d.f
   Groups Time Ind.n
1     bio   0h     1
2     bio   8h     1
3     bio 0.5h     1
4     bio   3h     1
5     con   0h     1
6     con   8h     1
7     con 0.5h     1
8     con   3h     1
9     bio   0h     2
10    bio   8h     2
11    bio 0.5h     2
12    bio   3h     2
13    con   0h     2
14    con   8h     2
15    con 0.5h     2
16    con   3h     2
17    bio   0h     3
18    bio   8h     3
19    bio 0.5h     3
20    bio   3h     3

## see how there is a 1 and a 2 for the con subjects, and a 1, 2, and a 3 for bio?
> m1 <- model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, d.f)
## but still wrong
> library(limma)
> is.fullrank(m1)
[1] FALSE
## this is because there is a column in the design matrix that's all zeros, which is no bueno
> m1 <- m1[,colSums(m1) > 0]
> is.fullrank(m1)
[1] TRUE
ADD REPLY
0
Entering edit mode

And it's still 'wrong' inasmuch as the reference level is 0.5 hours rather than 0 hours, but I leave that to the OP to fix.

ADD REPLY
0
Entering edit mode

Thank you so much, James. I see what you mean about the Ind.n in the matrix and that was exactly my problem. I appreciate it.

ADD REPLY
0
Entering edit mode

I’d suggest to collaborate with a statistician on choosing the right design and construction of the model matrix that can be estimated with your data.

ADD REPLY

Login before adding your answer.

Traffic: 534 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6