Another Matrix is not full rank question, multi-factor design issues
1
0
Entering edit mode
pw349 • 0
@pw349-18323
Last seen 6.0 years ago

Hello,

Thank you in advance for your help with this. I have been trying very hard to work this out and have tried to follow the vignette and looked through Q&As on this site, but to no avail.

My experimental outline: I pretreated cells to either "control" or "RecLG" for 3 days and then on the 4th day treated them to control or low glucose conditions. This gave me four groups "C", "LG", "aRLG", and "RLG".

I am interested in differential gene expression between control and RecLG pretreatments, and then the differences between each of the treatment groups. my coldata looks like this:

as.data.frame(coldata2)
       pretreatment treatment treatment.n
AH_1          RecLG      aRLG           1
AH_2          RecLG      aRLG           1
AH_4          RecLG      aRLG           1
AH_5          RecLG      aRLG           1
AH_6          RecLG      aRLG           1
CONT_1      control         C           2
CONT_3      control         C           2
CONT_4      control         C           2
CONT_5      control         C           2
CONT_6      control         C           2
HYPO_1      control        LG           1
HYPO_2      control        LG           1
HYPO_3      control        LG           1
HYPO_5      control        LG           1
HYPO_6      control        LG           1
RH_1          RecLG       RLG           2
RH_2          RecLG       RLG           2
RH_3          RecLG       RLG           2
RH_5          RecLG       RLG           2
RH_6          RecLG       RLG           2

In each group there are 5 replicates. As you can see I made a treatment.n column to account for the nesting of "aRLG" and "RLG" within the "RecLG" pretreatment group and "C" and "LG" within the "control pretreatment group.

When I run:

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata2, design = ~pretreatment + pretreatment:treatment.n + treatment)

I get the error "Error in checkFullRank(modelMatrix)"

For the life of me, I can not figure this out.

The model matrix looks like this:

 model.matrix(~pretreatment + pretreatment:treatment.n + treatment, coldata2)
       (Intercept) pretreatmentRecLG treatmentC treatmentLG treatmentRLG pretreatmentcontrol:treatment.n2
AH_1             1                 1          0           0            0                                0
AH_2             1                 1          0           0            0                                0
AH_4             1                 1          0           0            0                                0
AH_5             1                 1          0           0            0                                0
AH_6             1                 1          0           0            0                                0
CONT_1           1                 0          1           0            0                                1
CONT_3           1                 0          1           0            0                                1
CONT_4           1                 0          1           0            0                                1
CONT_5           1                 0          1           0            0                                1
CONT_6           1                 0          1           0            0                                1
HYPO_1           1                 0          0           1            0                                0
HYPO_2           1                 0          0           1            0                                0
HYPO_3           1                 0          0           1            0                                0
HYPO_5           1                 0          0           1            0                                0
HYPO_6           1                 0          0           1            0                                0
RH_1             1                 1          0           0            1                                0
RH_2             1                 1          0           0            1                                0
RH_3             1                 1          0           0            1                                0
RH_5             1                 1          0           0            1                                0
RH_6             1                 1          0           0            1                                0
       pretreatmentRecLG:treatment.n2
AH_1                                0
AH_2                                0
AH_4                                0
AH_5                                0
AH_6                                0
CONT_1                              0
CONT_3                              0
CONT_4                              0
CONT_5                              0
CONT_6                              0
HYPO_1                              0
HYPO_2                              0
HYPO_3                              0
HYPO_5                              0
HYPO_6                              0
RH_1                                1
RH_2                                1
RH_3                                1
RH_5                                1
RH_6                                1
attr(,"assign")
[1] 0 1 2 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$pretreatment
[1] "contr.treatment"

attr(,"contrasts")$treatment.n
[1] "contr.treatment"

attr(,"contrasts")$treatment
[1] "contr.treatment"

I'm just really stuck with this, I don't know where to start to sort this problem out so I've come here for help. Anything is really appreciated.

Many thanks,

Paul

deseq2 model matrix not full rank multiple factor design • 1.2k views
ADD COMMENT
0
Entering edit mode

FYI, session info is below:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2         pheatmap_1.0.10            ggplot2_3.0.0             
 [4] dplyr_0.7.6                genefilter_1.58.1          RUVSeq_1.10.0             
 [7] edgeR_3.18.1               limma_3.32.10              EDASeq_2.10.0             
[10] ShortRead_1.34.2           GenomicAlignments_1.12.2   Rsamtools_1.28.0          
[13] Biostrings_2.44.2          XVector_0.16.0             BiocParallel_1.10.1       
[16] DESeq2_1.16.1              SummarizedExperiment_1.6.5 DelayedArray_0.2.7        
[19] matrixStats_0.54.0         Biobase_2.38.0             GenomicRanges_1.28.6      
[22] GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7          
[25] BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] bitops_1.0-6            bit64_0.9-7             tools_3.4.3             backports_1.1.2        
 [5] R6_2.2.2                rpart_4.1-13            Hmisc_4.1-1             DBI_1.0.0              
 [9] lazyeval_0.2.1          colorspace_1.3-2        nnet_7.3-12             withr_2.1.2            
[13] tidyselect_0.2.4        gridExtra_2.3           bit_1.1-14              compiler_3.4.3         
[17] htmlTable_1.12          rtracklayer_1.36.6      scales_1.0.0            checkmate_1.8.5        
[21] DESeq_1.28.0            stringr_1.3.1           digest_0.6.17           foreign_0.8-71         
[25] R.utils_2.7.0           base64enc_0.1-3         pkgconfig_2.0.2         htmltools_0.3.6        
[29] htmlwidgets_1.3         rlang_0.2.2             rstudioapi_0.7          RSQLite_2.1.1          
[33] bindr_0.1.1             hwriter_1.3.2           acepack_1.4.1           R.oo_1.22.0            
[37] RCurl_1.95-4.11         magrittr_1.5            GenomeInfoDbData_0.99.0 Formula_1.2-3          
[41] Matrix_1.2-14           Rcpp_0.12.18            munsell_0.5.0           R.methodsS3_1.7.1      
[45] stringi_1.2.4           yaml_2.2.0              MASS_7.3-50             zlibbioc_1.22.0        
[49] plyr_1.8.4              grid_3.4.3              blob_1.1.1              crayon_1.3.4           
[53] lattice_0.20-35         splines_3.4.3           GenomicFeatures_1.28.5  annotate_1.54.0        
[57] locfit_1.5-9.1          knitr_1.20              pillar_1.3.0            geneplotter_1.56.0     
[61] biomaRt_2.32.1          XML_3.98-1.16           glue_1.3.0              latticeExtra_0.6-28    
[65] data.table_1.11.4       gtable_0.2.0            purrr_0.2.5             assertthat_0.2.0       
[69] aroma.light_3.6.0       xtable_1.8-3            survival_2.42-6         tibble_1.4.2           
[73] AnnotationDbi_1.40.0    memoise_1.1.0           bindrcpp_0.2.2          cluster_2.0.7-1 
ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 3 days ago
United States

There is no nesting here. Nesting means there is something inherently different between the two groups. Like if you had two different cell types. But if these are all the same cell types, you have four groups. You can paste the first two columns of your coldata, separated by a period and use that as your factor, so you have four groups, RecLG.aRLG, control.c, control.LG, RecLG.RLG, and then make whatever comparisons you care about.

 

ADD COMMENT
0
Entering edit mode

Thanks James for the feedback. Could I just run everything using just the treatment column then? i.e

dds2 <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~treatment)
dds2 <- DESeq(dds2)
rld_dds2 <- rlog(dds2, blind = FALSE)
results <-DESeq(dds2)
DESeq2_results=results(results)

And then contrast between groups? i.e

results_CvLG <- results(dds2, contrast=c("treatment","C","LG"))
head(results_CvLG)
log2 fold change (MLE): treatment C vs LG 
Wald test p-value: treatment C vs LG 
DataFrame with 6 rows and 6 columns
                      baseMean log2FoldChange      lfcSE        stat       pvalue       padj
                     <numeric>      <numeric>  <numeric>   <numeric>    <numeric>  <numeric>
ENSG00000227232.5     4.078592     -0.0209465 0.89695220 -0.02335297 0.9813687199         NA
ENSG00000279457.4     3.638559     -0.6458330 0.85428877 -0.75598916 0.4496556937         NA
ENSG00000228463.10    1.594748      2.0704358 1.41400168  1.46423856 0.1431287784         NA
ENSG00000225972.1     1.773500      0.3730648 1.18547461  0.31469661 0.7529920184         NA
ENSG00000225630.1  1314.483271     -0.3273191 0.09114708 -3.59110859 0.0003292744 0.07313782
ENSG00000237973.1   693.793225     -0.3296993 0.11795461 -2.79513748 0.0051877649 0.22739225
ADD REPLY
0
Entering edit mode

Hi, Can I just bump this again to see if I understood your answer correctly, please?

Thank you James

ADD REPLY
0
Entering edit mode

Yes that is how you would make comparisons between levels of a single factor.

ADD REPLY
0
Entering edit mode

Thanks James and Michael!

ADD REPLY

Login before adding your answer.

Traffic: 437 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6