"full model matrix is less than full rank"
1
0
Entering edit mode
fernandogs97 ▴ 10
@33495ccf
Last seen 2.5 years ago
Spain

Hello, I am performing different analysis of transposable elements deregulation with SQuIRE. SQuIRE performs the differential expression analysis with DESeq2, but it does not allow you to introduce any batch effect. According to this and as I am analyzing samples of many laboratories I have decided to perform the differential expression analysis with DESeq2 outside SQuIRE. This is the code (with the inputs modified by me) that I have extracted of the SQuIRE code. (https://github.com/wyang17/SQuIRE)

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")
library("DESeq2")
library("BiocParallel")
register(MulticoreParam(4))
args<-commandArgs(TRUE)
print(args)

count_filename<-args[1] #count table
coldata_filename<-args[2] #sample description
strand<-args[3]
outfile <- args[4]
print(args)

#read in file
#count_filename="E:/Dropbox/Miguel/squire_call/squire_miguel_counttable.txt"
#coldata_filename="E:/Dropbox/Miguel/squire_call/squire_miguel_coldata.txt"
cts<- as.matrix(read.delim('LaRocca_locus_gene_TE_counttable.txt',header=TRUE, stringsAsFactors=FALSE,row.names="gene_id"))
coldata<-(read.delim('original',header=TRUE, stringsAsFactors=FALSE,row.names="sample"))

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds$condition <- factor(dds$condition, levels = c("young","old"))
dds <- DESeq(dds)
res <- results(dds)
resLFC <- lfcShrink(dds, coef=2)
resOrdered <- res[order(res$pvalue),]
summary(res)

plotCounts(dds, gene=which.min(res$padj), intgroup="condition")
plotMA(res, ylim=c(-2,2))

TE_only<-resOrdered[grepl("\\|",rownames(resOrdered)),]
plotMA(TE_only, ylim=c(-2,2))

sessionInfo( )

This is the error that It reports when I perform the "dds = DESeq(dds) step

dds <- DESeq(dds)
Error in designAndArgChecker(object, betaPrior) : 
  full model matrix is less than full rank

On the other hand, this outputs make me think the error could be in the condition assignment.

> as.data.frame(colData(dds))
           condition
SRR7093854      <NA>
SRR7093864      <NA>
SRR7093871      <NA>
SRR7093872      <NA>
SRR7093873      <NA>
SRR7093919      <NA>
SRR7093809      <NA>
SRR7093874      <NA>
SRR7093875      <NA>
SRR7093881      <NA>
SRR7093949      <NA>
SRR7093951      <NA>

But when I analyze the coldata object it seems to have been processed correctly:

> coldata
           condition
SRR7093854     old
SRR7093864     old
SRR7093871     old
SRR7093872     old
SRR7093873     old
SRR7093919     old
SRR7093809     young
SRR7093874     young
SRR7093875     young
SRR7093881     young
SRR7093949     young
SRR7093951     young

This is the session info

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] DESeq2_1.32.0               SummarizedExperiment_1.22.0
 [3] Biobase_2.52.0              MatrixGenerics_1.4.0       
 [5] matrixStats_0.59.0          GenomicRanges_1.44.0       
 [7] GenomeInfoDb_1.28.1         IRanges_2.26.0             
 [9] S4Vectors_0.30.0            BiocGenerics_0.38.0        
[11] readr_2.0.1                

loaded via a namespace (and not attached):
 [1] httr_1.4.2             vroom_1.5.5           
 [3] bit64_4.0.5            splines_4.1.1         
 [5] assertthat_0.2.1       blob_1.2.1            
 [7] GenomeInfoDbData_1.2.6 yaml_2.2.1            
 [9] pillar_1.6.1           RSQLite_2.2.7         
[11] lattice_0.20-44        glue_1.4.2            
[13] digest_0.6.27          RColorBrewer_1.1-2    
[15] XVector_0.32.0         colorspace_2.0-2      
[17] htmltools_0.5.1.1      Matrix_1.3-4          
[19] XML_3.99-0.6           pkgconfig_2.0.3       
[21] genefilter_1.74.0      zlibbioc_1.38.0       
[23] purrr_0.3.4            xtable_1.8-4          
[25] scales_1.1.1           tzdb_0.1.2            
[27] BiocParallel_1.26.1    tibble_3.1.2          
[29] annotate_1.70.0        KEGGREST_1.32.0       
[31] generics_0.1.0         ggplot2_3.3.5         
[33] ellipsis_0.3.2         withr_2.4.2           
[35] cachem_1.0.5           cli_3.0.0             
[37] survival_3.2-13        magrittr_2.0.1        
[39] crayon_1.4.1           memoise_2.0.0         
[41] evaluate_0.14          fansi_0.5.0           
[43] tools_4.1.1            hms_1.1.0             
[45] lifecycle_1.0.0        munsell_0.5.0         
[47] locfit_1.5-9.4         DelayedArray_0.18.0   
[49] AnnotationDbi_1.54.1   Biostrings_2.60.1     
[51] compiler_4.1.1         rlang_0.4.11          
[53] grid_4.1.1             RCurl_1.98-1.3        
[55] rstudioapi_0.13        bitops_1.0-7          
[57] rmarkdown_2.9          gtable_0.3.0          
[59] DBI_1.1.1              R6_2.5.0              
[61] knitr_1.33             dplyr_1.0.7           
[63] fastmap_1.1.0          bit_4.0.4             
[65] utf8_1.2.1             Rcpp_1.0.7            
[67] vctrs_0.3.8            geneplotter_1.70.0    
[69] png_0.1-7              tidyselect_1.1.1      
[71] xfun_0.24

I would be very grateful to anyone who can help me, I tried to solve this problem in the SQuIRE forum but I did not have an answer so I hope I can get some solution in this one.

SQuIRE deseq2 DESeq2 • 2.2k views
ADD COMMENT
1
Entering edit mode
ATpoint ★ 3.9k
@atpoint-13662
Last seen 6 minutes ago
Germany

My best guess is that young and old are not "young" or "old" but include some hidden whitespaces like " young" or "old ". That would then make the factor conversion go NA.

> data.frame(condition=factor(c("young ", " old"), levels=c("young", "old")))
  condition
1      <NA>
2      <NA>

If everything is NA then the all samples belong to the same NA group and this causes the full rank error.

Just omit this line, DESeq2 will take care of it internally. You can use contrasts (see manual) to specify the direction of the fold change in results(), and be sure that the colData do not include any whitespaces or other hidden characters. By the way bioclite is super old, just library(DESeq2) is enough. BiocManager is used these days to install packages from Bioc.

ADD COMMENT
1
Entering edit mode

Hello ATpoint, thank you very much for your fast and helpful response, finally I corrected the error by changing the separator, instead of using \t with "_" it works better. Thank you very much! Have a nice day!

ADD REPLY

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6