Question: DEXSeq Error in estimateExonFoldChanges: the matrix is either rank-deficient or indefinite
0
gravatar for aditi
3.8 years ago by
aditi20
Indian Institute of Science,Bangalore, India
aditi20 wrote:

Hi,

I am facing trouble with running the DEXSeq package. I have generated the counts files using the counts.py script and the files seem to be ok. I have 6 samples, 3 samples each for one condition and therefor two conditions. I want to study the differential exon expression and usage between controls and treated(mptp). The sampleTable is thus:

> sampleTable
           condition libType
C1namesort   control  paired
C2namesort   control  paired
C3namesort   control  paired
T1namesort      mptp  paired
T2namesort      mptp  paired
T3namesort      mptp  paired

My counts files and the flattened files seem to be alright:

> countFiles
[1] "/path /to/file/C1namesortfb.txt"
[2] "/path /to/file/C2namesortfb.txt"
[3] "/path /to/file/C3namesortfb.txt"
[4] "/path /to/file/T1namesortfb.txt"
[5] "/path /to/file/T2namesortfb.txt"
[6] "/path /to/file/T3namesortfb.txt"

> flattenedFile

[1] "/media/aditi/rnaseq/RNA_seq_analysis14MPTPSN/namesortedcountfiles/mouserefgenes.DEXSeq.chr.gff"

I created the DEXSeqDataSet thus:

> dxd = DEXSeqDataSetFromHTSeq(
+ countFiles,
+ sampleData = sampleTable,
+ design = ~sample + exon + condition:exon,
+ flattenedfile = flattenedFile)

> dxd = estimateSizeFactors (dxd)
> dxd = estimateDispersions (dxd)

>  dxd = testForDEU (dxd)

All these functions run without throwing back any error. However, estimateexonFoldChanges gives 18 warning:

> dxd = estimateExonFoldChanges (dxd, fitExpToVar="condition")
There were 18 warnings (use warnings() to see them)

Warning messages:
1: In chol.default(XVX + lambda * I, pivot = TRUE) :
  the matrix is either rank-deficient or indefinite

This is repeated 18 times

What am I doing wrong? I am at a loss!! I'll be very grateful for any suggestions/help. Thanks in advance!

The sessionInfo is thus:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
 [4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
 [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] DEXSeq_1.16.10             DESeq2_1.10.1             
 [3] RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
 [5] SummarizedExperiment_1.0.2 GenomicRanges_1.22.4      
 [7] GenomeInfoDb_1.6.3         IRanges_2.4.8             
 [9] S4Vectors_0.8.11           Biobase_2.30.0            
[11] BiocGenerics_0.16.1        BiocParallel_1.4.3        

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] XVector_0.10.0       bitops_1.0-6         tools_3.2.3         
 [7] futile.options_1.0.0 zlibbioc_1.16.0      biomaRt_2.26.1      
[10] statmod_1.4.24       rpart_4.1-10         RSQLite_1.0.0       
[13] annotate_1.48.0      gtable_0.2.0         lattice_0.20-33     
[16] DBI_0.3.1            gridExtra_2.2.1      stringr_1.0.0       
[19] hwriter_1.3.2        genefilter_1.52.1    cluster_2.0.3       
[22] Biostrings_2.38.4    locfit_1.5-9.1       grid_3.2.3          
[25] nnet_7.3-12          AnnotationDbi_1.32.3 XML_3.98-1.1        
[28] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-28
[31] Formula_1.2-1        magrittr_1.5         geneplotter_1.48.0  
[34] ggplot2_2.0.0        lambda.r_1.1.7       Rsamtools_1.22.0    
[37] Hmisc_3.17-2         scales_0.4.0         splines_3.2.3       
[40] colorspace_1.2-6     xtable_1.8-2         stringi_1.0-1       
[43] acepack_1.3-3.3      RCurl_1.95-4.3       munsell_0.4.3

ADD COMMENTlink modified 3.7 years ago by Alejandro Reyes1.7k • written 3.8 years ago by aditi20

HI Aditi, 

Since this are warnings, your code stills runs and it is likely to produce results. Do you have a fold change columns filled in the mcols field or are they all NAs?

Alejandro

ADD REPLYlink written 3.8 years ago by Alejandro Reyes1.7k

Thanks for a very prompt reply!

I get all NAs in the fold change column:

> dxr1

LRT p-value: full vs reduced

DataFrame with 360776 rows and 13 columns
                                   groupID   featureID exonBaseMean
                               <character> <character>    <numeric>
ENSMUSG00000000001:E001 ENSMUSG00000000001        E001    436.44916
ENSMUSG00000000001:E002 ENSMUSG00000000001        E002     80.69931
ENSMUSG00000000001:E003 ENSMUSG00000000001        E003     79.29886
ENSMUSG00000000001:E004 ENSMUSG00000000001        E004     73.38072
ENSMUSG00000000001:E005 ENSMUSG00000000001        E005     58.09525
...                                    ...         ...          ...
ENSMUSG00000099330:E001 ENSMUSG00000099330        E001            0
ENSMUSG00000099331:E001 ENSMUSG00000099331        E001            0
ENSMUSG00000099332:E001 ENSMUSG00000099332        E001            0
ENSMUSG00000099333:E001 ENSMUSG00000099333        E001            0
ENSMUSG00000099334:E001 ENSMUSG00000099334        E001            0
                          dispersion         stat    pvalue      padj   control
                           <numeric>    <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 0.0053922670 0.0005567544 0.9811751         1  34.52678
ENSMUSG00000000001:E002 0.0009620218 2.0208808149 0.1551491         1  16.99218
ENSMUSG00000000001:E003 0.0013217651 0.4837283457 0.4867384         1  16.57670
ENSMUSG00000000001:E004 0.0016435700 0.0036748478 0.9516614         1  15.66852
ENSMUSG00000000001:E005 0.0013398025 0.4872866382 0.4851402         1  13.77229
...                              ...          ...       ...       ...       ...
ENSMUSG00000099330:E001           NA           NA        NA        NA        NA
ENSMUSG00000099331:E001           NA           NA        NA        NA        NA
ENSMUSG00000099332:E001           NA           NA        NA        NA        NA
ENSMUSG00000099333:E001           NA           NA        NA        NA        NA
ENSMUSG00000099334:E001           NA           NA        NA        NA        NA
                             mptp log2fold_mptp_control
                        <numeric>             <numeric>
ENSMUSG00000000001:E001  34.50975         -0.0007119172
ENSMUSG00000000001:E002  15.97748         -0.0888307964
ENSMUSG00000000001:E003  16.07271         -0.0445431508
ENSMUSG00000000001:E004  15.72571          0.0052561206
ENSMUSG00000000001:E005  14.30014          0.0542604215
...                           ...                   ...
ENSMUSG00000099330:E001        NA                    NA
ENSMUSG00000099331:E001        NA                    NA
ENSMUSG00000099332:E001        NA                    NA
ENSMUSG00000099333:E001        NA                    NA
ENSMUSG00000099334:E001        NA                    NA
                                                genomicData       countData
                                                  <GRanges>        <matrix>
ENSMUSG00000000001:E001          3:-:[108107280, 108109316] 388 396 484 ...
ENSMUSG00000000001:E002          3:-:[108109403, 108109612]    66 87 98 ...
ENSMUSG00000000001:E003          3:-:[108111935, 108112088]    64 88 86 ...
ENSMUSG00000000001:E004          3:-:[108112473, 108112602]    68 72 71 ...
ENSMUSG00000000001:E005          3:-:[108115763, 108115891]    54 52 55 ...
...                                                     ...             ...
ENSMUSG00000099330:E001            7:-:[44538106, 44538178]       0 0 0 ...
ENSMUSG00000099331:E001           11:+:[46170087, 46170143]       0 0 0 ...
ENSMUSG00000099332:E001           17:+:[85618067, 85618265]       0 0 0 ...
ENSMUSG00000099333:E001 MG3833_PATCH:+:[18355616, 18355763]       0 0 0 ...
ENSMUSG00000099334:E001            4:+:[74635214, 74635351]       0 0 0 ...
                        transcripts
                             <list>
ENSMUSG00000000001:E001    ########
ENSMUSG00000000001:E002    ########
ENSMUSG00000000001:E003    ########
ENSMUSG00000000001:E004    ########
ENSMUSG00000000001:E005    ########
...                             ...
ENSMUSG00000099330:E001    ########
ENSMUSG00000099331:E001    ########
ENSMUSG00000099332:E001    ########
ENSMUSG00000099333:E001    ########
ENSMUSG00000099334:E001    ########

 

ADD REPLYlink written 3.8 years ago by aditi20

Hi aditi, 

To me it seems that you are not getting NAs for all the exons, look for example at the exons from the first gene (ENSMUSG00000000001). Still, it would be nice to trace where the warnings are being originated. What is the output of:

x <- is.na(mcols(dxd)$log2fold_untreated_treated) & !is.na(DEXSeqResults(dxd)$padj)
rownames(dxd)[which(x)]


Alejandro 

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by Alejandro Reyes1.7k

You are right, Alejandro! I was a bit too hasty in saying that they are all NAs. It would be perfect to get rid of the warnings!

the output for the command (x <- is.na(mcols(dxd)$log2fold_mptp_control) & !is.na(DEXSeqResults(dxd)$padj)

(I'm printing only the last few entries here but all I see is a series of "FALSE"s)

>x

[99913] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99925] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99937] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99949] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99961] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99973] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99985] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99997] FALSE FALSE FALSE
 [ reached getOption("max.print") -- omitted 260777 entries ]

 

ADD REPLYlink written 3.8 years ago by aditi20

> rownames(dxd)[which(x)]
character(0)

ADD REPLYlink written 3.8 years ago by aditi20

Sorry! I guess I didn't run the command right! I'm a complete novice in this field. The output is a series of ensembl ids:

[99990] "ENSMUSG00000027397:E014"                                                                                               
[99991] "ENSMUSG00000027397:E015"                                                                                               
[99992] "ENSMUSG00000027397:E016"                                                                                               
[99993] "ENSMUSG00000027397:E017"                                                                                               
[99994] "ENSMUSG00000027397:E019"                                                                                               
[99995] "ENSMUSG00000027397:E020"                                                                                               
[99996] "ENSMUSG00000027397:E021"                                                                                               
[99997] "ENSMUSG00000027397:E022"                                                                                               
[99998] "ENSMUSG00000027397:E023"                                                                                               
[99999] "ENSMUSG00000027397:E024"                                                                                               
 [ reached getOption("max.print") -- omitted 180515 entries ]


These are the last few lines. Have I done it right this time? Also examining my DEXSeqResults reports, I find that I have the fold change values for 280515 genes but NAs for some 80401 genes. Is it something expected out of DEXSeqResults?

Thanks!

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by aditi20
Answer: DEXSeq Error in estimateExonFoldChanges: the matrix is either rank-deficient or
0
gravatar for Alejandro Reyes
3.7 years ago by
Alejandro Reyes1.7k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.7k wrote:

Hi again Aditi, 

Thanks for reporting this. As mentioned earlier, it was a warning and not an error. This warnings came from genes were the glm fit fails and the coefficients are not estimated. This should not affect your overall analysis. In the dataset that I tested, it was a single gene generating all the warnings. I added a more informative warning message in the development version of DEXSeq.

Alejandro

ADD COMMENTlink written 3.7 years ago by Alejandro Reyes1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 381 users visited in the last hour