DEXSeq Error in estimateExonFoldChanges: the matrix is either rank-deficient or indefinite
1
0
Entering edit mode
aditi ▴ 20
@aditi-9925
Last seen 5.7 years ago
Indian Institute of Science,Bangalore, …

Hi,

I am facing trouble with running the DEXSeq package. I have generated the counts files using the counts.py script and the files seem to be ok. I have 6 samples, 3 samples each for one condition and therefor two conditions. I want to study the differential exon expression and usage between controls and treated(mptp). The sampleTable is thus:

> sampleTable
           condition libType
C1namesort   control  paired
C2namesort   control  paired
C3namesort   control  paired
T1namesort      mptp  paired
T2namesort      mptp  paired
T3namesort      mptp  paired

My counts files and the flattened files seem to be alright:

> countFiles
[1] "/path /to/file/C1namesortfb.txt"
[2] "/path /to/file/C2namesortfb.txt"
[3] "/path /to/file/C3namesortfb.txt"
[4] "/path /to/file/T1namesortfb.txt"
[5] "/path /to/file/T2namesortfb.txt"
[6] "/path /to/file/T3namesortfb.txt"

> flattenedFile

[1] "/media/aditi/rnaseq/RNA_seq_analysis14MPTPSN/namesortedcountfiles/mouserefgenes.DEXSeq.chr.gff"

I created the DEXSeqDataSet thus:

> dxd = DEXSeqDataSetFromHTSeq(
+ countFiles,
+ sampleData = sampleTable,
+ design = ~sample + exon + condition:exon,
+ flattenedfile = flattenedFile)

> dxd = estimateSizeFactors (dxd)
> dxd = estimateDispersions (dxd)

>  dxd = testForDEU (dxd)

All these functions run without throwing back any error. However, estimateexonFoldChanges gives 18 warning:

> dxd = estimateExonFoldChanges (dxd, fitExpToVar="condition")
There were 18 warnings (use warnings() to see them)

Warning messages:
1: In chol.default(XVX + lambda * I, pivot = TRUE) :
  the matrix is either rank-deficient or indefinite

This is repeated 18 times

What am I doing wrong? I am at a loss!! I'll be very grateful for any suggestions/help. Thanks in advance!

The sessionInfo is thus:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
 [4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
 [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] DEXSeq_1.16.10             DESeq2_1.10.1             
 [3] RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
 [5] SummarizedExperiment_1.0.2 GenomicRanges_1.22.4      
 [7] GenomeInfoDb_1.6.3         IRanges_2.4.8             
 [9] S4Vectors_0.8.11           Biobase_2.30.0            
[11] BiocGenerics_0.16.1        BiocParallel_1.4.3        

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] XVector_0.10.0       bitops_1.0-6         tools_3.2.3         
 [7] futile.options_1.0.0 zlibbioc_1.16.0      biomaRt_2.26.1      
[10] statmod_1.4.24       rpart_4.1-10         RSQLite_1.0.0       
[13] annotate_1.48.0      gtable_0.2.0         lattice_0.20-33     
[16] DBI_0.3.1            gridExtra_2.2.1      stringr_1.0.0       
[19] hwriter_1.3.2        genefilter_1.52.1    cluster_2.0.3       
[22] Biostrings_2.38.4    locfit_1.5-9.1       grid_3.2.3          
[25] nnet_7.3-12          AnnotationDbi_1.32.3 XML_3.98-1.1        
[28] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-28
[31] Formula_1.2-1        magrittr_1.5         geneplotter_1.48.0  
[34] ggplot2_2.0.0        lambda.r_1.1.7       Rsamtools_1.22.0    
[37] Hmisc_3.17-2         scales_0.4.0         splines_3.2.3       
[40] colorspace_1.2-6     xtable_1.8-2         stringi_1.0-1       
[43] acepack_1.3-3.3      RCurl_1.95-4.3       munsell_0.4.3

dexseq estimateexonfoldchanges • 3.7k views
ADD COMMENT
0
Entering edit mode

HI Aditi, 

Since this are warnings, your code stills runs and it is likely to produce results. Do you have a fold change columns filled in the mcols field or are they all NAs?

Alejandro

ADD REPLY
0
Entering edit mode

Thanks for a very prompt reply!

I get all NAs in the fold change column:

> dxr1

LRT p-value: full vs reduced

DataFrame with 360776 rows and 13 columns
                                   groupID   featureID exonBaseMean
                               <character> <character>    <numeric>
ENSMUSG00000000001:E001 ENSMUSG00000000001        E001    436.44916
ENSMUSG00000000001:E002 ENSMUSG00000000001        E002     80.69931
ENSMUSG00000000001:E003 ENSMUSG00000000001        E003     79.29886
ENSMUSG00000000001:E004 ENSMUSG00000000001        E004     73.38072
ENSMUSG00000000001:E005 ENSMUSG00000000001        E005     58.09525
...                                    ...         ...          ...
ENSMUSG00000099330:E001 ENSMUSG00000099330        E001            0
ENSMUSG00000099331:E001 ENSMUSG00000099331        E001            0
ENSMUSG00000099332:E001 ENSMUSG00000099332        E001            0
ENSMUSG00000099333:E001 ENSMUSG00000099333        E001            0
ENSMUSG00000099334:E001 ENSMUSG00000099334        E001            0
                          dispersion         stat    pvalue      padj   control
                           <numeric>    <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 0.0053922670 0.0005567544 0.9811751         1  34.52678
ENSMUSG00000000001:E002 0.0009620218 2.0208808149 0.1551491         1  16.99218
ENSMUSG00000000001:E003 0.0013217651 0.4837283457 0.4867384         1  16.57670
ENSMUSG00000000001:E004 0.0016435700 0.0036748478 0.9516614         1  15.66852
ENSMUSG00000000001:E005 0.0013398025 0.4872866382 0.4851402         1  13.77229
...                              ...          ...       ...       ...       ...
ENSMUSG00000099330:E001           NA           NA        NA        NA        NA
ENSMUSG00000099331:E001           NA           NA        NA        NA        NA
ENSMUSG00000099332:E001           NA           NA        NA        NA        NA
ENSMUSG00000099333:E001           NA           NA        NA        NA        NA
ENSMUSG00000099334:E001           NA           NA        NA        NA        NA
                             mptp log2fold_mptp_control
                        <numeric>             <numeric>
ENSMUSG00000000001:E001  34.50975         -0.0007119172
ENSMUSG00000000001:E002  15.97748         -0.0888307964
ENSMUSG00000000001:E003  16.07271         -0.0445431508
ENSMUSG00000000001:E004  15.72571          0.0052561206
ENSMUSG00000000001:E005  14.30014          0.0542604215
...                           ...                   ...
ENSMUSG00000099330:E001        NA                    NA
ENSMUSG00000099331:E001        NA                    NA
ENSMUSG00000099332:E001        NA                    NA
ENSMUSG00000099333:E001        NA                    NA
ENSMUSG00000099334:E001        NA                    NA
                                                genomicData       countData
                                                  <GRanges>        <matrix>
ENSMUSG00000000001:E001          3:-:[108107280, 108109316] 388 396 484 ...
ENSMUSG00000000001:E002          3:-:[108109403, 108109612]    66 87 98 ...
ENSMUSG00000000001:E003          3:-:[108111935, 108112088]    64 88 86 ...
ENSMUSG00000000001:E004          3:-:[108112473, 108112602]    68 72 71 ...
ENSMUSG00000000001:E005          3:-:[108115763, 108115891]    54 52 55 ...
...                                                     ...             ...
ENSMUSG00000099330:E001            7:-:[44538106, 44538178]       0 0 0 ...
ENSMUSG00000099331:E001           11:+:[46170087, 46170143]       0 0 0 ...
ENSMUSG00000099332:E001           17:+:[85618067, 85618265]       0 0 0 ...
ENSMUSG00000099333:E001 MG3833_PATCH:+:[18355616, 18355763]       0 0 0 ...
ENSMUSG00000099334:E001            4:+:[74635214, 74635351]       0 0 0 ...
                        transcripts
                             <list>
ENSMUSG00000000001:E001    ########
ENSMUSG00000000001:E002    ########
ENSMUSG00000000001:E003    ########
ENSMUSG00000000001:E004    ########
ENSMUSG00000000001:E005    ########
...                             ...
ENSMUSG00000099330:E001    ########
ENSMUSG00000099331:E001    ########
ENSMUSG00000099332:E001    ########
ENSMUSG00000099333:E001    ########
ENSMUSG00000099334:E001    ########

 

ADD REPLY
0
Entering edit mode

Hi aditi, 

To me it seems that you are not getting NAs for all the exons, look for example at the exons from the first gene (ENSMUSG00000000001). Still, it would be nice to trace where the warnings are being originated. What is the output of:

x <- is.na(mcols(dxd)$log2fold_untreated_treated) & !is.na(DEXSeqResults(dxd)$padj)
rownames(dxd)[which(x)]


Alejandro 

ADD REPLY
0
Entering edit mode

You are right, Alejandro! I was a bit too hasty in saying that they are all NAs. It would be perfect to get rid of the warnings!

the output for the command (x <- is.na(mcols(dxd)$log2fold_mptp_control) & !is.na(DEXSeqResults(dxd)$padj)

(I'm printing only the last few entries here but all I see is a series of "FALSE"s)

>x

[99913] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99925] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99937] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99949] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99961] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99973] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99985] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99997] FALSE FALSE FALSE
 [ reached getOption("max.print") -- omitted 260777 entries ]

 

ADD REPLY
0
Entering edit mode

> rownames(dxd)[which(x)]
character(0)

ADD REPLY
0
Entering edit mode

Sorry! I guess I didn't run the command right! I'm a complete novice in this field. The output is a series of ensembl ids:

[99990] "ENSMUSG00000027397:E014"                                                                                               
[99991] "ENSMUSG00000027397:E015"                                                                                               
[99992] "ENSMUSG00000027397:E016"                                                                                               
[99993] "ENSMUSG00000027397:E017"                                                                                               
[99994] "ENSMUSG00000027397:E019"                                                                                               
[99995] "ENSMUSG00000027397:E020"                                                                                               
[99996] "ENSMUSG00000027397:E021"                                                                                               
[99997] "ENSMUSG00000027397:E022"                                                                                               
[99998] "ENSMUSG00000027397:E023"                                                                                               
[99999] "ENSMUSG00000027397:E024"                                                                                               
 [ reached getOption("max.print") -- omitted 180515 entries ]


These are the last few lines. Have I done it right this time? Also examining my DEXSeqResults reports, I find that I have the fold change values for 280515 genes but NAs for some 80401 genes. Is it something expected out of DEXSeqResults?

Thanks!

ADD REPLY
0
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 7 months ago
Novartis Institutes for BioMedical Rese…

Hi again Aditi, 

Thanks for reporting this. As mentioned earlier, it was a warning and not an error. This warnings came from genes were the glm fit fails and the coefficients are not estimated. This should not affect your overall analysis. In the dataset that I tested, it was a single gene generating all the warnings. I added a more informative warning message in the development version of DEXSeq.

Alejandro

ADD COMMENT

Login before adding your answer.

Traffic: 778 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6