Question

DEXSeq Error in estimateExonFoldChanges: the matrix is either rank-deficient or indefinite

0

Entering edit mode

aditi ▴ 20

@aditi-9925

Last seen 6.7 years ago

Indian Institute of Science,Bangalore, …

Hi,

I am facing trouble with running the DEXSeq package. I have generated the counts files using the counts.py script and the files seem to be ok. I have 6 samples, 3 samples each for one condition and therefor two conditions. I want to study the differential exon expression and usage between controls and treated(mptp). The sampleTable is thus:

> sampleTable
           condition libType
C1namesort   control paired
C2namesort   control paired
C3namesort   control paired
T1namesort      mptp paired
T2namesort      mptp paired
T3namesort      mptp paired

My counts files and the flattened files seem to be alright:

> countFiles
[1] "/path /to/file/C1namesortfb.txt"
[2] "/path /to/file/C2namesortfb.txt"
[3] "/path /to/file/C3namesortfb.txt"
[4] "/path /to/file/T1namesortfb.txt"
[5] "/path /to/file/T2namesortfb.txt"
[6] "/path /to/file/T3namesortfb.txt"

> flattenedFile

[1] "/media/aditi/rnaseq/RNA_seq_analysis14MPTPSN/namesortedcountfiles/mouserefgenes.DEXSeq.chr.gff"

I created the DEXSeqDataSet thus:

> dxd = DEXSeqDataSetFromHTSeq(
+ countFiles,
+ sampleData = sampleTable,
+ design = ~sample + exon + condition:exon,
+ flattenedfile = flattenedFile)

> dxd = estimateSizeFactors (dxd)
> dxd = estimateDispersions (dxd)

> dxd = testForDEU (dxd)

All these functions run without throwing back any error. However, estimateexonFoldChanges gives 18 warning:

> dxd = estimateExonFoldChanges (dxd, fitExpToVar="condition")
There were 18 warnings (use warnings() to see them)

Warning messages:
1: In chol.default(XVX + lambda * I, pivot = TRUE) :
the matrix is either rank-deficient or indefinite

This is repeated 18 times

What am I doing wrong? I am at a loss!! I'll be very grateful for any suggestions/help. Thanks in advance!

The sessionInfo is thus:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
[1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN
[4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] DEXSeq_1.16.10             DESeq2_1.10.1
[3] RcppArmadillo_0.6.500.4.0 Rcpp_0.12.3
[5] SummarizedExperiment_1.0.2 GenomicRanges_1.22.4
[7] GenomeInfoDb_1.6.3         IRanges_2.4.8
[9] S4Vectors_0.8.11           Biobase_2.30.0
[11] BiocGenerics_0.16.1        BiocParallel_1.4.3

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2   futile.logger_1.4.1 plyr_1.8.3
[4] XVector_0.10.0       bitops_1.0-6         tools_3.2.3
[7] futile.options_1.0.0 zlibbioc_1.16.0      biomaRt_2.26.1
[10] statmod_1.4.24       rpart_4.1-10         RSQLite_1.0.0
[13] annotate_1.48.0      gtable_0.2.0         lattice_0.20-33
[16] DBI_0.3.1            gridExtra_2.2.1      stringr_1.0.0
[19] hwriter_1.3.2        genefilter_1.52.1    cluster_2.0.3
[22] Biostrings_2.38.4    locfit_1.5-9.1       grid_3.2.3
[25] nnet_7.3-12          AnnotationDbi_1.32.3 XML_3.98-1.1
[28] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-28
[31] Formula_1.2-1        magrittr_1.5         geneplotter_1.48.0
[34] ggplot2_2.0.0        lambda.r_1.1.7       Rsamtools_1.22.0
[37] Hmisc_3.17-2         scales_0.4.0         splines_3.2.3
[40] colorspace_1.2-6     xtable_1.8-2         stringi_1.0-1
[43] acepack_1.3-3.3      RCurl_1.95-4.3       munsell_0.4.3

dexseq estimateexonfoldchanges • 4.4k views

ADD COMMENT • link updated 9.1 years ago by Alejandro Reyes ★ 1.9k • written 9.1 years ago by aditi ▴ 20

0

Entering edit mode

HI Aditi,

Since this are warnings, your code stills runs and it is likely to produce results. Do you have a fold change columns filled in the mcols field or are they all NAs?

Alejandro

ADD REPLY • link 9.1 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Thanks for a very prompt reply!

I get all NAs in the fold change column:

> dxr1

LRT p-value: full vs reduced

DataFrame with 360776 rows and 13 columns
                                   groupID   featureID exonBaseMean
                               <character> <character>    <numeric>
ENSMUSG00000000001:E001 ENSMUSG00000000001        E001    436.44916
ENSMUSG00000000001:E002 ENSMUSG00000000001        E002     80.69931
ENSMUSG00000000001:E003 ENSMUSG00000000001        E003     79.29886
ENSMUSG00000000001:E004 ENSMUSG00000000001        E004     73.38072
ENSMUSG00000000001:E005 ENSMUSG00000000001        E005     58.09525
...                                    ...         ...          ...
ENSMUSG00000099330:E001 ENSMUSG00000099330        E001            0
ENSMUSG00000099331:E001 ENSMUSG00000099331        E001            0
ENSMUSG00000099332:E001 ENSMUSG00000099332        E001            0
ENSMUSG00000099333:E001 ENSMUSG00000099333        E001            0
ENSMUSG00000099334:E001 ENSMUSG00000099334        E001            0
                          dispersion         stat    pvalue      padj   control
                           <numeric>    <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 0.0053922670 0.0005567544 0.9811751         1 34.52678
ENSMUSG00000000001:E002 0.0009620218 2.0208808149 0.1551491         1 16.99218
ENSMUSG00000000001:E003 0.0013217651 0.4837283457 0.4867384         1 16.57670
ENSMUSG00000000001:E004 0.0016435700 0.0036748478 0.9516614         1 15.66852
ENSMUSG00000000001:E005 0.0013398025 0.4872866382 0.4851402         1 13.77229
...                              ...          ...       ...       ...       ...
ENSMUSG00000099330:E001           NA           NA        NA        NA        NA
ENSMUSG00000099331:E001           NA           NA        NA        NA        NA
ENSMUSG00000099332:E001           NA           NA        NA        NA        NA
ENSMUSG00000099333:E001           NA           NA        NA        NA        NA
ENSMUSG00000099334:E001           NA           NA        NA        NA        NA
                             mptp log2fold_mptp_control
                        <numeric>             <numeric>
ENSMUSG00000000001:E001 34.50975         -0.0007119172
ENSMUSG00000000001:E002 15.97748         -0.0888307964
ENSMUSG00000000001:E003 16.07271         -0.0445431508
ENSMUSG00000000001:E004 15.72571          0.0052561206
ENSMUSG00000000001:E005 14.30014          0.0542604215
...                           ...                   ...
ENSMUSG00000099330:E001        NA                    NA
ENSMUSG00000099331:E001        NA                    NA
ENSMUSG00000099332:E001        NA                    NA
ENSMUSG00000099333:E001        NA                    NA
ENSMUSG00000099334:E001        NA                    NA
                                                genomicData       countData
                                                  <GRanges>        <matrix>
ENSMUSG00000000001:E001          3:-:[108107280, 108109316] 388 396 484 ...
ENSMUSG00000000001:E002          3:-:[108109403, 108109612]    66 87 98 ...
ENSMUSG00000000001:E003          3:-:[108111935, 108112088]    64 88 86 ...
ENSMUSG00000000001:E004          3:-:[108112473, 108112602]    68 72 71 ...
ENSMUSG00000000001:E005          3:-:[108115763, 108115891]    54 52 55 ...
...                                                     ...             ...
ENSMUSG00000099330:E001            7:-:[44538106, 44538178]       0 0 0 ...
ENSMUSG00000099331:E001           11:+:[46170087, 46170143]       0 0 0 ...
ENSMUSG00000099332:E001           17:+:[85618067, 85618265]       0 0 0 ...
ENSMUSG00000099333:E001 MG3833_PATCH:+:[18355616, 18355763]       0 0 0 ...
ENSMUSG00000099334:E001            4:+:[74635214, 74635351]       0 0 0 ...
                        transcripts
                             <list>
ENSMUSG00000000001:E001    ########
ENSMUSG00000000001:E002    ########
ENSMUSG00000000001:E003    ########
ENSMUSG00000000001:E004    ########
ENSMUSG00000000001:E005    ########
...                             ...
ENSMUSG00000099330:E001    ########
ENSMUSG00000099331:E001    ########
ENSMUSG00000099332:E001    ########
ENSMUSG00000099333:E001    ########
ENSMUSG00000099334:E001    ########

ADD REPLY • link 9.1 years ago aditi ▴ 20

0

Entering edit mode

Hi aditi,

To me it seems that you are not getting NAs for all the exons, look for example at the exons from the first gene (ENSMUSG00000000001). Still, it would be nice to trace where the warnings are being originated. What is the output of:

x <- is.na(mcols(dxd)$log2fold_untreated_treated) & !is.na(DEXSeqResults(dxd)$padj) rownames(dxd)[which(x)]

Alejandro

ADD REPLY • link 9.1 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

You are right, Alejandro! I was a bit too hasty in saying that they are all NAs. It would be perfect to get rid of the warnings!

the output for the command (x <- is.na(mcols(dxd)$log2fold_mptp_control) & !is.na(DEXSeqResults(dxd)$padj)

(I'm printing only the last few entries here but all I see is a series of "FALSE"s)

>x

[99913] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99925] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99937] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99949] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99961] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99973] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99985] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99997] FALSE FALSE FALSE
[ reached getOption("max.print") -- omitted 260777 entries ]

ADD REPLY • link 9.1 years ago aditi ▴ 20

0

Entering edit mode

> rownames(dxd)[which(x)]
character(0)

ADD REPLY • link 9.1 years ago aditi ▴ 20

0

Entering edit mode

Sorry! I guess I didn't run the command right! I'm a complete novice in this field. The output is a series of ensembl ids:

[99990] "ENSMUSG00000027397:E014"
[99991] "ENSMUSG00000027397:E015"
[99992] "ENSMUSG00000027397:E016"
[99993] "ENSMUSG00000027397:E017"
[99994] "ENSMUSG00000027397:E019"
[99995] "ENSMUSG00000027397:E020"
[99996] "ENSMUSG00000027397:E021"
[99997] "ENSMUSG00000027397:E022"
[99998] "ENSMUSG00000027397:E023"
[99999] "ENSMUSG00000027397:E024"
[ reached getOption("max.print") -- omitted 180515 entries ]

These are the last few lines. Have I done it right this time? Also examining my DEXSeqResults reports, I find that I have the fold change values for 280515 genes but NAs for some 80401 genes. Is it something expected out of DEXSeqResults?

Thanks!

ADD REPLY • link 9.1 years ago aditi ▴ 20

score 0 · Answer 1 · 2016-03-23

Hi again Aditi,

Thanks for reporting this. As mentioned earlier, it was a warning and not an error. This warnings came from genes were the glm fit fails and the coefficients are not estimated. This should not affect your overall analysis. In the dataset that I tested, it was a single gene generating all the warnings. I added a more informative warning message in the development version of DEXSeq.

Alejandro