Hi,
I am facing trouble with running the DEXSeq package. I have generated the counts files using the counts.py script and the files seem to be ok. I have 6 samples, 3 samples each for one condition and therefor two conditions. I want to study the differential exon expression and usage between controls and treated(mptp). The sampleTable is thus:
> sampleTable
condition libType
C1namesort control paired
C2namesort control paired
C3namesort control paired
T1namesort mptp paired
T2namesort mptp paired
T3namesort mptp paired
My counts files and the flattened files seem to be alright:
> countFiles
[1] "/path /to/file/C1namesortfb.txt"
[2] "/path /to/file/C2namesortfb.txt"
[3] "/path /to/file/C3namesortfb.txt"
[4] "/path /to/file/T1namesortfb.txt"
[5] "/path /to/file/T2namesortfb.txt"
[6] "/path /to/file/T3namesortfb.txt"
> flattenedFile
[1] "/media/aditi/rnaseq/RNA_seq_analysis14MPTPSN/namesortedcountfiles/mouserefgenes.DEXSeq.chr.gff"
I created the DEXSeqDataSet thus:
> dxd = DEXSeqDataSetFromHTSeq(
+ countFiles,
+ sampleData = sampleTable,
+ design = ~sample + exon + condition:exon,
+ flattenedfile = flattenedFile)
> dxd = estimateSizeFactors (dxd)
> dxd = estimateDispersions (dxd)
> dxd = testForDEU (dxd)
All these functions run without throwing back any error. However, estimateexonFoldChanges gives 18 warning:
> dxd = estimateExonFoldChanges (dxd, fitExpToVar="condition")
There were 18 warnings (use warnings() to see them)
Warning messages:
1: In chol.default(XVX + lambda * I, pivot = TRUE) :
the matrix is either rank-deficient or indefinite
This is repeated 18 times
What am I doing wrong? I am at a loss!! I'll be very grateful for any suggestions/help. Thanks in advance!
The sessionInfo is thus:
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DEXSeq_1.16.10 DESeq2_1.10.1
[3] RcppArmadillo_0.6.500.4.0 Rcpp_0.12.3
[5] SummarizedExperiment_1.0.2 GenomicRanges_1.22.4
[7] GenomeInfoDb_1.6.3 IRanges_2.4.8
[9] S4Vectors_0.8.11 Biobase_2.30.0
[11] BiocGenerics_0.16.1 BiocParallel_1.4.3
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3
[4] XVector_0.10.0 bitops_1.0-6 tools_3.2.3
[7] futile.options_1.0.0 zlibbioc_1.16.0 biomaRt_2.26.1
[10] statmod_1.4.24 rpart_4.1-10 RSQLite_1.0.0
[13] annotate_1.48.0 gtable_0.2.0 lattice_0.20-33
[16] DBI_0.3.1 gridExtra_2.2.1 stringr_1.0.0
[19] hwriter_1.3.2 genefilter_1.52.1 cluster_2.0.3
[22] Biostrings_2.38.4 locfit_1.5-9.1 grid_3.2.3
[25] nnet_7.3-12 AnnotationDbi_1.32.3 XML_3.98-1.1
[28] survival_2.38-3 foreign_0.8-66 latticeExtra_0.6-28
[31] Formula_1.2-1 magrittr_1.5 geneplotter_1.48.0
[34] ggplot2_2.0.0 lambda.r_1.1.7 Rsamtools_1.22.0
[37] Hmisc_3.17-2 scales_0.4.0 splines_3.2.3
[40] colorspace_1.2-6 xtable_1.8-2 stringi_1.0-1
[43] acepack_1.3-3.3 RCurl_1.95-4.3 munsell_0.4.3
HI Aditi,
Since this are warnings, your code stills runs and it is likely to produce results. Do you have a fold change columns filled in the mcols field or are they all NAs?
Alejandro
Thanks for a very prompt reply!
I get all NAs in the fold change column:
> dxr1
LRT p-value: full vs reduced
DataFrame with 360776 rows and 13 columns
groupID featureID exonBaseMean
<character> <character> <numeric>
ENSMUSG00000000001:E001 ENSMUSG00000000001 E001 436.44916
ENSMUSG00000000001:E002 ENSMUSG00000000001 E002 80.69931
ENSMUSG00000000001:E003 ENSMUSG00000000001 E003 79.29886
ENSMUSG00000000001:E004 ENSMUSG00000000001 E004 73.38072
ENSMUSG00000000001:E005 ENSMUSG00000000001 E005 58.09525
... ... ... ...
ENSMUSG00000099330:E001 ENSMUSG00000099330 E001 0
ENSMUSG00000099331:E001 ENSMUSG00000099331 E001 0
ENSMUSG00000099332:E001 ENSMUSG00000099332 E001 0
ENSMUSG00000099333:E001 ENSMUSG00000099333 E001 0
ENSMUSG00000099334:E001 ENSMUSG00000099334 E001 0
dispersion stat pvalue padj control
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001:E001 0.0053922670 0.0005567544 0.9811751 1 34.52678
ENSMUSG00000000001:E002 0.0009620218 2.0208808149 0.1551491 1 16.99218
ENSMUSG00000000001:E003 0.0013217651 0.4837283457 0.4867384 1 16.57670
ENSMUSG00000000001:E004 0.0016435700 0.0036748478 0.9516614 1 15.66852
ENSMUSG00000000001:E005 0.0013398025 0.4872866382 0.4851402 1 13.77229
... ... ... ... ... ...
ENSMUSG00000099330:E001 NA NA NA NA NA
ENSMUSG00000099331:E001 NA NA NA NA NA
ENSMUSG00000099332:E001 NA NA NA NA NA
ENSMUSG00000099333:E001 NA NA NA NA NA
ENSMUSG00000099334:E001 NA NA NA NA NA
mptp log2fold_mptp_control
<numeric> <numeric>
ENSMUSG00000000001:E001 34.50975 -0.0007119172
ENSMUSG00000000001:E002 15.97748 -0.0888307964
ENSMUSG00000000001:E003 16.07271 -0.0445431508
ENSMUSG00000000001:E004 15.72571 0.0052561206
ENSMUSG00000000001:E005 14.30014 0.0542604215
... ... ...
ENSMUSG00000099330:E001 NA NA
ENSMUSG00000099331:E001 NA NA
ENSMUSG00000099332:E001 NA NA
ENSMUSG00000099333:E001 NA NA
ENSMUSG00000099334:E001 NA NA
genomicData countData
<GRanges> <matrix>
ENSMUSG00000000001:E001 3:-:[108107280, 108109316] 388 396 484 ...
ENSMUSG00000000001:E002 3:-:[108109403, 108109612] 66 87 98 ...
ENSMUSG00000000001:E003 3:-:[108111935, 108112088] 64 88 86 ...
ENSMUSG00000000001:E004 3:-:[108112473, 108112602] 68 72 71 ...
ENSMUSG00000000001:E005 3:-:[108115763, 108115891] 54 52 55 ...
... ... ...
ENSMUSG00000099330:E001 7:-:[44538106, 44538178] 0 0 0 ...
ENSMUSG00000099331:E001 11:+:[46170087, 46170143] 0 0 0 ...
ENSMUSG00000099332:E001 17:+:[85618067, 85618265] 0 0 0 ...
ENSMUSG00000099333:E001 MG3833_PATCH:+:[18355616, 18355763] 0 0 0 ...
ENSMUSG00000099334:E001 4:+:[74635214, 74635351] 0 0 0 ...
transcripts
<list>
ENSMUSG00000000001:E001 ########
ENSMUSG00000000001:E002 ########
ENSMUSG00000000001:E003 ########
ENSMUSG00000000001:E004 ########
ENSMUSG00000000001:E005 ########
... ...
ENSMUSG00000099330:E001 ########
ENSMUSG00000099331:E001 ########
ENSMUSG00000099332:E001 ########
ENSMUSG00000099333:E001 ########
ENSMUSG00000099334:E001 ########
Hi aditi,
To me it seems that you are not getting NAs for all the exons, look for example at the exons from the first gene (ENSMUSG00000000001). Still, it would be nice to trace where the warnings are being originated. What is the output of:
x <- is.na(mcols(dxd)$log2fold_untreated_treated) & !is.na(DEXSeqResults(dxd)$padj)
rownames(dxd)[which(x)]
Alejandro
You are right, Alejandro! I was a bit too hasty in saying that they are all NAs. It would be perfect to get rid of the warnings!
the output for the command (x <- is.na(mcols(dxd)$log2fold_mptp_control) & !is.na(DEXSeqResults(dxd)$padj)
(I'm printing only the last few entries here but all I see is a series of "FALSE"s)
>x
[99913] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99925] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99937] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99949] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99961] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99973] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99985] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[99997] FALSE FALSE FALSE
[ reached getOption("max.print") -- omitted 260777 entries ]
> rownames(dxd)[which(x)]
character(0)
Sorry! I guess I didn't run the command right! I'm a complete novice in this field. The output is a series of ensembl ids:
[99990] "ENSMUSG00000027397:E014"
[99991] "ENSMUSG00000027397:E015"
[99992] "ENSMUSG00000027397:E016"
[99993] "ENSMUSG00000027397:E017"
[99994] "ENSMUSG00000027397:E019"
[99995] "ENSMUSG00000027397:E020"
[99996] "ENSMUSG00000027397:E021"
[99997] "ENSMUSG00000027397:E022"
[99998] "ENSMUSG00000027397:E023"
[99999] "ENSMUSG00000027397:E024"
[ reached getOption("max.print") -- omitted 180515 entries ]
These are the last few lines. Have I done it right this time? Also examining my DEXSeqResults reports, I find that I have the fold change values for 280515 genes but NAs for some 80401 genes. Is it something expected out of DEXSeqResults?
Thanks!