DEXSeq: controlling for cell composition
1
0
Entering edit mode
skinnider • 0
@skinnider-18958
Last seen 5.4 years ago

I am using DEXSeq to perform an analysis of alternative splicing in the spinal cord parenchyma 1 week after spinal cord injury. Following the “standard analysis workflow” given in the DEXSeq vignette returns about 1,000 alternatively spliced exons at 10% FDR. However, at 1 week after SCI it is possible that the cellular composition of the spinal cord has already undergone some changes: for example, death of neurons. I would like to identify which alternative splicing events are independent of changes in cell type composition, both because these are more likely to be of biological interest, and because section 4 of the vignette (“Additional technical or experimental variables”) suggests this may actually increase statistical power.

To estimate cell type composition, I followed the approach of Swarup et al., 2018, who took the top 100 cell type-specific marker genes for each of five CNS cell types based on RNA-seq data from purified cells described by Zhang et al., 2014, then included the first principal component of each set of marker genes in each sample as a covariate in their regression model. This gives me a data frame of sample annotations in the following format, where columns 3-7 represent the ‘eigengene’ of cell type-specific marker gene expression:

              sample injury astrocytes endothelial_cells microglia oligodendrocytes   neurons
1   1-200-SpinalCord    200  0.2559934         0.2626620 0.2624809        0.2608435 0.2522872
2   2-200-SpinalCord    200  0.2588842         0.2658519 0.2643757        0.2731271 0.2536084
3   3-200-SpinalCord    200  0.2580737         0.2655819 0.2648090        0.2642733 0.2523708
4   4-200-SpinalCord    200  0.2517741         0.2627561 0.2550055        0.2559144 0.2504905
5   5-200-SpinalCord    200  0.2515403         0.2540218 0.2520716        0.2497159 0.2553896
6   6-100-SpinalCord    100  0.2547717         0.2560722 0.2536112        0.2509286 0.2557450
7   7-100-SpinalCord    100  0.2641999         0.2636526 0.2692555        0.2708577 0.2628503
8   8-100-SpinalCord    100  0.2633527         0.2648022 0.2626432        0.2663232 0.2608956
9   9-100-SpinalCord    100  0.2460898         0.2424264 0.2417473        0.2336518 0.2558509
10 10-100-SpinalCord    100  0.2617045         0.2628493 0.2665769        0.2633762 0.2603489
11   11-0-SpinalCord      0  0.2664507         0.2605212 0.2614453        0.2675547 0.2635772
12   12-0-SpinalCord      0  0.2597377         0.2514553 0.2519198        0.2500942 0.2625129
13   13-0-SpinalCord      0  0.2633110         0.2545084 0.2588167        0.2586880 0.2645590
14   14-0-SpinalCord      0  0.2543777         0.2492514 0.2517410        0.2471603 0.2583947
15   15-0-SpinalCord      0  0.2618528         0.2552682 0.2550236        0.2575152 0.2634858

Here injury reflects the force delivered in kDyn and is modeled as a continuous covariate.

In order to identify AS events that are independent of cell type composition, I am trying to follow the code in section 4 of the vignette by including the ‘eigengenes’ of each cell type as a blocking factor, as follows:

formula_reduced = ~ sample + exon + injury:exon
formula_full = ~ sample + exon + injury:exon + 
  astrocytes:exon + endothelial_cells:exon + microglia:exon + 
  oligodendrocytes:exon + neurons:exon

# create dataset
dxd = DEXSeqDataSet(input, sampleData = targets, 
                    design = formula_full,
                    featureID = exon_ids, groupID = gene_ids)

# estimate size factors
dxd = estimateSizeFactors(dxd)
# estimate dispersion
dxd = estimateDispersions(dxd)

# test for AS
dxd = testForDEU(dxd,
                 reducedModel = formula_reduced, 
                 fullModel = formula_full)

However, this code is giving me the following error:

Error in solve.default(xtwx + ridge) : 
  system is computationally singular: reciprocal condition number = 3.9289e-19
Calls: estimateDispersions ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

I tried removing a single cell type (endothelial cells) from the full formula, but still got the same error, and I am a bit reluctant to drop any more cell types from the analysis.

Could this be solved by using a lower tolerance? And, if so, is there a way to set this from estimateDispersions?

Thanks for your help.

sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] aggregation_1.0.1           magrittr_1.5                forcats_0.3.0               stringr_1.3.1              
 [5] dplyr_0.7.8                 purrr_0.2.5                 readr_1.1.1                 tidyr_0.8.2                
 [9] tibble_1.4.2                ggplot2_3.1.0               tidyverse_1.2.1             DEXSeq_1.26.0              
[13] RColorBrewer_1.1-2          AnnotationDbi_1.42.1        DESeq2_1.20.0               SummarizedExperiment_1.10.1
[17] DelayedArray_0.8.0          BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0             
[21] GenomicRanges_1.34.0        GenomeInfoDb_1.16.0         IRanges_2.16.0              S4Vectors_0.20.0           
[25] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] nlme_3.1-137           bitops_1.0-6           lubridate_1.7.4        bit64_0.9-7            progress_1.2.0        
 [6] httr_1.3.1             tools_3.5.1            backports_1.1.3        R6_2.3.0               rpart_4.1-13          
[11] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12           
[16] withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2      bit_1.1-14            
[21] compiler_3.5.1         cli_1.0.1              rvest_0.3.2            htmlTable_1.13         xml2_1.2.0            
[26] scales_1.0.0           checkmate_1.8.5        genefilter_1.62.0      digest_0.6.18          Rsamtools_1.32.3      
[31] foreign_0.8-71         XVector_0.20.0         base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6       
[36] readxl_1.1.0           htmlwidgets_1.3        rlang_0.3.0.1          rstudioapi_0.8         RSQLite_2.1.1         
[41] bindr_0.1.1            jsonlite_1.6           hwriter_1.3.2          acepack_1.4.1          RCurl_1.95-4.11       
[46] GenomeInfoDbData_1.1.0 Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0             munsell_0.5.0         
[51] stringi_1.2.4          yaml_2.2.0             zlibbioc_1.26.0        plyr_1.8.4             grid_3.5.1            
[56] blob_1.1.1             crayon_1.3.4           lattice_0.20-35        Biostrings_2.48.0      haven_1.1.2           
[61] splines_3.5.1          annotate_1.58.0        hms_0.4.2              locfit_1.5-9.1         knitr_1.21            
[66] pillar_1.3.1           geneplotter_1.58.0     biomaRt_2.36.1         XML_3.98-1.16          glue_1.3.0            
[71] latticeExtra_0.6-28    modelr_0.1.2           data.table_1.11.8      cellranger_1.1.0       gtable_0.2.0          
[76] assertthat_0.2.0       xfun_0.4               xtable_1.8-3           broom_0.5.0            survival_2.42-6       
[81] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1        statmod_1.4.30
dexseq • 744 views
ADD COMMENT
0
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 6 days ago
Novartis Institutes for BioMedical Rese…

This error results when the predictors of a regression are linearly dependent (or in your case, highly correlated). This will be a problem for any regression model. However, the cell-type compositions based on your estimates are almost identical across all samples. My suggestion would be to run the analysis using, individually, each of the cell-type composition as covariates and check whether the results change substantially.

ADD COMMENT

Login before adding your answer.

Traffic: 735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6