Question: DEXSeq: controlling for cell composition
0
gravatar for skinnider
8 months ago by
skinnider0
skinnider0 wrote:

I am using DEXSeq to perform an analysis of alternative splicing in the spinal cord parenchyma 1 week after spinal cord injury. Following the “standard analysis workflow” given in the DEXSeq vignette returns about 1,000 alternatively spliced exons at 10% FDR. However, at 1 week after SCI it is possible that the cellular composition of the spinal cord has already undergone some changes: for example, death of neurons. I would like to identify which alternative splicing events are independent of changes in cell type composition, both because these are more likely to be of biological interest, and because section 4 of the vignette (“Additional technical or experimental variables”) suggests this may actually increase statistical power.

To estimate cell type composition, I followed the approach of Swarup et al., 2018, who took the top 100 cell type-specific marker genes for each of five CNS cell types based on RNA-seq data from purified cells described by Zhang et al., 2014, then included the first principal component of each set of marker genes in each sample as a covariate in their regression model. This gives me a data frame of sample annotations in the following format, where columns 3-7 represent the ‘eigengene’ of cell type-specific marker gene expression:

              sample injury astrocytes endothelial_cells microglia oligodendrocytes   neurons
1   1-200-SpinalCord    200  0.2559934         0.2626620 0.2624809        0.2608435 0.2522872
2   2-200-SpinalCord    200  0.2588842         0.2658519 0.2643757        0.2731271 0.2536084
3   3-200-SpinalCord    200  0.2580737         0.2655819 0.2648090        0.2642733 0.2523708
4   4-200-SpinalCord    200  0.2517741         0.2627561 0.2550055        0.2559144 0.2504905
5   5-200-SpinalCord    200  0.2515403         0.2540218 0.2520716        0.2497159 0.2553896
6   6-100-SpinalCord    100  0.2547717         0.2560722 0.2536112        0.2509286 0.2557450
7   7-100-SpinalCord    100  0.2641999         0.2636526 0.2692555        0.2708577 0.2628503
8   8-100-SpinalCord    100  0.2633527         0.2648022 0.2626432        0.2663232 0.2608956
9   9-100-SpinalCord    100  0.2460898         0.2424264 0.2417473        0.2336518 0.2558509
10 10-100-SpinalCord    100  0.2617045         0.2628493 0.2665769        0.2633762 0.2603489
11   11-0-SpinalCord      0  0.2664507         0.2605212 0.2614453        0.2675547 0.2635772
12   12-0-SpinalCord      0  0.2597377         0.2514553 0.2519198        0.2500942 0.2625129
13   13-0-SpinalCord      0  0.2633110         0.2545084 0.2588167        0.2586880 0.2645590
14   14-0-SpinalCord      0  0.2543777         0.2492514 0.2517410        0.2471603 0.2583947
15   15-0-SpinalCord      0  0.2618528         0.2552682 0.2550236        0.2575152 0.2634858

Here injury reflects the force delivered in kDyn and is modeled as a continuous covariate.

In order to identify AS events that are independent of cell type composition, I am trying to follow the code in section 4 of the vignette by including the ‘eigengenes’ of each cell type as a blocking factor, as follows:

formula_reduced = ~ sample + exon + injury:exon
formula_full = ~ sample + exon + injury:exon + 
  astrocytes:exon + endothelial_cells:exon + microglia:exon + 
  oligodendrocytes:exon + neurons:exon

# create dataset
dxd = DEXSeqDataSet(input, sampleData = targets, 
                    design = formula_full,
                    featureID = exon_ids, groupID = gene_ids)

# estimate size factors
dxd = estimateSizeFactors(dxd)
# estimate dispersion
dxd = estimateDispersions(dxd)

# test for AS
dxd = testForDEU(dxd,
                 reducedModel = formula_reduced, 
                 fullModel = formula_full)

However, this code is giving me the following error:

Error in solve.default(xtwx + ridge) : 
  system is computationally singular: reciprocal condition number = 3.9289e-19
Calls: estimateDispersions ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

I tried removing a single cell type (endothelial cells) from the full formula, but still got the same error, and I am a bit reluctant to drop any more cell types from the analysis.

Could this be solved by using a lower tolerance? And, if so, is there a way to set this from estimateDispersions?

Thanks for your help.

sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] aggregation_1.0.1           magrittr_1.5                forcats_0.3.0               stringr_1.3.1              
 [5] dplyr_0.7.8                 purrr_0.2.5                 readr_1.1.1                 tidyr_0.8.2                
 [9] tibble_1.4.2                ggplot2_3.1.0               tidyverse_1.2.1             DEXSeq_1.26.0              
[13] RColorBrewer_1.1-2          AnnotationDbi_1.42.1        DESeq2_1.20.0               SummarizedExperiment_1.10.1
[17] DelayedArray_0.8.0          BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0             
[21] GenomicRanges_1.34.0        GenomeInfoDb_1.16.0         IRanges_2.16.0              S4Vectors_0.20.0           
[25] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] nlme_3.1-137           bitops_1.0-6           lubridate_1.7.4        bit64_0.9-7            progress_1.2.0        
 [6] httr_1.3.1             tools_3.5.1            backports_1.1.3        R6_2.3.0               rpart_4.1-13          
[11] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12           
[16] withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2      bit_1.1-14            
[21] compiler_3.5.1         cli_1.0.1              rvest_0.3.2            htmlTable_1.13         xml2_1.2.0            
[26] scales_1.0.0           checkmate_1.8.5        genefilter_1.62.0      digest_0.6.18          Rsamtools_1.32.3      
[31] foreign_0.8-71         XVector_0.20.0         base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6       
[36] readxl_1.1.0           htmlwidgets_1.3        rlang_0.3.0.1          rstudioapi_0.8         RSQLite_2.1.1         
[41] bindr_0.1.1            jsonlite_1.6           hwriter_1.3.2          acepack_1.4.1          RCurl_1.95-4.11       
[46] GenomeInfoDbData_1.1.0 Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0             munsell_0.5.0         
[51] stringi_1.2.4          yaml_2.2.0             zlibbioc_1.26.0        plyr_1.8.4             grid_3.5.1            
[56] blob_1.1.1             crayon_1.3.4           lattice_0.20-35        Biostrings_2.48.0      haven_1.1.2           
[61] splines_3.5.1          annotate_1.58.0        hms_0.4.2              locfit_1.5-9.1         knitr_1.21            
[66] pillar_1.3.1           geneplotter_1.58.0     biomaRt_2.36.1         XML_3.98-1.16          glue_1.3.0            
[71] latticeExtra_0.6-28    modelr_0.1.2           data.table_1.11.8      cellranger_1.1.0       gtable_0.2.0          
[76] assertthat_0.2.0       xfun_0.4               xtable_1.8-3           broom_0.5.0            survival_2.42-6       
[81] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1        statmod_1.4.30
dexseq • 194 views
ADD COMMENTlink modified 7 months ago by Alejandro Reyes1.7k • written 8 months ago by skinnider0
Answer: DEXSeq: controlling for cell composition
0
gravatar for Alejandro Reyes
7 months ago by
Alejandro Reyes1.7k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.7k wrote:

This error results when the predictors of a regression are linearly dependent (or in your case, highly correlated). This will be a problem for any regression model. However, the cell-type compositions based on your estimates are almost identical across all samples. My suggestion would be to run the analysis using, individually, each of the cell-type composition as covariates and check whether the results change substantially.

ADD COMMENTlink written 7 months ago by Alejandro Reyes1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 220 users visited in the last hour