Issues geting meaninfull results for single cell DGE using MAST
Entering edit mode
zroger499 • 0
Last seen 1 day ago

Hello bioconductor community,

I'm currently analyzing a published dataset of single-cell, which contains around 30 000 cell classified in 38 cell types, across 76 individuals. My objective is to get, for each celltype, differential expressed genes between "stim" and "control" individuals. When I run my analysis, I get only a small number of differential expressed genes.

Currently I'm trying to understand If I missed anything on my code, or I should change some parameter related to my analysis. I though about relaxing the threshold on the lowly expressed genes and also increase the treshold of the min number of cell per individual.

My preprocessing workflow consisted of:

  • Filtering genes which were not expressed in at least 3 cells, and cells which did not have between 200 features and 5000 expressed features. Furthermore, cells with at least 10% of counts from MT genes were discarded
  • I normalized and logged the count data (and scaled it by a factor of 100000).
  • I further filtered the data to keep only lincRNA and protein coding genes, and then took a subset of top 20 000 genes based on variance.

For mast analysis this is the code I'm using (for each celltype)

#celltype_metdata -> Metadata for each celltype
#celltype_exp -> expression table for each celltype 

celltype_metdata$wellKey <- celltype_metdata$CellBarcode_Identity
fData <- data.frame(primerid = row.names(lung_lognrom)) 

#Filter individuals with just one cell from that type
unique_cell_individuals <- names(which(table(celltype_metdata$Subject_Identity) > 1))
celltype_metdata <- celltype_metdata[(celltype_metdata$Subject_Identity %in% unique_cell_individuals), ]
celltype_metdata$Subject_Identity <- droplevels(celltype_metdata$Subject_Identity)

#Scale the age
celltype_metdata$AgeScaled = scale(as.numeric(celltype_metdata$Age),center=0)[,1]

#Load data on a sca object
sca <- FromMatrix(as.matrix(celltype_exp), celltype_metdata, fData)
sca <- filterLowExpressedGenes(sca, threshold=0.1) #Filter low expressed genes
colData(sca)$Subject_Identity <- droplevels(factor(colData(sca)$Subject_Identity))
colData(sca)$AgeScaled = as.numeric(colData(sca)$AgeScaled)
colData(sca)$Disease =  droplevels(colData(sca)$Disease)

#Compute CDR (cellular detection rate, or the % of genes expressed in each cell)
colData(sca)$cngeneson <- as.numeric(scale(colSums(assay(sca)>0)))

#Fit model 
zlmdata <- suppressMessages(zlm(formula(~ cngeneson+Sex+AgeScaled+Disease+condition+Manuscript_Identity+(1|Subject_Identity)), sca,  method='glmer', ebayes = F, fitArgsD=list(nAGQ=0)))
mast <- suppressMessages(summary(zlmdata, doLRT="conditionStim", fitArgsD=list(nAGQ=0)))
summaryDt <- mast$datatabl
fcHurdle <- merge(summaryDt[contrast=="conditionStim" & component=='H',.(primerid, `Pr(>Chisq)`)],
summaryDt[contrast==""conditionStim" & component=='logFC', .(primerid, coef, ci.hi,ci.lo)], by='primerid') 
fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]

sessionInfo( )

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] dplyr_1.0.7                 data.table_1.14.2
 [3] purrr_0.3.4                 MAST_1.20.0
 [5] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
 [7] Biobase_2.54.0              GenomicRanges_1.46.1
 [9] GenomeInfoDb_1.30.0         IRanges_2.28.0
[11] S4Vectors_0.32.3            BiocGenerics_0.40.0
[13] MatrixGenerics_1.7.0        matrixStats_0.61.0

loaded via a namespace (and not attached):
 [1] progress_1.2.2         tidyselect_1.1.1       reshape2_1.4.4
 [4] splines_4.1.2          lattice_0.20-45        colorspace_2.0-2
 [7] vctrs_0.3.8            generics_0.1.1         utf8_1.2.2
[10] rlang_0.4.12           pillar_1.6.4           nloptr_1.2.2.3
[13] glue_1.6.0             DBI_1.1.2              GenomeInfoDbData_1.2.7
[16] lifecycle_1.0.1        plyr_1.8.6             stringr_1.4.0
[19] zlibbioc_1.40.0        munsell_0.5.0          gtable_0.3.0
[22] parallel_4.1.2         fansi_1.0.2            Rcpp_1.0.8
[25] scales_1.1.1           DelayedArray_0.20.0    XVector_0.34.0
[28] abind_1.4-5            lme4_1.1-27.1          hms_1.1.1
[31] ggplot2_3.3.5          stringi_1.7.6          grid_4.1.2
[34] cli_3.1.0              tools_4.1.2            bitops_1.0-7
[37] magrittr_2.0.1         RCurl_1.98-1.5         tibble_3.1.6
[40] crayon_1.4.2           pkgconfig_2.0.3        ellipsis_0.3.2
[43] MASS_7.3-54            Matrix_1.4-0           prettyunits_1.1.1
[46] assertthat_0.2.1       minqa_1.2.4            R6_2.5.1
[49] boot_1.3-28            nlme_3.1-152           compiler_4.1.2
MAST SingleCell • 120 views

Login before adding your answer.

Traffic: 547 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6