Issues geting meaninfull results for single cell DGE using MAST
0
0
Entering edit mode
zroger499 • 0
@zroger499-23414
Last seen 11 months ago
Portugal

Hello bioconductor community,

I'm currently analyzing a published dataset of single-cell, which contains around 30 000 cell classified in 38 cell types, across 76 individuals. My objective is to get, for each celltype, differential expressed genes between "stim" and "control" individuals. When I run my analysis, I get only a small number of differential expressed genes.

Currently I'm trying to understand If I missed anything on my code, or I should change some parameter related to my analysis. I though about relaxing the threshold on the lowly expressed genes and also increase the treshold of the min number of cell per individual.

My preprocessing workflow consisted of:

  • Filtering genes which were not expressed in at least 3 cells, and cells which did not have between 200 features and 5000 expressed features. Furthermore, cells with at least 10% of counts from MT genes were discarded
  • I normalized and logged the count data (and scaled it by a factor of 100000).
  • I further filtered the data to keep only lincRNA and protein coding genes, and then took a subset of top 20 000 genes based on variance.

For mast analysis this is the code I'm using (for each celltype)

#celltype_metdata -> Metadata for each celltype
#celltype_exp -> expression table for each celltype 


celltype_metdata$wellKey <- celltype_metdata$CellBarcode_Identity
fData <- data.frame(primerid = row.names(lung_lognrom)) 

#Filter individuals with just one cell from that type
unique_cell_individuals <- names(which(table(celltype_metdata$Subject_Identity) > 1))
celltype_metdata <- celltype_metdata[(celltype_metdata$Subject_Identity %in% unique_cell_individuals), ]
celltype_metdata$Subject_Identity <- droplevels(celltype_metdata$Subject_Identity)

#Scale the age
celltype_metdata$AgeScaled = scale(as.numeric(celltype_metdata$Age),center=0)[,1]


#Load data on a sca object
sca <- FromMatrix(as.matrix(celltype_exp), celltype_metdata, fData)
sca <- filterLowExpressedGenes(sca, threshold=0.1) #Filter low expressed genes
colData(sca)$Subject_Identity <- droplevels(factor(colData(sca)$Subject_Identity))
colData(sca)$AgeScaled = as.numeric(colData(sca)$AgeScaled)
colData(sca)$Disease =  droplevels(colData(sca)$Disease)

#Compute CDR (cellular detection rate, or the % of genes expressed in each cell)
colData(sca)$cngeneson <- as.numeric(scale(colSums(assay(sca)>0)))

#Fit model 
zlmdata <- suppressMessages(zlm(formula(~ cngeneson+Sex+AgeScaled+Disease+condition+Manuscript_Identity+(1|Subject_Identity)), sca,  method='glmer', ebayes = F, fitArgsD=list(nAGQ=0)))
mast <- suppressMessages(summary(zlmdata, doLRT="conditionStim", fitArgsD=list(nAGQ=0)))
summaryDt <- mast$datatabl
fcHurdle <- merge(summaryDt[contrast=="conditionStim" & component=='H',.(primerid, `Pr(>Chisq)`)],
summaryDt[contrast==""conditionStim" & component=='logFC', .(primerid, coef, ci.hi,ci.lo)], by='primerid') 
fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]


sessionInfo( )

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] dplyr_1.0.7                 data.table_1.14.2
 [3] purrr_0.3.4                 MAST_1.20.0
 [5] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
 [7] Biobase_2.54.0              GenomicRanges_1.46.1
 [9] GenomeInfoDb_1.30.0         IRanges_2.28.0
[11] S4Vectors_0.32.3            BiocGenerics_0.40.0
[13] MatrixGenerics_1.7.0        matrixStats_0.61.0

loaded via a namespace (and not attached):
 [1] progress_1.2.2         tidyselect_1.1.1       reshape2_1.4.4
 [4] splines_4.1.2          lattice_0.20-45        colorspace_2.0-2
 [7] vctrs_0.3.8            generics_0.1.1         utf8_1.2.2
[10] rlang_0.4.12           pillar_1.6.4           nloptr_1.2.2.3
[13] glue_1.6.0             DBI_1.1.2              GenomeInfoDbData_1.2.7
[16] lifecycle_1.0.1        plyr_1.8.6             stringr_1.4.0
[19] zlibbioc_1.40.0        munsell_0.5.0          gtable_0.3.0
[22] parallel_4.1.2         fansi_1.0.2            Rcpp_1.0.8
[25] scales_1.1.1           DelayedArray_0.20.0    XVector_0.34.0
[28] abind_1.4-5            lme4_1.1-27.1          hms_1.1.1
[31] ggplot2_3.3.5          stringi_1.7.6          grid_4.1.2
[34] cli_3.1.0              tools_4.1.2            bitops_1.0-7
[37] magrittr_2.0.1         RCurl_1.98-1.5         tibble_3.1.6
[40] crayon_1.4.2           pkgconfig_2.0.3        ellipsis_0.3.2
[43] MASS_7.3-54            Matrix_1.4-0           prettyunits_1.1.1
[46] assertthat_0.2.1       minqa_1.2.4            R6_2.5.1
[49] boot_1.3-28            nlme_3.1-152           compiler_4.1.2
MAST SingleCell • 986 views
ADD COMMENT

Login before adding your answer.

Traffic: 499 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6