Hi,
I did a pilot monocle analysis with 34 single cell samples.I am modeling a branched pathway with a progenitor cell (n=5), giving rise to 3 mature cell types (n=9-11). I understand that it is hugely under powered, but on the plus side, the libraries are sequenced very deep (30-70 million reads per sample). Data was normalized using DESeq2 ( there is a strong 3' bias in the data, and it does not make any sense to do FPKM normalization). Genes with total counts below 50 were filtered out and I used the code below for differential expression and to order cells.While the ordering results make sense (each sub type was in a different branch), the set of differentially expressed genes (2500 out of 24800) do not make sense.Most of the top 20 differentially expressed genes have counts near the minimum and although few of the other genes do make sense, overall it does not.Is this because of the branched pathway? I tried to do psudo temporal ordering of genes, picking up from here, but the differential gene tests with state as variable ( for which irrelevant cell types were excluded) also produce similar results. (I input CDS1 below for this analysis).
Thanks for your help!
Sunil
fpkm_matrix <- read.table('NormMatrix.txt', sep="\t", header = TRUE, row.names=1) # fpkm matrix
sample_sheet<- read.delim("ColInfo.txt", row.names=1) # sample sheet
gene_annotations<- read.delim("GeneInfo1.txt", row.names=1) # gene annotations
fd<- new("AnnotatedDataFrame", data = gene_annotations)
pd <- new('AnnotatedDataFrame', data =sample_sheet)
CDS <- newCellDataSet(as.matrix(fpkm_matrix), phenoData = pd, featureData = fd) # Create new CellDataSet object
# DE analysis
log <- capture.output({diff_test_res <- differentialGeneTest(CDS, fullModelFormulaStr = "expression~CellType")})
# Select genes that are significant at an FDR < 10%
# ORDERING CELLS
sig_genes <- subset(diff_test_res, qval < 0.1)
ordering_genes <- row.names (subset(diff_test_res, qval < 0.1))
CDS1 <- setOrderingFilter(CDS, ordering_genes)
CDS1 <- reduceDimension(CDS, use_irlba=FALSE)
CDS1<- orderCells(CDS, num_paths=3 reverse=TRUE)
plot_spanning_tree(CDS1)