I immediately admit that I am an expert in R, or scripting in general for that matter, so I hope this is an easy question:
I do not know how to include a decent example dataset for this so I try to explain the problem
I am analyzing single-cell RNA-seq data, using the Monocle (Bioconductor) package. This package allows you to order cells in pseudotime based on their transcriptome, thereby providing insight into transcriptome dynamics during development in pseudotime. I am trying to use the command
plot_pseudotime_heatmap to generate a heatmap for a fitted linear model on the gene-expression pattern of a number of disease-markers in pseudotime.
Though for some genesets I get the following error when plotting the heatmap:
Error in cutree(tree, cutree_n) : elements of 'k' must be between 1 and 3.
I cannot figure out what k is (I thought the number of cluster to be formed, but why does this need to be between 1 and 3...), or which setting to change to overcome this error:
Default settings for plot_pseudotime_heatmap():
plot_pseudotime_heatmap(cds_subset, cluster_rows = TRUE,
hclust_method = "ward.D2", num_clusters = 6, hmcols = NULL,
add_annotation_row = NULL, add_annotation_col = NULL,
show_rownames = FALSE, use_gene_short_name = TRUE,
norm_method = c("vstExprs", "log"), scale_max = 3, scale_min = -3,
trend_formula = "~sm.ns(Pseudotime, df=3)", return_heatmap = FALSE,
cores = 1)
As far as I understand, in this package, the command plot_pseudotime_heatmap, first orders the cells according to their pseudotime in 'development', next fits a linear model to the expression pattern per gene (with lm.fit()), and finally transforming this data into a heatmap (using a variant of pheatmap() ), where the pseudotime is split over 100 pseudotime-bins.
I have the difficulty of working with a very rare cellpopulation, causing some for the disease-marker genes to be expressed in only 1 or a few cells and with only a few reads per gene (cut-off for expressed-genes is >=1read in >=1 cell). I understand that trying to fit a linear model on these genes will be impossible, and that this gene will get excluded from the heatmap and I will get the error:
<simpleError in lm.fit(X.vlm, y = z.vlm, ...): NA/NaN/Inf in 'y'>. But why can I sometimes continue with the remaining genes, but sometimes still get the above error on the remaining genes??
Many thanks for any help I can get!