Question

marker genes in single cell experiment


lirongrossmann ▴ 40

@lirongrossmann-23954


Hi,

I am trying to create a reference dataset from 40 different cell types (fpkm_matrix is a 26,000 x 40 log count matrix) and I am trying to find marker genes for each cell type.

I used the following code:

library(SingleCellExperiment)
library(scran)
cell.matrix <- SingleCellExperiment(list(logcounts = as.matrix(fpkm_matrix)))
colLabels(cell.matrix) <- colnames(fpkm_matrix)
out <- pairwiseTTests(cell.matrix, cell.matrix$label, direction = "up")

and got the following error

Error in .compute_mean_var(x, BPPARAM = BPPARAM, subset.row = subset.row,  : 
  no residual d.f. in any level of 'block' for variance estimation

Based on that, I suspect there may not be much difference between the cell types, but I know that there is.

Any input would be appreciated.

Thanks, Liron


Updated 3.7 years ago by Aaron Lun ★ 28k • written 3.7 years ago by lirongrossmann ▴ 40

score 0 · Answer 1 · 2020-08-27


Peter Langfelder ★ 3.0k

@peter-langfelder-4469


My guess is that your cell.matrix has no 'label' column, because you only supplied the expression values when you created it. In other words, cell.matrix$label is NULL. Maybe you have another variable (a list) that contains the labels (component 'label'); if so, use that instead of cell.matrix$label.
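One way to confirm this diagnosis is to inspect the labels before running the test. The sketch below uses a plain empty list as a stand-in for a colData that has no 'label' column; on a SingleCellExperiment, `$` on a column that was never set also returns NULL. `check_labels()` is a hypothetical helper written for illustration, not part of scran.

```r
# Hypothetical helper: fail early if the grouping vector is missing
# or does not have one entry per column of the expression matrix.
check_labels <- function(labels, n_columns) {
  if (is.null(labels)) {
    stop("labels are NULL: no 'label' column has been set")
  }
  if (length(labels) != n_columns) {
    stop("expected ", n_columns, " labels but got ", length(labels))
  }
  invisible(TRUE)
}

col_data <- list()        # stand-in for colData without a 'label' column
is.null(col_data$label)   # TRUE, just as cell.matrix$label would be
```

On the real object the call would be `check_labels(cell.matrix$label, ncol(cell.matrix))`.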



Thanks, Peter. I forgot to include the line that sets the labels in my question (it was in my original code). I have added it to the question, but I still get the above error.

Reply 3.7 years ago by lirongrossmann ▴ 40

score 0 · Answer 2 · 2020-08-27

I am going to guess that each column name is unique, in which case there are no replicates for any of the labels. A t-test needs replicates within a label to estimate the variance, so computing a p-value for differential comparisons is not possible. This is reflected in the error message, which is telling you that there are no residual degrees of freedom for the t-test.
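The degrees-of-freedom problem can be reproduced with base R's lm(), which fits one mean per label for a single gene. With n columns and g distinct labels, the residual d.f. is n - g, so one column per label leaves nothing for variance estimation. This is a toy illustration with simulated values, not scran's internal code.

```r
set.seed(1)
expr <- rnorm(4)   # toy log-expression values for a single gene

# One column per label: 4 observations, 4 label means to estimate.
unique_labels <- factor(c("A", "B", "C", "D"))
fit_unique <- lm(expr ~ unique_labels)
df.residual(fit_unique)   # 0: no residual d.f. for variance estimation

# Two replicates per label: 4 observations, 2 label means.
replicated_labels <- factor(c("A", "A", "B", "B"))
fit_rep <- lm(expr ~ replicated_labels)
df.residual(fit_rep)      # 2
```

On the real data, if `table(colLabels(cell.matrix))` shows a count of 1 for every label, you are in the first situation.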

Check whether the labels look like CD4_rep1, CD4_rep2, etc., in which case you can just sub() out the _repX to get consistent labels for the same cell type. However, if you actually only have one column per cell type, you're stuffed. There's no way to compute p-values here. Perhaps use SingleR::getClassicMarkers() instead to get the top markers with the largest log-fold changes.
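A minimal sketch of the sub() step, assuming the replicate suffix is `_rep` followed by digits; the sample names here are hypothetical.

```r
# Hypothetical column names carrying a replicate suffix.
sample_names <- c("CD4_rep1", "CD4_rep2", "CD8_rep1", "CD8_rep2")

# Strip a trailing "_rep<digits>" to get one label per cell type.
cell_types <- sub("_rep[0-9]+$", "", sample_names)
cell_types          # "CD4" "CD4" "CD8" "CD8"
table(cell_types)   # each label now has 2 replicates
```

The cleaned labels could then be assigned with `colLabels(cell.matrix) <- cell_types` before calling pairwiseTTests().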

For either function, one would typically use the log-transformed values.
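With one column per label, the p-value-free alternative is to rank genes by log-fold change, which for log-transformed values is simply a difference. The function below is a simplified, hypothetical base-R illustration of that idea; it is not SingleR's implementation (getClassicMarkers() handles replicates and chooses how many genes to keep by its own rule). It assumes `logexpr` is a genes x labels matrix of log values with row names.

```r
# Simplified sketch: for each ordered pair of labels (a vs b), keep the n_top
# genes with the largest positive log-fold change, i.e. logexpr[, a] - logexpr[, b].
top_up_markers <- function(logexpr, n_top = 2) {
  labels <- colnames(logexpr)
  markers <- list()
  for (a in labels) {
    per_label <- list()
    for (b in setdiff(labels, a)) {
      lfc <- logexpr[, a] - logexpr[, b]          # named by gene
      up <- sort(lfc[lfc > 0], decreasing = TRUE)  # only genes higher in a
      per_label[[b]] <- head(names(up), n_top)
    }
    markers[[a]] <- per_label
  }
  markers
}

# Toy log-expression matrix: 4 genes x 3 labels (filled column by column).
logexpr <- matrix(
  c(5, 1, 0, 2,
    1, 6, 0, 2,
    0, 1, 7, 2),
  nrow = 4,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("T", "B", "NK"))
)
markers <- top_up_markers(logexpr)
markers[["B"]][["NK"]]   # "g2" "g1"
```

The result is a nested list indexed first by label and then by the label it is compared against.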