I looked at the convergence statuses and saw that most genes are in category 4:
> table(normalisedData[["convergence"]])
   0    1    2    3    4
3049  516 4615    1 8098
Category 4 is "baseline selection didn't converge". What does this mean for my analysis? Could you add a paragraph to the vignette explaining these values and what they imply about an analysis?
Thanks for bringing up this question. In DegNorm, the convergence tag records the outcome of baseline selection for each gene:
0 = DegNorm was not run on this gene because its counts were too low or its length was too short.
1 = DegNorm was run with baseline selection.
2 = DegNorm was run without baseline selection because the gene length (after filtering out low-count regions) was < 200 bp.
3 = A baseline was found, but the DI score is too large.
4 = Baseline selection didn't converge.
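To read the table in the question against these codes, it can help to attach the meanings to the counts. The following is a minimal sketch in R: the label wording is paraphrased from the definitions above, and the `convergence` vector is a stand-in rebuilt from the counts quoted in the question (in a real session it would be `normalisedData[["convergence"]]`). The names `convergence_labels` and `summary_df` are illustrative and not part of DegNorm.

```r
# Convergence codes, with labels paraphrased from the reply above.
convergence_labels <- c(
  "0" = "not normalised: counts too low or gene too short",
  "1" = "normalised with baseline selection",
  "2" = "normalised without baseline selection: filtered length < 200 bp",
  "3" = "baseline found, but DI score too large",
  "4" = "baseline selection did not converge"
)

# Stand-in for normalisedData[["convergence"]], rebuilt from the counts
# quoted in the question so that the sketch runs on its own.
convergence <- rep(0:4, times = c(3049, 516, 4615, 1, 8098))

# Tabulate with fixed levels so that codes with zero genes still appear.
counts <- table(factor(convergence, levels = names(convergence_labels)))

summary_df <- data.frame(
  code    = names(convergence_labels),
  meaning = unname(convergence_labels),
  genes   = as.integer(counts),
  percent = round(100 * as.integer(counts) / length(convergence), 1)
)
print(summary_df)
```

Fixing the factor levels up front keeps the summary the same shape from run to run, even when one of the codes does not occur in a dataset.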
Baseline selection looks for a region of the transcript over which the coverage curves have a similar shape across samples. It is a further refinement of the degradation normalization, on top of the single round of matrix factorization over-estimation. The idea is that if the samples share a similar, informative coverage pattern over some region, that region can serve as an anchor for the relative abundance of the different samples. The criterion for "similar shape" is that the degradation index on that region is less than 0.2 for every sample after matrix factorization. The 0.2 threshold is set subjectively; coverage curves of exactly the same shape would give a degradation index of 0. When you have many samples (69 in your case), such a region may not exist because of the larger variance between samples, and convergence = 4 is set. DegNorm still runs and outputs results without baseline selection. In this case, the results should still be regarded as valid, even though they may not be as refined as they would be with baseline selection.
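The "every sample below 0.2" criterion can be written down in a few lines of R. This is only an illustration of the rule as stated above, not DegNorm's internal code: `is_baseline_region` is a hypothetical helper, and the DI scores are made-up values chosen to show how adding samples makes the condition harder to satisfy.

```r
# Hypothetical helper, not part of the DegNorm package: a region qualifies
# as a baseline only if every sample's degradation index (DI) on that
# region is below the threshold.
is_baseline_region <- function(di_scores, threshold = 0.2) {
  all(di_scores < threshold)
}

# Made-up DI scores for one candidate region, purely for illustration.
di_few_samples  <- c(0.05, 0.12, 0.18)
di_many_samples <- c(di_few_samples, 0.09, 0.31, 0.14)

is_baseline_region(di_few_samples)   # TRUE: all three samples are below 0.2
is_baseline_region(di_many_samples)  # FALSE: one sample (0.31) exceeds 0.2
```

Because the test uses `all()`, a single sample above the threshold is enough to reject the region, which is why a large cohort is more likely to end up without a baseline.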
Details about baseline selection can be found here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1682-7