Hi,
I'm using EBSeq-HMM to find DE genes. The problem is that, among 17611 genes, 16139 are identified as DE, which is very unusual.
Before detecting DE genes, I filtered out all genes that did not have at least 5 reads in at least one sample.
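To make the filter concrete, here is a minimal sketch of that rule in plain Python. The gene names and counts are made up for illustration, and `keep_expressed` is a hypothetical helper, not part of EBSeq-HMM.

```python
def keep_expressed(counts, min_reads=5):
    """Keep genes that have at least `min_reads` reads in at least one sample.

    `counts` maps a gene name to its list of raw read counts, one per sample.
    """
    return {gene: row for gene, row in counts.items() if max(row) >= min_reads}


# Toy count table (hypothetical values, four samples per gene).
counts = {
    "geneA": [0, 2, 4, 1],     # never reaches 5 reads: discarded
    "geneB": [0, 0, 7, 3],     # reaches 5 reads in one sample: kept
    "geneC": [12, 15, 9, 11],  # well expressed everywhere: kept
}

filtered = keep_expressed(counts)
print(sorted(filtered))
```

Note how permissive this rule is: a single sample with 5 reads is enough to keep a gene, even if every other sample has zero.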
Do you think my result is ok? I could conduct further analysis to obtain the most significant DE genes, but since I want to compare this model with another model, I have to run it with its default settings.
What do you think accounts for this large number of DEGs? Has anyone worked with this package before? I read the paper behind this package. The authors themselves seem to report that their model finds more DEGs than other methods such as edgeR, DESeq2, voom, etc. But detecting 16139 DE genes out of 17611 is far too many.
Even if you haven't worked with this package but can sense what is probably going wrong, please let me know.
Could my filtering criterion for discarding lowly expressed genes be a reason? Should I filter genes more strictly before I use this package?
Thanks a lot
Not an EBSeq-HMM user here, but it would nonetheless be handy to know how many timepoints (or ordinal levels) you have, and how many replicates. A large number of timepoints allows a large number of non-flat paths, and it wasn't immediately clear to me how the package deals with this.
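To illustrate the point about non-flat paths, here is a rough sketch. It assumes, as I understand the EBSeq-HMM paper, that each transition between consecutive timepoints is classified as Up, Down or EE (equally expressed), so a gene's path is a sequence of T-1 such states and only the all-EE path counts as flat. The function names are hypothetical.

```python
from itertools import product

# Assumed per-transition states between consecutive timepoints.
STATES = ("Up", "Down", "EE")


def enumerate_paths(n_timepoints):
    """List every possible expression path across `n_timepoints` timepoints."""
    return list(product(STATES, repeat=n_timepoints - 1))


def count_non_flat(n_timepoints):
    """Count paths with at least one Up or Down transition."""
    paths = enumerate_paths(n_timepoints)
    return sum(1 for path in paths if any(state != "EE" for state in path))


for t in (3, 5, 8):
    total = len(enumerate_paths(t))
    print(f"{t} timepoints: {total} paths, {count_non_flat(t)} non-flat")
```

Under that assumption the number of paths is 3^(T-1), and all but one of them are non-flat, so the null hypothesis corresponds to a single path out of many as T grows.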
Maybe run a DESeq2 LRT on the ordinal factor, and visually inspect some of the genes that are DESeq2-null but not EBSeq-null.
No - I'd say that is unusual in my experience, but so is 10-12k. It makes me wonder if the variance is being (artificially) reduced - is it a real experiment, or are you simulating the data?