I am using DESeq2 to quantify the gene expression of a very limited set of non-coding genes.
It might be obvious, but can I still trust the significance of the differential expression that I obtain? Or could restricting the analysis to a limited set of genes cause genes that are not strongly differentially expressed to appear significantly differentially expressed?
How does analysing only a subset of the genes affect the final assessment of differential expression? If I ran the same analysis with these genes alongside a full annotation set, such as all the Ensembl genes, would I still find the same genes to be significantly differentially expressed? (I am setting aside the obvious change in the quantification of the raw reads itself, which would undoubtedly differ slightly because of gene overlap.)
Many thanks, Delphine
How many genes are in the set you are interested in? Did you do a full RNA-seq experiment, or did the assay only target the limited set? If the assay only targeted the limited set, do you have some control genes or spike-ins for normalization?
Thanks for your answer. There are about 3600 genes in that set. We did do a full RNA-seq experiment.
My data are not stranded, which means that if I combine a full annotation such as Ensembl with these 3600 genes of interest and quantify with htseq-count, many reads on my genes of interest are counted as ambiguous. That is why I quantified these genes separately from the full annotation. (I wanted to avoid a quantification method that would count a read twice when it is ambiguous.)
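To illustrate what I mean, here is a toy sketch in Python of the ambiguity problem. It is not htseq-count's actual implementation, only a simplified imitation of its rule that a read overlapping more than one gene is set aside as ambiguous. The gene names, coordinates, and reads are all hypothetical.

```python
from collections import Counter

# Hypothetical toy annotation: (gene_id, start, end, strand), half-open coordinates.
FULL_ANNOTATION = [
    ("ensembl_gene_A", 100, 500, "+"),
    ("noncoding_gene_X", 300, 700, "-"),  # overlaps gene A on the opposite strand
]
# Annotation restricted to the gene of interest only.
SUBSET_ANNOTATION = [g for g in FULL_ANNOTATION if g[0] == "noncoding_gene_X"]

# Hypothetical reads: (start, end, strand).
READS = [(320, 370, "-"), (350, 400, "-"), (600, 650, "-"), (120, 170, "+")]


def assign(read, annotation, stranded):
    """Assign one read to a gene, or to a special category.

    Simplified rule: collect every gene whose interval overlaps the read.
    With stranded=False the strand is ignored, so genes on both strands match.
    """
    start, end, strand = read
    hits = {
        gene
        for gene, g_start, g_end, g_strand in annotation
        if start < g_end and g_start < end
        and (not stranded or strand == g_strand)
    }
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"  # set aside rather than counted for both genes
    return next(iter(hits))


def count(reads, annotation, stranded):
    """Tally reads per gene or special category."""
    return Counter(assign(r, annotation, stranded) for r in reads)


full_unstranded = count(READS, FULL_ANNOTATION, stranded=False)
subset_unstranded = count(READS, SUBSET_ANNOTATION, stranded=False)
full_stranded = count(READS, FULL_ANNOTATION, stranded=True)

print("full annotation, unstranded:  ", dict(full_unstranded))
print("subset annotation, unstranded:", dict(subset_unstranded))
print("full annotation, stranded:    ", dict(full_stranded))
```

In this toy case, the gene of interest loses the reads in the overlapping region when the full annotation is used without strand information, and keeps them when it is counted on its own. With the subset annotation, some of those reads might really come from the overlapping gene, which is the quantification difference I mentioned above.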