I am currently using the DESeq2 package to analyze differential expression in 6 RNA-seq samples (each condition has 3 biological replicates = 3 independent inoculations). One sample is problematic: out of 44,480 genes, 780 are highly expressed in that sample but have 0 counts in the 5 other samples. In the sample-to-sample distance analysis, this sample clusters alone, and the expression levels of these genes are very high (almost 1,200 reads for one of them). But aside from this rather small number of genes, the sample is not consistently an outlier and behaves "normally", like the other replicates. (I also read this post: https://support.bioconductor.org/p/95755/ )
I was wondering whether DESeq2 can deal properly with these outlier genes, given the small number of replicates available. So I checked the adjusted p-values (padj) for these genes, and most of them are NA (which I understand means they were flagged as outliers, right?). However, 12 of them still have a p-value < 0.05, although after multiple-testing correction their padj is > 0.05.
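To illustrate that last observation, here is a minimal sketch of Benjamini-Hochberg (BH) adjustment in plain Python. BH is DESeq2's default correction method, but this is not DESeq2 code: the function name `bh_adjust` and the toy p-values are made up for illustration. It only shows how a raw p-value below 0.05 can end up with an adjusted value above 0.05.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy p-values, not taken from my data.
toy_pvalues = [0.001, 0.04, 0.03, 0.20, 0.50, 0.80]
toy_padj = bh_adjust(toy_pvalues)

for p, q in zip(toy_pvalues, toy_padj):
    print(f"p = {p:.3f}  padj = {q:.3f}")
```

In this toy set, the genes with p = 0.03 and p = 0.04 are both below 0.05 before correction but above it afterwards, which is the pattern I see for my 12 genes.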
So apparently it handles these outlier genes well (phew!), but do I lose power to identify real DEGs because of the noise these genes introduce? If so, is it possible, and is it correct, to simply remove this type of gene as a pre-filtering step?
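To make concrete what I mean by pre-filtering, here is a minimal sketch in plain Python (not DESeq2 code). The helper `single_sample_genes`, the threshold `min_samples=2`, and the toy count table are all hypothetical: the idea is simply to flag genes that have nonzero counts in fewer than two samples and drop them before the analysis.

```python
def single_sample_genes(counts, min_samples=2):
    """Return IDs of genes with nonzero counts in fewer than `min_samples` samples."""
    return [
        gene
        for gene, row in counts.items()
        if sum(1 for c in row if c > 0) < min_samples
    ]


# Hypothetical toy counts: 6 samples per gene, values invented for illustration.
toy_counts = {
    "geneA": [0, 0, 1200, 0, 0, 0],      # expressed in one sample only
    "geneB": [15, 22, 18, 40, 35, 51],   # expressed everywhere
    "geneC": [0, 0, 0, 0, 0, 0],         # never expressed
    "geneD": [0, 3, 0, 0, 5, 0],         # low, but seen in two samples
}

flagged = single_sample_genes(toy_counts)
kept = {gene: row for gene, row in toy_counts.items() if gene not in flagged}

print("flagged:", flagged)
print("kept:", list(kept))
```

On an actual DESeqDataSet `dds`, I assume the equivalent would be a row filter along the lines of `dds[rowSums(counts(dds) > 0) >= 2, ]`, but I am not sure whether filtering this way is statistically appropriate, hence my question.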
Thank you in advance for your help,