Preferential sequencing of longer genes in Illumina?
I was reading this paper from 2010:

It is mentioned that

One inherent bias of the illumina platform is the preferential sequencing of longer genes. Hence, longer genes are more likely declared as DE.

Is this true for current Illumina platforms as well, and could it explain the low counts we observe for some genes? For example, I am looking at the gene PYCR1. After running DESeq2 I get a strong log2 fold change (3.63) and adjusted p-value (1.95E-06), but the baseMean is only about 23. This gene is known to be upregulated in cancer, but the counts are not convincing. I am really confused about what to do here!

Yes, longer genes typically have higher counts. goseq takes this into account for gene set testing, and it can be run downstream of DESeq2 to get gene set results.
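
To make the length effect concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model: expected counts proportional to abundance times transcript length, with Poisson noise. This is not DESeq2's negative binomial model, and the gene names, lengths, abundances, and the `reads_per_unit` scaling are all made up for illustration.

```python
import math

def expected_count(abundance, length_bp, reads_per_unit=0.01):
    """Expected read count under the simplified model: fragments are sampled
    along the transcript, so the count scales with abundance times length.
    reads_per_unit is a hypothetical scaling constant."""
    return abundance * length_bp * reads_per_unit

def poisson_z(count_a, count_b):
    """Approximate z-statistic for the difference between two Poisson counts."""
    return (count_b - count_a) / math.sqrt(count_a + count_b)

# Two hypothetical genes with the same abundance and the same 2-fold change;
# only the transcript length differs.
for name, length in [("short_gene", 500), ("long_gene", 5000)]:
    control = expected_count(abundance=2.0, length_bp=length)
    treated = expected_count(abundance=4.0, length_bp=length)
    z = poisson_z(control, treated)
    print(f"{name}: control={control:.0f} treated={treated:.0f} z={z:.2f}")
```

Both genes have the same true fold change, but the longer one collects more reads and so gets a larger test statistic. That is why length matters for gene set testing, where categories enriched for long genes can look over-represented among DE genes.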

For per-gene results, you don’t modify the standard pipeline. DESeq2 has been extensively benchmarked.
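
To put the PYCR1 numbers from the question in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes two groups of equal size and treats the reported log2 fold change as the ratio of the two group means, which is only an approximation of DESeq2's model-based estimate. The function name `group_means` is hypothetical.

```python
def group_means(base_mean, log2_fold_change):
    """Split a baseMean into two group means, assuming equal group sizes
    and that the fold change is the ratio of the two group means."""
    fold = 2 ** log2_fold_change
    low = 2 * base_mean / (1 + fold)   # mean of the lower-expressed group
    high = low * fold                  # mean of the higher-expressed group
    return low, high

# Values taken from the question: baseMean about 23, log2 fold change 3.63
low, high = group_means(base_mean=23, log2_fold_change=3.63)
print(f"lower group ~{low:.1f}, higher group ~{high:.1f} normalized counts")
```

Under these assumptions, a baseMean of about 23 with this fold change corresponds to a few normalized counts in one group and a few dozen in the other, so a low overall mean is not by itself at odds with a strong fold change.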


