How many 'expressed genes' do I have in my dataset?
@ade-giorgio12-7809
Last seen 9.5 years ago
European Union

Hi,

I'm trying to assess the significance of the overlap between groups of genes.

To do this I need the number of genes 'expressed' in my samples. I've read that people use an FPKM value >1 as a rough cutoff for this, which gives me 10,000-12,000 genes depending on the sample.

Does anyone know an equivalent cutoff using the baseMean output from DESeq2? I'm trying to keep the number of 'expressed genes' consistent with the analysis it derives from.

Thanks very much!

Alex

deseq2 cuffdiff • 1.8k views
@mikelove
Last seen 17 hours ago
United States

Hi Alex,

DESeq2 does have an fpkm() function, which works automatically if you've used summarizeOverlaps to construct the counts, or if you add the gene length information yourself (see ?fpkm). It divides the normalized counts for each gene by the length of the union of the exonic basepairs that were used for counting. This matters because the normalized counts are proportional to gene length as well as to gene expression (and other factors), so you want to divide out the gene length to get closer to something like expression.
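To make the length effect concrete, here is a minimal plain-Python sketch of the usual FPKM arithmetic (count per kilobase of gene length, per million reads). It is not DESeq2's fpkm() implementation, which works from normalized counts; the function name, the toy counts, the gene lengths and the library size of 2 million reads are all made up for illustration.

```python
def fpkm(counts, lengths_bp, library_size):
    """Fragments per kilobase of gene length per million reads, for one sample.

    counts       -- per-gene read counts (hypothetical toy values below)
    lengths_bp   -- per-gene exonic length in basepairs
    library_size -- total reads in the sample (passed in, not derived from counts)
    """
    millions = library_size / 1e6
    return [c / (length / 1e3) / millions for c, length in zip(counts, lengths_bp)]


# Toy data: genes A and B have the same count but different lengths.
counts = [400, 400, 3, 0]
lengths_bp = [1000, 10000, 2000, 1500]

fpkm_values = fpkm(counts, lengths_bp, library_size=2_000_000)

# Count genes above the rough FPKM > 1 cutoff mentioned in the question.
n_expressed = sum(1 for v in fpkm_values if v > 1)

print(fpkm_values)
print(n_expressed)
```

Genes A and B have identical counts, yet the tenfold longer gene B ends up with a tenfold lower FPKM, which is why a cutoff on counts (or on baseMean) is not interchangeable with a cutoff on FPKM.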

You can also consider Bioconductor software like cqn or EDASeq, which additionally will correct for sample-specific gene length and GC-content curves. See the vignettes of those packages for details.

I don't have any recommendation on a generic cutoff though for expressed/not expressed.


I would add that no generic "expressed/not expressed" cutoff exists for all experiments, because compositional biases mean that the appropriate threshold would be different for every dataset (i.e., for the same reason that you need to estimate size factors in order to compare between samples).
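As a rough illustration of the compositional-bias point, the sketch below computes median-of-ratios size factors, the general approach DESeq2 uses, in plain Python. It is a simplified stand-in rather than DESeq2's own code, and the function name and toy counts are invented for the example.

```python
import math
from statistics import median


def size_factors(count_matrix):
    """Median-of-ratios size factors.

    count_matrix -- list of genes, each a list of counts, one per sample.
    Genes with a zero count in any sample are skipped, since their
    geometric mean across samples is zero.
    """
    n_samples = len(count_matrix[0])
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: both samples have 1000 reads in total, but in sample 2 a single
# gene takes up most of the reads, squeezing the counts of the other genes.
count_matrix = [
    [100, 50],
    [200, 100],
    [300, 150],
    [400, 700],
]

sf = size_factors(count_matrix)
print(sf)
```

Although the two library totals are identical, the size factors differ by a factor of two, so the same raw-count or per-million threshold would correspond to different expression levels in the two samples.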
