Question

Differential expression in samples where major gene is downregulated

dmr210 ▴ 30

Hi,

I have samples where one gene accounts for more than 40% of the total number of reads in normal conditions. In one phenotype that I consider, that gene is up-regulated.

How will that impact the differential expression of the other genes?

How does DESeq2 do the normalisation to avoid considering these other genes artificially down-regulated because of that?

Let's look at 'fake' numbers of RNA molecules:

Phenotype 1

G1	G2	G3	G4	G5	G6	G7	G8	G9	G10
1000	50	60	12	150	180	140	10	190	45

Total number of molecules: ~ 2000

Phenotype 2

G1	G2	G3	G4	G5	G6	G7	G8	G9	G10
1500	50	60	12	150	180	140	10	190	45

Total number of molecules: ~2500

The sequencing depth might be the same between the two, so normalising by sequencing depth is not going to help correct for that. Also, DESeq2 assumes a log normal distribution for the gene expression levels, but I was wondering if such a high read count for one single gene might make that assumption wrong?

I am unsure if this is simply equivalent to half of the genes being up-regulated in the sample, with no genes down-regulated, which DESeq2 is clearly equipped to tackle, or if it is different?

Could you explain how DESeq2 accounts for cases such as this one?

Thanks very much,

Delphine

EDIT: I have attached an MA plot, and swapped "up" and "down" in the text above, because my plot was the other way around compared to what I had written (the gene I am talking about is up-regulated in this plot, because of the condition considered as baseline).

MAplot

deseq2 • 726 views

ADD COMMENT • link updated 7.0 years ago by Michael Love 41k • written 7.0 years ago by dmr210 ▴ 30

Can you post an image (you can use imgur.com for hosting) of the MA plot if you use DESeq2? You can get a quick sense of how the normalization works. Or you can even plug in some simulated counts like you have above to see how it works.

ADD REPLY • link 7.0 years ago Michael Love 41k

score 0 · Answer 1 · 2017-05-01

hi,

The DESeq2 normalization (and, similarly, edgeR's normalization method) is not thrown off by a minority of differentially expressed genes, because it uses the median of ratios across all genes. Even though a single gene accounts for 40% of the reads, it has little leverage on the size factor calculation: it is just one gene out of thousands, and the median across genes is used.
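
To make this concrete, here is a minimal sketch of the median-of-ratios idea in plain Python, using the 'fake' molecule numbers from the question. It is an illustration only, not DESeq2's actual code. The function name `size_factors`, the variable names, and the common depth of 1,000,000 reads are assumptions made for the example, and the reads are idealised expected values with no sampling noise.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2's code).

    counts: list of samples, each a list of per-gene counts.
    """
    n_genes = len(counts[0])
    # Per-gene log geometric mean across samples; genes with a zero are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    # Size factor of a sample = median over genes of count / geometric mean.
    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Molecule numbers from the question (G1..G10).
molecules_p1 = [1000, 50, 60, 12, 150, 180, 140, 10, 190, 45]
molecules_p2 = [1500, 50, 60, 12, 150, 180, 140, 10, 190, 45]

# Assumption: both samples are sequenced to the same depth, so the observed
# reads are proportional to each gene's share of the molecules.
depth = 1_000_000
reads_p1 = [m * depth / sum(molecules_p1) for m in molecules_p1]
reads_p2 = [m * depth / sum(molecules_p2) for m in molecules_p2]

sf = size_factors([reads_p1, reads_p2])
norm_p1 = [r / sf[0] for r in reads_p1]
norm_p2 = [r / sf[1] for r in reads_p2]

for name, a, b in zip(["G1", "G2", "G3"], norm_p1, norm_p2):
    print(name, round(b / a, 3))  # fold change, Phenotype 2 over Phenotype 1
```

With equal depth, the raw reads for G2 to G10 are lower in Phenotype 2 simply because G1 takes a larger share of the library. Nine of the ten per-gene ratios agree, so the median ignores G1, and after dividing by the size factors G2 to G10 come out unchanged while G1 keeps its 1.5-fold increase.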