Hello all,
I am writing to ask how to set up DESeq2 when my samples have large variation in gene counts. For example, below is one row from my gene count table.
Samples | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | aa | bb | cc | dd | ee | ff | gg | hh | ii | jj | kk | ll |
Gene240880 | 0 | 0 | 0 | 0 | 0 | 0 | 347 | 248 | 6 | 21 | 0 | 0 | 0 | 0 | 605 | 665 | 438 | 760 | 597 | 511 | 448 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 44 | 17 | 5 | 215 | 0 | 0 | 0 |
As you can see, some samples have hundreds of reads for Gene240880 while others have zero. When I feed the whole table (~25k genes) to DESeq2, using pretty much the default settings recommended in the DESeq2 tutorial, and look at the comparison between condition 1 (triplicate z, aa, bb) and condition 2 (triplicate jj, kk, ll), for some reason I get a very significant p-value for this gene, even though the counts in both groups are all zeros, and a log2FoldChange is still reported.
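For context, the setup is essentially the standard workflow from the vignette; `counts`, `coldata`, and the level names `cond1`/`cond2` below are placeholders for my actual objects:

```r
library(DESeq2)

# counts: raw count matrix (genes x samples); coldata: data frame with a
# 'condition' column -- placeholder names for my actual objects
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)

# comparison between condition 1 and condition 2
res <- results(dds, contrast = c("condition", "cond1", "cond2"))
```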
I figured it might have something to do with this gene's variable behavior, so for now I split the table to keep only the 6 samples from conditions 1 and 2, and the comparison no longer shows the problem. (We have been using DESeq2 for quite a while, and this is the first time we have needed to split tables.)
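Roughly, the subsetting I did looks like this, assuming the same placeholder names `counts`, `coldata`, and condition levels `cond1`/`cond2` as above:

```r
# keep only the six samples from the two conditions being compared
keep <- colnames(counts) %in% c("z", "aa", "bb", "jj", "kk", "ll")

dds_sub <- DESeqDataSetFromMatrix(countData = counts[, keep],
                                  colData   = coldata[keep, , drop = FALSE],
                                  design    = ~ condition)
dds_sub$condition <- droplevels(dds_sub$condition)  # drop unused factor levels
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("condition", "cond1", "cond2"))
```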
Therefore I would love to learn the reason for this problem - is it due to the normalization DESeq2 does? Also, because of this incident, I am a little unsure about when and how exactly I should consider splitting samples apart when using DESeq2. Any advice is appreciated!
Hi Dr. Love - thanks so much for the prompt reply. I will update DESeq2 and try again. Among all the methods you mentioned, would you recommend the subset-to-two-groups approach? I personally prefer to keep the complete table together, but I guess in theory it should not matter.
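For the update itself, I am assuming the usual BiocManager route:

```r
# update DESeq2 (and its dependencies) to the current Bioconductor release
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")
```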
If you update, you don't need to do anything else; it should be solved.