Question

Uneven group sizes

le2336 ▴ 20

Hello,

To simplify the description of my problem, I will use the same variables as in Example 3 of ?results in DESeq2. I have the following samples from 2 genotypes (I and II), each treated under 2 conditions (A and B):

sample	genotype	condition
1	I	A
2	I	A
3	I	A
4	I	A
5	I	A
6	I	A
7	I	A
8	I	A
9	I	B
10	I	B
11	I	B
12	I	B
13	II	A
14	II	A
15	II	A
16	II	A
17	II	B
18	II	B
19	II	B

If we define "upregulated genes" as those that have higher expression in condition B compared to condition A, I would like to test the null hypothesis that genotype II does not have more upregulated genes than genotype I.

I wrote my design formula as:

design(dds) <- ~ genotype + condition + genotype:condition

I obtained the interaction term for the condition effect in genotype II vs genotype I:

results(dds, name="genotypeII.conditionB", altHypothesis="greater")
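
As a quick check of the group sizes and of which coefficient the formula above produces, the sample table can be rebuilt in base R and the design matrix columns inspected. This is only a sketch: the data frame name `coldata` is an assumption, not something from the post. With only genotypes I and II, the interaction column is `genotypeII:conditionB`, which DESeq2 lists in `resultsNames(dds)` with a `.` in place of the `:`.

```r
# Rebuild the sample table from the question (19 samples).
# `coldata` is a hypothetical name used only for this illustration.
coldata <- data.frame(
  genotype  = factor(c(rep("I", 12), rep("II", 7)), levels = c("I", "II")),
  condition = factor(c(rep("A", 8), rep("B", 4), rep("A", 4), rep("B", 3)),
                     levels = c("A", "B"))
)

# Group sizes: uneven, but every genotype/condition cell has replicates.
table(coldata$genotype, coldata$condition)

# Columns of the design matrix for the formula used above;
# the last column is the genotype-by-condition interaction term.
mm <- model.matrix(~ genotype + condition + genotype:condition, data = coldata)
colnames(mm)
```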

22% of the genes were more significantly upregulated in genotype II than in genotype I. I am wondering whether this particular analysis is sensitive to having uneven replicate sizes between the two genotypes -- while the biological replicates are all well correlated, I have more samples overall from genotype I than genotype II, particularly the condition A samples (8 genotype I vs 4 genotype II).

I reran this several times by subsampling replicates from genotype I so that the group sizes for genotypes I and II were equal (4 genotype I condition A, 3 genotype I condition B, 4 genotype II condition A, 3 genotype II condition B). When I do this with different subsets of genotype I samples, 72-80% of the genes are more significantly upregulated in genotype II than in genotype I. These additional genes do seem to be bona fide upregulated genes by other independent measures (e.g. levels of their protein products). With which set of upregulated genes should I proceed?

Thank you in advance for your help.

deseq2 replicates • 1.8k views

Updated 7.9 years ago by Michael Love • written 7.9 years ago by le2336

score 2 · Accepted Answer · 2017-01-27

"22% of the genes were more significantly upregulated in genotype II than in genotype I. I am wondering whether this particular analysis is sensitive to having uneven replicate sizes between the two genotypes"

No, the uneven group size shouldn't be a problem.

My first suggestion for users who want to dig into more detail is to look at plotCounts for the significant genes: first a couple of the most significant, and then ones further down the list that are only marginally significant. If the marginally significant genes do not show a very big effect size, or the effect size is not of biological significance for you, you can use the lfcThreshold argument of results() to require that the increase in upregulation in genotype II compared to genotype I is more than some specified threshold (see our DESeq2 paper for more on the motivation for an LFC threshold greater than 0).
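
A minimal sketch of both suggestions follows, assuming the design from the question. The counts are simulated with `makeExampleDESeqDataSet()` so that the code runs on its own; they stand in for the real data and carry no biological meaning. The threshold of 1 (on the log2 scale) and the choice of the 50th-ranked gene are arbitrary illustrations, not recommendations.

```r
library(DESeq2)

# Simulated counts only, so the sketch runs without the real data.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 19)

# Same group sizes as in the question: 8/4 for genotype I, 4/3 for genotype II.
dds$genotype  <- factor(c(rep("I", 12), rep("II", 7)))
dds$condition <- factor(c(rep("A", 8), rep("B", 4), rep("A", 4), rep("B", 3)))
design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds)

# Interaction test with a threshold: is the B-vs-A log2 fold change in
# genotype II larger than in genotype I by more than 1 (arbitrary choice)?
res <- results(dds, name = "genotypeII.conditionB",
               lfcThreshold = 1, altHypothesis = "greater")

# Inspect the top-ranked gene, then one further down the list.
ord <- order(res$pvalue)
plotCounts(dds, gene = ord[1],  intgroup = c("genotype", "condition"))
plotCounts(dds, gene = ord[50], intgroup = c("genotype", "condition"))
```

On real data, the same `plotCounts()` calls make it easy to see whether a marginally significant gene shows a difference between the genotypes that is large enough to matter.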

I would not recommend removing samples from the larger group. This will just hurt the analysis: there will be fewer samples with which to accurately estimate the dispersion and effect sizes, and the uneven group size is not an issue here.