Question

DESeq2, edgeR: different number of replicates and extraction kit

0

tonja.r ▴ 80

@tonjar-7565

Last seen 7.5 years ago

United Kingdom

1. Would it introduce a bias if I have 2 replicates for one condition and 4 replicates for the other? Or would it be better to use only 2 replicates per condition?

Dataset 1:
cond A, 2 replicates
cond B, 4 replicates

2. Also, in another dataset I have, for condition A, 2 replicates extracted with Long PolyA+ RNA (single end, 50 bp), 2 replicates extracted with PolyA+ RNA (single end, 36 bp), and 2 replicates extracted with Long PolyA+ RNA (paired end, 76 bp); the same holds for condition B. Would it be possible to combine the replicates in the design matrix so that I have 6 replicates for condition A and 6 for condition B? Or is it better to analyze the replicates from different extraction methods separately?

Dataset 2:
cond A, Long PolyA+ RNA, single end, 50 bp, 2 rep
cond A, PolyA+ RNA, single end, 36 bp, 2 rep
cond A, Long PolyA+ RNA, paired end, 76 bp, 2 rep

cond B, Long PolyA+ RNA, single end, 50 bp, 2 rep
cond B, PolyA+ RNA, single end, 36 bp, 2 rep
cond B, Long PolyA+ RNA, paired end, 76 bp, 2 rep

deseq2 edger • 1.7k views

ADD COMMENT • link 8.3 years ago tonja.r ▴ 80

0

Can you make a little table that explains the preparation and condition status of the samples? I'm not sure I follow exactly:

cond A, long polyA, paired
cond A, long polyA, single
...

ADD REPLY • link 8.3 years ago Michael Love 41k

0

I have edited my question.

ADD REPLY • link 8.3 years ago tonja.r ▴ 80

score 3 · Answer 1 · 2016-01-19

3

Aaron Lun ★ 28k

@alun

Last seen 7 hours ago

The city by the bay

For your first question, no. All popular methods for DE analyses can handle unbalanced designs.
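To see why discarding replicates to balance the design is not helpful, here is a rough sketch. It assumes a plain difference in group means with equal per-sample variance, which is a simplification of the negative binomial models that DESeq2 and edgeR actually fit; the function name is made up for illustration.

```python
def relative_variance(n_a, n_b):
    """Variance of (mean_A - mean_B) in units of the per-sample variance.

    Assumption: independent samples with equal variance in both groups.
    """
    return 1 / n_a + 1 / n_b

unbalanced = relative_variance(2, 4)  # keep all 4 replicates of condition B
balanced = relative_variance(2, 2)    # discard 2 replicates of condition B

print(f"2 vs 4 replicates: {unbalanced:.2f}")
print(f"2 vs 2 replicates: {balanced:.2f}")
```

Under these assumptions the unbalanced design gives the smaller variance, so dropping the extra replicates only loses information.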

For your second question - is it possible? Yes. Is it advisable? That's harder to answer. If the only aspect that differed between samples was the read length, then combining the samples might be okay, provided all the reads were long enough to be mapped accurately and counted into genes. However, differences in read length are often symptomatic of differences elsewhere, e.g., in sequencing batch, in library preparation, or in the processing center. You'd probably get a lot of protocol-specific biases that would inflate the variability if you treated them as direct replicates. If you're going to analyze them together, I would at least block on the extraction/preparation/sequencing protocol to mitigate these effects.
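As a minimal sketch of what blocking on the protocol means, the following builds an additive design matrix for Dataset 2 by hand. The protocol labels and the helper `design_row` are hypothetical; in DESeq2 the same model would be written as a design formula such as `~ protocol + condition`, and in edgeR as `model.matrix(~ protocol + condition)`, assuming the sample table has columns with those names.

```python
# Hypothetical labels for the three extraction/sequencing protocols.
protocols = ["longPolyA_SE50", "polyA_SE36", "longPolyA_PE76"]

# Two replicates per (condition, protocol) combination: 12 samples in total.
samples = [
    (cond, prot)
    for cond in ("A", "B")
    for prot in protocols
    for _ in range(2)
]

def design_row(cond, prot):
    """Additive design: intercept + protocol dummies + condition effect.

    Reference levels are the first protocol and condition A.
    """
    block = [1 if prot == p else 0 for p in protocols[1:]]
    return [1] + block + [1 if cond == "B" else 0]

columns = ["intercept"] + protocols[1:] + ["condB"]
design = [design_row(cond, prot) for cond, prot in samples]

print(columns)
for (cond, prot), row in zip(samples, design):
    print(cond, prot, row)
```

The protocol columns absorb protocol-specific shifts in expression, so the `condB` coefficient is estimated from within-protocol differences between the conditions.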

ADD COMMENT • link 8.3 years ago Aaron Lun ★ 28k

0

Which would be better: analyzing together the replicates that at least come from the same extraction kit while blocking on the sequencing protocol, or analyzing everything separately and looking at the intersection of the genes with FDR < 0.05?

ADD REPLY • link 8.3 years ago tonja.r ▴ 80

0

Hmm... the second one might be safer. Batch effects may interfere with quantitative comparisons of log-fold changes between blocks, such that it would be difficult to exploit any replication. Hopefully, the qualitative conclusions (i.e., which genes are DE) should still be the same for each block; you can then pull them out in the intersection between blocks.
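The intersection step could look something like the sketch below. The gene names and adjusted p-values are made up, and `results` stands in for whatever per-protocol result tables the separate analyses produce; a missing adjusted p-value (for example, a gene that was filtered out) is treated as not significant.

```python
FDR_CUTOFF = 0.05

# Hypothetical per-protocol results: gene -> adjusted p-value (None if missing).
results = {
    "longPolyA_SE50": {"geneA": 0.001, "geneB": 0.20, "geneC": 0.03},
    "polyA_SE36": {"geneA": 0.01, "geneB": 0.04, "geneC": 0.049},
    "longPolyA_PE76": {"geneA": 0.0005, "geneB": 0.01, "geneC": None},
}

def significant(res, cutoff=FDR_CUTOFF):
    """Return the set of genes whose adjusted p-value is below the cutoff."""
    return {gene for gene, padj in res.items() if padj is not None and padj < cutoff}

de_sets = [significant(res) for res in results.values()]

# Genes called DE in every separate analysis.
shared = set.intersection(*de_sets)
print(sorted(shared))
```

Requiring significance in every block is conservative: a gene is dropped as soon as it misses the cutoff in any one analysis.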

ADD REPLY • link 8.3 years ago Aaron Lun ★ 28k