DESeq2, edgeR: different numbers of replicates and different extraction kits
tonja.r ▴ 80
@tonjar-7565
Last seen 7.5 years ago
United Kingdom

1. Would it introduce a bias if I have 2 replicates for one condition and 4 replicates for the other? Or would it be better to use only 2 replicates per condition?

Dataset 1:
cond A,  2 replicates
cond B, 4 replicates
 


2. Also, in another dataset, for condition A I have 2 replicates extracted with Long PolyA+ RNA (single end, 50 bp), 2 replicates extracted with PolyA+ RNA (single end, 36 bp) and 2 replicates extracted with Long PolyA+ RNA (paired end, 76 bp); the same holds for condition B. Would it be possible to combine the replicates in the design matrix so that I have 6 replicates for condition A and 6 for condition B? Or is it better to analyze the replicates from different extraction methods separately?



Dataset 2:
cond A, Long PolyA+ RNA, single end, 50 bp, 2 rep
cond A, PolyA+ RNA,      single end, 36 bp, 2 rep
cond A, Long PolyA+ RNA, paired end, 76 bp, 2 rep

cond B, Long PolyA+ RNA, single end, 50 bp, 2 rep
cond B, PolyA+ RNA,      single end, 36 bp, 2 rep
cond B, Long PolyA+ RNA, paired end, 76 bp, 2 rep

 

deseq2 edger • 1.7k views

Can you make a little table that lists the preparation and condition for each sample? I'm not sure I follow exactly:

cond A, long polyA, paired
cond A, long polyA, single
...


I have edited my question.

Aaron Lun ★ 28k
@alun
Last seen 7 hours ago
The city by the bay

For your first question, no. All popular methods for DE analyses can handle unbalanced designs.
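To make the intuition concrete, here is a minimal sketch in plain Python. It is not what DESeq2 or edgeR do internally (they fit negative binomial GLMs to counts); it only uses the textbook result that the standard error of a difference between two group means scales with sqrt(1/n_A + 1/n_B). The log-expression values are made up for illustration.

```python
import math
from statistics import mean

# Hypothetical log-expression values for a single gene (invented numbers).
cond_a = [5.1, 4.9]            # 2 replicates
cond_b = [6.0, 6.2, 5.9, 6.1]  # 4 replicates

def se_factor(n_a, n_b):
    """Factor that multiplies the residual SD in the standard error
    of the difference between two group means."""
    return math.sqrt(1.0 / n_a + 1.0 / n_b)

# The estimated difference is simply the difference of the group means,
# regardless of how many replicates each group has.
log_fc = mean(cond_b) - mean(cond_a)

unbalanced = se_factor(2, 4)       # keep all 6 samples
balanced_subset = se_factor(2, 2)  # throw away 2 samples from cond B

print(f"logFC estimate: {log_fc:.2f}")
print(f"SE factor, 2 vs 4: {unbalanced:.3f}")
print(f"SE factor, 2 vs 2: {balanced_subset:.3f}")
```

Under these assumptions, discarding the extra replicates only makes the estimate less precise, so there is no reason to subsample down to a balanced design.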

For your second question - is it possible? Yes. Is it advisable? That's harder to answer. If the only aspect that differed between samples was the read length, then combining the samples might be okay, provided all the reads were long enough to be mapped accurately and counted into genes. However, differences in read length are often symptomatic of differences elsewhere, e.g., in sequencing batch, in library preparation, or in the processing center. You'd probably get a lot of protocol-specific biases that would inflate the variability if you treated them as direct replicates. If you're going to analyze them together, I would at least block on the extraction/preparation/sequencing protocol to mitigate these effects.
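As a rough illustration of what "blocking" means for Dataset 2, the sketch below builds an additive design matrix in plain Python, with one indicator column per non-reference protocol plus a condition column. The protocol labels and column names are hypothetical, chosen only to mirror the table in the question; in R the equivalent would typically be an additive formula along the lines of `~ protocol + condition`.

```python
from itertools import product

# Hypothetical labels mirroring Dataset 2 in the question.
conditions = ["A", "B"]
protocols = ["longPolyA_SE50", "polyA_SE36", "longPolyA_PE76"]
reps = 2

# One entry per sample: 2 conditions x 3 protocols x 2 replicates = 12 samples.
samples = [
    {"condition": c, "protocol": p, "replicate": r}
    for c, p, r in product(conditions, protocols, range(1, reps + 1))
]

columns = [
    "intercept",
    "protocol_polyA_SE36",
    "protocol_longPolyA_PE76",
    "condition_B",
]

def design_row(sample):
    """Additive design: the first protocol and condition A are the reference levels."""
    return [
        1,
        int(sample["protocol"] == protocols[1]),
        int(sample["protocol"] == protocols[2]),
        int(sample["condition"] == "B"),
    ]

design = [design_row(s) for s in samples]

print(columns)
for s, row in zip(samples, design):
    print(s["condition"], s["protocol"], s["replicate"], row)
```

The protocol columns absorb the average shift between protocols, so the `condition_B` coefficient is the B versus A effect after accounting for protocol. This additive form assumes the condition effect is the same in every protocol, which may not hold if the batch effects are strong.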


Which would be better: analyzing together the replicates that at least share the same extraction kit and blocking on the sequencing protocol, or analyzing each group of replicates separately and taking the intersection of the genes with FDR < 0.05?


Hmm... the second one might be safer. Batch effects may interfere with quantitative comparisons of log-fold changes between blocks, such that it would be difficult to exploit any replication. Hopefully, the qualitative conclusions (i.e., which genes are DE) should still be the same for each block; you can then pull them out in the intersection between blocks.
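A minimal sketch of the intersection step, assuming each block has already been analyzed separately and yields a log-fold change and an FDR per gene. The gene names and numbers are invented, and the extra check that the log-fold changes have the same sign in every block is an optional addition on my part, not something required by the approach above.

```python
# Hypothetical per-block results: gene -> (log fold change, FDR). Invented numbers.
blocks = {
    "longPolyA_SE50": {"geneA": (2.1, 0.001), "geneB": (-1.3, 0.03), "geneC": (0.2, 0.60)},
    "polyA_SE36": {"geneA": (1.8, 0.004), "geneB": (-0.9, 0.20), "geneC": (0.1, 0.75)},
    "longPolyA_PE76": {"geneA": (2.4, 0.0005), "geneB": (-1.1, 0.01), "geneC": (1.5, 0.02)},
}

def significant(results, fdr_cutoff=0.05):
    """Genes called DE within a single block at the given FDR cutoff."""
    return {gene for gene, (_, fdr) in results.items() if fdr < fdr_cutoff}

# Genes that are significant in every block.
shared = set.intersection(*(significant(res) for res in blocks.values()))

def same_direction(gene):
    """True if the log-fold change has the same sign in all blocks (optional check)."""
    signs = {blocks[name][gene][0] > 0 for name in blocks}
    return len(signs) == 1

consistent = sorted(gene for gene in shared if same_direction(gene))
print(consistent)
```

Keep in mind that requiring significance in every block is conservative: a gene that narrowly misses the cutoff in one block is dropped, so the intersection will tend to miss some true DE genes.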
