Question

Technical and biological replicates in BitSeq


Sam McInturf ▴ 300

@sam-mcinturf-5291

Last seen 8.6 years ago

United States

Bioconductors,

tl;dr: Does BitSeq differentiate between biological and technical variation? If it does, how do I tell BitSeq which files belong to which biological replicate?

I am looking to use BitSeq to analyze my single-end Illumina data. I have a combinatorial design (3x2x2) where each condition has 3 biological replicates, and each biological replicate was split across 3 lanes to give 3 technical replicates. So I can run BitSeq::getExpression on every technical replicate to estimate the expression of each transcript and produce an RPKM file.

I'll write each file name to describe the sample as c<condition>b<bioRep>t<techrep> (c1b1t1, c1b1t2, c1b1t3, c1b2t1, ..., c2b3t3).

If I had no technical replicates I would simply say

getDE(list("A" = c(c1b1, c1b2, c1b3), "B" = c(c2b1, c2b2, c2b3)))

With my technical replicates included, the call becomes

getDE(list("A" = c(c1b1t1, c1b1t2, c1b1t3, c1b2t1, ..., c1b3t1, ...), "B" = c(c2b1t1, c2b1t2, c2b1t3, c2b2t1, ..., c2b3t1, ...)))

However, this does not tell BitSeq how the variance is structured across samples (technical versus biological variation). I have read the Bioinformatics paper (vol. 28, no. 13, 2012, pages 1721-1728) with some level of understanding, but I am by no means fluent in Bayesian concepts. I didn't see an explicit term for biological and technical variance (although I am used to dealing with technical replicates by comparing a full model against a reduced model, DESeq2 style). In section 3.4, DE analysis, second paragraph, the authors talk about combining the posterior probabilities, but I believe that refers directly to how Figure 5b and 5d were made, not to how to feed in the data prior to DE calls.

Thanks for any wisdom!

Sam

BitSeq bitseq rnaseq • 1.3k views

updated 8.6 years ago by Ryan C. Thompson ★ 7.9k • written 8.6 years ago by Sam McInturf ▴ 300


Hi Sam,

Currently, BitSeq does not support technical replicates. I would proceed exactly as in your second getDE command, and then check the consistency of the DE results by comparing them with those obtained after combining all technical replicates into a single sample (e.g. c1b1 = c1b1t1 + c1b1t2 + c1b1t3, c1b2 = c1b2t1 + c1b2t2 + c1b2t3, etc.), as Ryan suggested.
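One way to set up the two runs being compared could be sketched as below. This is only an illustration: rep_files is a hypothetical helper, the ".rpkm" extension and the c<condition>b<bioRep>t<techRep> file names are assumed from the question, and the pooled files are assumed to have been produced upstream by combining the reads of each biological replicate and re-running getExpression. The getDE calls are left commented out because they need BitSeq and the real files.

```r
# Hypothetical helper: build file names following the naming scheme in the question.
# tech = NULL gives one (pooled) file per biological replicate.
rep_files <- function(cond, bio = 1:3, tech = NULL, ext = ".rpkm") {
  if (is.null(tech)) {
    return(paste0("c", cond, "b", bio, ext))
  }
  # each/times keep the technical replicates of one biological replicate adjacent
  paste0("c", cond,
         "b", rep(bio, each = length(tech)),
         "t", rep(tech, times = length(bio)),
         ext)
}

# Run 1: every technical replicate treated as its own sample
separate <- list(A = rep_files(1, tech = 1:3), B = rep_files(2, tech = 1:3))

# Run 2: technical replicates pooled into one sample per biological replicate
pooled <- list(A = rep_files(1), B = rep_files(2))

# Assumed usage, requires BitSeq and the files on disk:
# de_separate <- BitSeq::getDE(separate)
# de_pooled   <- BitSeq::getDE(pooled)
```

Transcripts that are called DE in one run but not in the other are the ones worth a closer look.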

8.6 years ago panagiotis.papastamoulis ▴ 30

score 2 · Accepted Answer · 2015-09-24

General practice is to combine technical replicates into a single sample. This is justified because, in the absence of alternative isoforms, technical variation is known to follow a Poisson distribution, and the sum of independent Poisson random variables is itself Poisson, so no information is lost by combining. This is complicated a bit by splicing, but in any case, from my reading of the BitSeq paper, it uses MCMC to estimate the distribution (i.e. technical variation) of each transcript in each sample, so I think there's no need to keep the technical replicates separate. (I'm not a BitSeq user, though; this is just based on a quick reading of the methods section of the paper.)
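The Poisson argument can be written out briefly. The notation is an assumption made here for illustration: x_t is the read count for one transcript in technical replicate t, d_t is a known depth factor for that lane, and \mu is the expression level shared by all technical replicates of one biological sample.

```latex
X_t \sim \mathrm{Poisson}(d_t \mu), \quad t = 1, \dots, T \ \text{independent}
\;\Longrightarrow\;
\sum_{t=1}^{T} X_t \sim \mathrm{Poisson}\!\Big(\mu \sum_{t=1}^{T} d_t\Big)

\prod_{t=1}^{T} \frac{(d_t \mu)^{x_t} e^{-d_t \mu}}{x_t!}
= \mu^{\sum_t x_t} \, e^{-\mu \sum_t d_t} \, \prod_{t=1}^{T} \frac{d_t^{x_t}}{x_t!}
```

The likelihood depends on \mu only through the pooled count \sum_t x_t, which is the sense in which no information about \mu is lost by combining, as long as the Poisson assumption holds.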

In any case, if you are worried about excess technical variation, I would recommend quantifying the replicates separately and making a PCA plot or using other exploratory data analysis techniques. If the technical replicates all cluster closely together, you are probably justified in combining them.
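A minimal sketch of such a check in base R follows. The expression matrix is simulated purely as a stand-in (its dimensions and noise levels are made up for illustration); in practice the columns would be log expression estimates from the separately quantified technical replicates.

```r
# Simulated stand-in for a transcripts x samples matrix of log expression (not real data)
set.seed(1)
bio  <- rep(paste0("c1b", 1:3), each = 3)       # biological replicate of each column
cols <- paste0(bio, "t", rep(1:3, times = 3))   # c1b1t1, c1b1t2, ...
bio_level <- matrix(rnorm(200 * 3, mean = 5), nrow = 200)  # one profile per bio rep
expr <- bio_level[, rep(1:3, each = 3)] +
  matrix(rnorm(200 * 9, sd = 0.1), nrow = 200)  # small technical noise on top
colnames(expr) <- cols

# PCA with samples as rows
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Colour points by biological replicate; technical replicates should overlap
plot(scores, col = as.integer(factor(bio)), pch = 19, xlab = "PC1", ylab = "PC2")
text(scores, labels = cols, pos = 3, cex = 0.7)

# Numeric companion to the plot: mean distance within vs between biological replicates
d <- as.matrix(dist(scores))
same_bio <- outer(bio, bio, "==") & upper.tri(d)
diff_bio <- outer(bio, bio, "!=") & upper.tri(d)
within_dist  <- mean(d[same_bio])
between_dist <- mean(d[diff_bio])
```

If within_dist is small relative to between_dist and the points for each biological replicate sit on top of each other in the plot, pooling the technical replicates looks reasonable.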