**20**wrote:

I am analyzing bulk RNA data from neurons derived from iPSC cells. The experimental design matrix is as follows:

Lineage | Method
---|---
A | 1
A | 2
A | 3
B | 1
B | 2
B | 3
C | 1
C | 2
C | 3

Here `lineage` indicates which iPSC line the neuronal sample was derived from, and `method` the method used to derive those neurons. All iPSC lines were derived from the same starting material, so they can be considered biological replicates of each other.

The main question I am trying to answer is which genes are significantly up- or down-regulated for a given `method` compared to the other two methods. I am also interested in determining how 'bad' a batch effect the source iPSC `lineage` is (i.e. how much of the variation is explained by `lineage` vs. by `method`; we hope to find it is mostly explained by `method`).

The way I have been handling this with `DESeq2` is by using the design `~ lineage + method`. One of my colleagues claimed, however, that this was an inappropriate use of `DESeq2` since I do not have the right replicate structure. He claimed that I needed biological replicates that were identical from a design perspective for `DESeq2` to be an appropriate choice (e.g. multiple samples generated from `lineage` `A` using `method` `1`, etc.). This would imply a design matrix like this (here I am adding a sample column for clarity):

Lineage | Method | Sample
---|---|---
A | 1 | S1
A | 1 | S2
A | 2 | S3
A | 2 | S4
.... | .... | ....
C | 3 | S18

His rationale was that `DESeq2` is not able to leverage the fact that `A1`, `B1`, and `C1` are "expected" to be the "same"/similar, and that the modeling assumes additive variance between `lineage` and `method`, whereas batch-correction methods (such as `RUV` or `ComBat`) would not suffer from these problems, making them a more appropriate choice.
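To make the replicate argument concrete, here is a minimal plain-Python sketch (not `DESeq2` itself) that builds treatment-coded design matrices for the two layouts above and counts the residual degrees of freedom left after fitting, i.e. number of samples minus the rank of the design matrix. The helper names `design_matrix`, `rank`, and `residual_df` are hypothetical, and this is only a generic linear-model count under the assumption of a full-rank dummy coding; it says nothing about `DESeq2`'s internals.

```python
from fractions import Fraction
from itertools import product


def design_matrix(samples, interaction=False):
    """Treatment-coded design matrix; `samples` is a list of (lineage, method)."""
    lineages = sorted({lin for lin, _ in samples})
    methods = sorted({met for _, met in samples})
    rows = []
    for lin, met in samples:
        row = [1]  # intercept
        # Dummy columns; the first level of each factor is the reference.
        row += [int(lin == l) for l in lineages[1:]]
        row += [int(met == m) for m in methods[1:]]
        if interaction:
            # lineage:method columns, as in `~ lineage * method`
            row += [int(lin == l and met == m)
                    for l in lineages[1:] for m in methods[1:]]
        rows.append(row)
    return rows


def rank(matrix):
    """Matrix rank via exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def residual_df(samples, interaction=False):
    """Samples minus estimable coefficients."""
    return len(samples) - rank(design_matrix(samples, interaction))


unreplicated = list(product("ABC", [1, 2, 3]))  # first table: 9 samples
replicated = unreplicated * 2                   # second table: 18 samples

print(residual_df(unreplicated))                    # 4  (9 samples - 5 coefficients)
print(residual_df(unreplicated, interaction=True))  # 0  (9 samples - 9 coefficients)
print(residual_df(replicated))                      # 13
print(residual_df(replicated, interaction=True))    # 9
```

Under these assumptions, the additive design `~ lineage + method` on the 9-sample layout still leaves 4 residual degrees of freedom, but only because it assumes no `lineage`-by-`method` interaction; a model with the interaction would leave none unless each cell is replicated as in the second table.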

My questions are:

1. Is it correct that, given a design like the first matrix, `DESeq2` is not an appropriate choice when attempting to control for `lineage` while comparing between `method` levels, and that other batch-correction-specific methods should be used?
2. If `DESeq2` **is** an appropriate choice for this scenario, is the design `~ lineage + method` the most appropriate? Also, what is the best way to compare the strength of the overall effect of `lineage` vs. the overall effect of `method`? I'm guessing this might involve extracting and comparing the model coefficients, or perhaps a likelihood ratio test.
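One simple way to frame "how much of the variation is explained by `lineage` vs. `method`" for a single gene is the classical sum-of-squares split for a balanced two-way layout with one sample per cell. The sketch below is plain Python, not a `DESeq2` feature: the function name `variance_shares` is hypothetical, the numbers in `toy` are made up for illustration, and it assumes the values are already on a log-like or variance-stabilized scale.

```python
def variance_shares(table):
    """Split total variation for one gene into lineage, method and residual shares.

    `table[i][j]` is the (transformed) expression for lineage i and method j,
    with exactly one sample per cell (balanced two-way layout).
    """
    n_lin, n_met = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_lin * n_met)
    lin_means = [sum(row) / n_met for row in table]
    met_means = [sum(table[i][j] for i in range(n_lin)) / n_lin
                 for j in range(n_met)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_lineage = n_met * sum((m - grand) ** 2 for m in lin_means)
    ss_method = n_lin * sum((m - grand) ** 2 for m in met_means)
    # In a balanced layout the remainder is the residual sum of squares.
    ss_residual = ss_total - ss_lineage - ss_method

    return {
        "lineage": ss_lineage / ss_total,
        "method": ss_method / ss_total,
        "residual": ss_residual / ss_total,
    }


# Made-up values for one hypothetical gene: rows are lineages A, B, C;
# columns are methods 1, 2, 3.
toy = [
    [5.0, 7.1, 9.0],
    [5.4, 7.3, 9.5],
    [4.9, 6.8, 8.9],
]

shares = variance_shares(toy)
for term, share in shares.items():
    print(f"{term}: {share:.3f}")
```

Repeating this per gene and summarizing the shares across genes would give a rough picture of whether `method` or `lineage` dominates, though it treats every gene independently and ignores count-specific noise.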

**26k**• written 5 months ago by wunderl •
