Question

Cross-species analysis - how to compute the mid parent value?

0

Entering edit mode

pl23 • 0

@4b83ad99

Last seen 21 months ago

Canada

Hello,

This is crossposted from biostars as I was unsure which platform is right for this question.

I am looking to do a differential expression analysis for RNAaseq data in a hybrid by comparing the parental alleles to the in silico midparent value (for example like this paper). I was hoping to use modern DE tools such as DESeq2, EdgeR etc. that assume a negative binomial model. What I am confused about is how to incorporate normalization into computing the midparent values - for example if two corresponding parental genes have different lengths, how would I accommodate for this?

Suppose I am working with DESeq2 and my parents are A, B and my child is C with parent alleles Ca, Cb. The flow I was thinking of is as follows:

Input samples from A,B, Ca, Cb into a DESeq model with gene lengths and compute normalization factors normalizationFactors(dds).
Take normalized sample columns from the normalized counts matrix, say A_norm, B_norm and Ca_norm, Cb_norm
Use round((A_norm+B_norm)/2)), Ca_norm, Cb_norm as inputs to a new DESeq model with constant size factors of 1 to account for the pre-normalized values.

The problem is that from what I understand DESeq does not directly use the normalized counts but rather incorporates the normalization factors into the computation of the means and dispersion values so I am not sure if this method makes sense. In particular if DESeq directly used the normalized counts K_ij/normalizationFactors(dds) in its modelling, then I belieeve providing the average of the normalized counts across the parents and a constant normalization matrix would have made statistical sense.

The only other method I can think of is to use a t-test on the log(TPM+1) values but I have 15 tissues and 3 replicates per tissue so in case I have an interaction between the tissue and genotype, I am afraid the normal distribution assumption of the test would not hold.

Any suggestions or feedback would be much appreciated, thank you!

DESeq2 limma edgeR DifferentialExpression • 1.1k views

ADD COMMENT • link written 2.0 years ago by pl23 • 0

score 1 · Answer 1 · 2024-03-08

This seems to me to be quite straightforward in any of the three packages that you have tagged (limma, edgeR, DESeq2). There is no need for normalized counts. Just undertake a standard DE analysis using the raw counts as usual and form the contrasts Ca - (A+B)/2 and Cb - (A+B)/2).

Having different gene lengths in the different samples would indeed be a complication, but can be incorporated by sample-gene-specific normalization factors. In edgeR, that would be done by providing an edgeR offset matrix, as is done by normalization packages like cqn or tximport.

score 0 · Answer 2 · 2024-03-08

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

I'm not sure what is meant by a mid-parental value.

If you want to average the expression abundance between two samples you can use the TPM as calculated by a tool like Salmon.

With tximeta it would be:

assay(se, "abundance")
# or tximport:
txi$abundance

ADD COMMENT • link 23 months ago Michael Love 43k