Cross-species analysis - how to compute the mid parent value?
2
0
Entering edit mode
pl23 • 0
@4b83ad99
Last seen 13 hours ago
Canada

Hello,

This is crossposted from biostars as I was unsure which platform is right for this question.

I am looking to do a differential expression analysis for RNAaseq data in a hybrid by comparing the parental alleles to the in silico midparent value (for example like this paper). I was hoping to use modern DE tools such as DESeq2, EdgeR etc. that assume a negative binomial model. What I am confused about is how to incorporate normalization into computing the midparent values - for example if two corresponding parental genes have different lengths, how would I accommodate for this?

Suppose I am working with DESeq2 and my parents are A, B and my child is C with parent alleles Ca, Cb. The flow I was thinking of is as follows:

  1. Input samples from A,B, Ca, Cb into a DESeq model with gene lengths and compute normalization factors normalizationFactors(dds).
  2. Take normalized sample columns from the normalized counts matrix, say A_norm, B_norm and Ca_norm, Cb_norm
  3. Use round((A_norm+B_norm)/2)), Ca_norm, Cb_norm as inputs to a new DESeq model with constant size factors of 1 to account for the pre-normalized values.

The problem is that from what I understand DESeq does not directly use the normalized counts but rather incorporates the normalization factors into the computation of the means and dispersion values so I am not sure if this method makes sense. In particular if DESeq directly used the normalized counts K_ij/normalizationFactors(dds) in its modelling, then I belieeve providing the average of the normalized counts across the parents and a constant normalization matrix would have made statistical sense.

The only other method I can think of is to use a t-test on the log(TPM+1) values but I have 15 tissues and 3 replicates per tissue so in case I have an interaction between the tissue and genotype, I am afraid the normal distribution assumption of the test would not hold.

Any suggestions or feedback would be much appreciated, thank you!

DESeq2 limma edgeR DifferentialExpression • 384 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 29 minutes ago
WEHI, Melbourne, Australia

This seems to me to be quite straightforward in any of the three packages that you have tagged (limma, edgeR, DESeq2). There is no need for normalized counts. Just undertake a standard DE analysis using the raw counts as usual and form the contrasts Ca - (A+B)/2 and Cb - (A+B)/2).

Having different gene lengths in the different samples would indeed be a complication, but can be incorporated by sample-gene-specific normalization factors. In edgeR, that would be done by providing an edgeR offset matrix, as is done by normalization packages like cqn or tximport.

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 17 hours ago
United States

I'm not sure what is meant by a mid-parental value.

If you want to average the expression abundance between two samples you can use the TPM as calculated by a tool like Salmon.

With tximeta it would be:

assay(se, "abundance")
# or tximport:
txi$abundance
ADD COMMENT

Login before adding your answer.

Traffic: 607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6