Question: Relations between average expression and fold change
Lluís Revilla Sancho (European Union) wrote, 22 days ago:

I have a dataset with a very complex setup that I don't seem to be handling well: whatever I do, I see a relationship between the fold change and the average expression of the genes.

I have 5 main variables:

  • Cell type: stem or differentiated cells
  • Group: disease or control
  • Location: ileum or colon
  • Type: from pediatric or adult samples
  • Creation: old or new

So far I have decided to analyse the old and new samples as two separate cohorts, because they come from different experiments with different Matrigels, a couple of years apart.

For the remaining variables I used an interaction-style design in which each combination of the variables is its own group in the design: STEM_disease_ileum_pediatric, STEM_disease_ileum_adult, STEM_control_ileum_pediatric, ...

However, this ends up with comparisons like this one (done via limma):

[Figure: logFC vs. average expression]

We can see that the higher the average expression is, the bigger the logFC is, whereas I expected that the average expression would not affect the fold change.

I tried changing the design to a simpler one with fewer interactions, and I was advised to normalize only the samples used in each comparison, but both made the results worse. I also tried correcting with surrogate variables from the sva package, and that didn't work either (despite finding 2 surrogate variables). The PCA did not show any clear batch effect, only that stem and differentiated cells have very different expression (the first component, which explains 36.5% of the variance, separates them).

I have run out of ideas to try, so suggestions about how to design/normalize the data are welcome.

Tags: design, comparisons, aveexpr

Answer: Relations between average expression and fold change
James W. MacDonald (United States) wrote, 22 days ago:

You don't say what kind of data these are, but what's wrong with using limma-trend?


Ugh. Need more coffee. Sorry for the noise.

This is really a question about how you should be analyzing your data rather than how to use Bioconductor tools. Without having data in hand, I am not sure anybody can help you, and I am not sure that any advice will be helpful if given. At that point it's just conjecture.

modified 22 days ago • written 22 days ago by James W. MacDonald

Thanks! This is RNA-seq. You made me realize that I get this with plotSA:

[Figure: voom trend wrong]

I have never seen this before... do you have any suggestion on what the error might be?

written 22 days ago by Lluís Revilla Sancho

Well, for complex analyses I tend to filter based on the average logCPM of each gene. If you do a density plot of the rowMeans of the logCPM data, it usually is a bimodal distribution with a low point somewhere around zero. If you make the assumption that the genes to the left of the nadir are unexpressed, and those to the right are expressed, you can exclude based on that criterion. Which might help?

written 22 days ago by James W. MacDonald
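
The filtering idea in the comment above can be sketched as follows. This is only an illustration in plain Python, not limma or edgeR code: the function names (`kde_on_grid`, `find_nadir`, `expressed_genes`), the bandwidth rule, and the simulated logCPM values are all assumptions made for the example. It estimates the density of the per-gene average logCPM, locates the low point between the two tallest peaks, and keeps the genes to the right of it.

```python
import math
import random
import statistics


def kde_on_grid(values, grid_size=200):
    """Gaussian kernel density estimate of `values` on an evenly spaced grid."""
    n = len(values)
    # Simplified Silverman-style bandwidth (an assumption; any smooth estimate works).
    bandwidth = 0.9 * statistics.stdev(values) * n ** -0.2
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]
    return grid, density


def find_nadir(values):
    """Return the lowest-density grid point between the two tallest density peaks."""
    grid, density = kde_on_grid(values)
    peaks = [
        i for i in range(1, len(grid) - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("density is not bimodal; choose a cutoff by eye instead")
    # Two tallest peaks, ordered left to right.
    left, right = sorted(sorted(peaks, key=lambda i: density[i])[-2:])
    nadir_index = min(range(left, right + 1), key=lambda i: density[i])
    return grid[nadir_index]


def expressed_genes(mean_logcpm):
    """Keep the genes whose average logCPM lies to the right of the nadir."""
    cutoff = find_nadir(list(mean_logcpm.values()))
    kept = {gene for gene, value in mean_logcpm.items() if value > cutoff}
    return kept, cutoff


# Simulated per-gene average logCPM values (made up for illustration only):
# 300 "unexpressed" genes around -3 and 700 "expressed" genes around 5.
rng = random.Random(1)
mean_logcpm = {f"gene{i}": rng.gauss(-3.0, 0.8) for i in range(300)}
mean_logcpm.update({f"gene{i}": rng.gauss(5.0, 1.5) for i in range(300, 1000)})

kept, cutoff = expressed_genes(mean_logcpm)
print(f"cutoff = {cutoff:.2f}, kept {len(kept)} of {len(mean_logcpm)} genes")
```

With real data the input would be the row means of the logCPM matrix, and it is worth looking at the density plot itself before trusting an automatically chosen cutoff.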

You might also try CQN (conditional quantile normalization, the cqn package) to see if that helps.

written 21 days ago by James W. MacDonald

Thanks for your help! In the end the problem was that I was using a design with just one sample for one of the coefficients. Thank you very much for your advice (and sorry for answering so late, I just saw the notification).

written 17 days ago by Lluís Revilla Sancho
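
The problem found in the end (a coefficient supported by a single sample) can be caught with a quick replicate count before fitting. The sketch below is plain Python rather than limma code; the sample names `s1` to `s6` and the helper `replicate_report` are hypothetical, and it assumes a design with one coefficient per group, as described in the question. In such a design, a group with a single sample is fitted exactly and adds nothing to the residual degrees of freedom.

```python
from collections import Counter


def replicate_report(sample_groups):
    """Count samples per design group and flag groups with a single sample.

    `sample_groups` maps sample name -> group label, assuming a design
    with one coefficient per group.
    """
    counts = Counter(sample_groups.values())
    singletons = sorted(group for group, n in counts.items() if n == 1)
    # Residual degrees of freedom for a one-coefficient-per-group design.
    residual_df = len(sample_groups) - len(counts)
    return counts, singletons, residual_df


# Hypothetical sample sheet; the group labels follow the naming in the question.
samples = {
    "s1": "STEM_disease_ileum_pediatric",
    "s2": "STEM_disease_ileum_pediatric",
    "s3": "STEM_disease_ileum_pediatric",
    "s4": "STEM_disease_ileum_adult",
    "s5": "STEM_disease_ileum_adult",
    "s6": "STEM_control_ileum_pediatric",
}

counts, singletons, residual_df = replicate_report(samples)
print("groups with one sample:", singletons)
print("residual degrees of freedom:", residual_df)
```

Any group listed as a singleton is a candidate for merging with a related group or for dropping from the comparison.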