Relations between average expression and fold change
@lluis-revilla-sancho
European Union

I have a dataset with a very complex setup that I don't seem to be handling well: whatever I do, the fold change shows a relationship with the average expression of the genes.

I have 5 main variables:

  • Cell type: stem or differentiated cells
  • Group: disease or control
  • Location: ileum or colon
  • Type: from pediatric or adult samples
  • Creation: old or new.

So far I have decided to analyse the old and new samples as two separate cohorts, because they come from different experiments and Matrigels, with a couple of years in between.

For the remaining variables I used an interaction-style design in which each combination of the variables is its own group in the design: STEM_disease_ileum_pediatric, STEM_disease_ileum_adult, STEM_control_ileum_pediatric, ...

However, this ends up with comparisons like this one (done via limma):

[Figure: logFC vs AvgExpression]

We can see that the higher the average expression is, the bigger the logFC is, while I expected that the average expression would not affect the fold change.

I tried changing the design to a simpler one with fewer interactions, and I was advised to normalize only the samples used in each comparison, but both gave worse results. I also tried correcting with surrogate variables from the sva package, and that didn't work either (despite finding 2 surrogate variables). The PCA did not show any clear batch effect, only that stem and differentiated cells have very different expression (the first component, which explains 36.5% of the variance, separates them).

I have run out of ideas to try, so any suggestions about how to design the model or normalize the data are welcome.

design AveExpr comparisons • 1.8k views
@james-w-macdonald-5106
United States

You don't say what kind of data these are, but what's wrong with using limma-trend?


Ugh. Need more coffee. Sorry for the noise.

This is really a question about how you should be analyzing your data rather than how to use Bioconductor tools. Without having data in hand, I am not sure anybody can help you, and I am not sure that any advice will be helpful if given. At that point it's just conjecture.


Thanks! This is RNA-seq. You made me realize that I get this with plotSA:

[Figure: voom trend wrong]

I have never seen this before... do you have any suggestion as to what the error might be?


Well, for complex analyses I tend to filter based on the average logCPM of each gene. If you do a density plot of the rowMeans of the logCPM data, it usually is a bimodal distribution with a low point somewhere around zero. If you make the assumption that the genes to the left of the nadir are unexpressed, and those to the right are expressed, you can exclude based on that criterion. Which might help?
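For what it's worth, the nadir-based filter described above could be sketched roughly as follows. This is plain Python rather than R, and everything named here is an assumption made for illustration: the helper names (`row_means`, `find_nadir`, `keep_expressed`), the search window of -3 to 3 logCPM, the 0.5 bin width, and the toy data. None of it comes from limma or edgeR.

```python
from statistics import mean

def row_means(logcpm):
    """Average logCPM per gene; `logcpm` maps gene -> per-sample values."""
    return {gene: mean(values) for gene, values in logcpm.items()}

def find_nadir(means, lo=-3.0, hi=3.0, bin_width=0.5):
    """Centre of the emptiest histogram bin inside the window [lo, hi).

    The window and bin width are assumptions: the low point is expected
    "somewhere around zero", so only that region is searched. Ties between
    equally empty bins go to the bin closest to the middle of the window.
    """
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for m in means:
        if lo <= m < hi:
            counts[min(int((m - lo) / bin_width), n_bins - 1)] += 1
    mid = (lo + hi) / 2
    centres = [lo + (i + 0.5) * bin_width for i in range(n_bins)]
    best = min(range(n_bins), key=lambda i: (counts[i], abs(centres[i] - mid)))
    return centres[best]

def keep_expressed(logcpm, **window):
    """Keep genes whose average logCPM lies to the right of the nadir."""
    means = row_means(logcpm)
    cutoff = find_nadir(means.values(), **window)
    kept = [gene for gene, m in means.items() if m > cutoff]
    return cutoff, kept

# Invented toy data: 50 "off" genes well below zero, 150 "on" genes above it.
offsets = [-0.2, -0.1, 0.1, 0.2]          # four made-up samples per gene
logcpm = {}
for i in range(50):
    logcpm[f"off{i}"] = [-5 + 0.05 * i + d for d in offsets]
for i in range(150):
    logcpm[f"on{i}"] = [2 + 0.04 * i + d for d in offsets]

cutoff, kept = keep_expressed(logcpm)
print(cutoff, len(kept))
```

On real data it is probably still worth looking at the density plot before trusting an automatic cutoff, since the distribution is only usually bimodal.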


You might also try CQN to see if that helps.


Thanks for your help! In the end, the problem was that I was using a design model in which one coefficient was supported by just one sample. Thank you very much for your advice (and sorry for answering so late, I just saw the notification).
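In case it helps someone with a similar setup, a quick check for this kind of problem might look like the sketch below. It is plain Python, not limma code; the sample sheet is invented, and `group_label` and `replicate_report` are hypothetical helpers. It assumes a group-means design with one coefficient per combination of variables, in which case the residual degrees of freedom are the number of samples minus the number of groups, and a group with a single sample contributes nothing to the variance estimate.

```python
from collections import Counter

FACTORS = ("cell", "group", "location", "type")

def group_label(sample):
    """Combine the factor levels into one label, e.g. STEM_disease_ileum_adult."""
    return "_".join(sample[f] for f in FACTORS)

def replicate_report(samples):
    """Count samples per design group and flag groups with a single sample."""
    counts = Counter(group_label(s) for s in samples)
    singletons = sorted(g for g, n in counts.items() if n == 1)
    # One coefficient per group: residual df = samples - groups.
    residual_df = len(samples) - len(counts)
    return counts, singletons, residual_df

def make(cell, group, location, type_, n):
    """Build n identical rows of an invented sample sheet."""
    row = {"cell": cell, "group": group, "location": location, "type": type_}
    return [dict(row) for _ in range(n)]

# Invented sample sheet: two groups with 3 samples each, one with only 1.
samples = (
    make("STEM", "disease", "ileum", "pediatric", 3)
    + make("STEM", "control", "ileum", "pediatric", 3)
    + make("STEM", "disease", "ileum", "adult", 1)
)

counts, singletons, residual_df = replicate_report(samples)
print(singletons, residual_df)
```

Any group listed in `singletons` is a coefficient estimated from one sample, which is the situation described above.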
