Question

mroast/fry with gene weights for most or all genes

maltethodberg ▴ 170

I’m experimenting with the ability of the fry/mroast functions to use gene weights. I have two use cases:

1) Comparing the result of a previous experiment in mouse with a current experiment in human: Basically, I want to see if a previously observed response in mouse is also observed in a similar human experiment. I can use Ensembl homologs to match results from the mouse data to the human data. There are multiple ways of setting the weights though, and I’m not sure which is best:

Set gene.weights to be the observed logFCs in mouse. This will result in most genes having a weight set.
Only set weights for genes that were DE (FDR < 0.05) in mouse, using their logFCs as weights.
Similar to the above, but only setting basic directional weights (-1,1) instead of using the actual logFCs.
Use predictive logFCs from predFCm. Again, most genes will end up having weights set.

2) Enrichment of a gene set with gradual membership:

Let’s say I’m able to score each gene based on whether it belongs to some gene set. So basically I have a weight for each gene on some (more or less arbitrary) scale. How important are the absolute values of these weights to how mroast/fry uses them? Should they be scaled to the 0-1 interval, or can they go from, say, 1-1000?

I might also be interested in a competitive version of the same test. Neither CAMERA nor ROMER accepts weights, leaving only barcodeplot. While this visualisation is nice, it’s not really useful for testing a large number of sets. Is it possible to do a weighted competitive test in limma?

limma mroast fry • 909 views

Updated 5.6 years ago by Gordon Smyth • written 5.6 years ago by maltethodberg

score 0 · Answer 1 · 2018-09-16

Comparing mouse experiment to human

For ortholog mapping from mouse to human, my favourite is HUGO HCOP: https://www.genenames.org/cgi-bin/hcop
Using estimated logFCs for all genes is bad, because the result will be dominated by large logFCs for low-count genes, but all the other options should give good results. Which one is best depends on how many DE genes there are and on how you estimated the logFCs. If the mouse data gives many DE genes, then using TREAT to select DE genes may be even better. Obviously your second option, using logFCs as weights, carries more information than just using (-1,1). I've never tried the predFCm option, but in principle it should do well.
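To make the difference between the DE-only weighting schemes concrete, here is a toy sketch in plain Python (not limma code). The gene names, logFCs and FDR values are made up, and the helper names `logfc_weights_de_only` and `sign_weights_de_only` are hypothetical. It assumes the mouse results have already been mapped to human gene IDs.

```python
# Hypothetical mouse results, already mapped to human gene IDs via orthologs.
# Each entry: human_gene -> (mouse logFC, mouse FDR). All values are made up.
mouse_results = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.5, 0.010),
    "GENE_C": (0.3, 0.600),
    "GENE_D": (-4.0, 0.900),  # large logFC but not significant
}

FDR_CUTOFF = 0.05


def logfc_weights_de_only(results, cutoff=FDR_CUTOFF):
    """Weights = mouse logFC, kept only for genes that were DE in mouse."""
    return {g: lfc for g, (lfc, fdr) in results.items() if fdr < cutoff}


def sign_weights_de_only(results, cutoff=FDR_CUTOFF):
    """Weights = +1 or -1 by direction of change, DE genes only."""
    return {
        g: (1.0 if lfc > 0 else -1.0)
        for g, (lfc, fdr) in results.items()
        if fdr < cutoff
    }


logfc_w = logfc_weights_de_only(mouse_results)
sign_w = sign_weights_de_only(mouse_results)
print(logfc_w)
print(sign_w)
```

Note how the non-significant gene with the largest logFC (GENE_D) drops out of both schemes, whereas it would carry the largest weight if logFCs were used for all genes.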

Gradual membership

Only the relative size of the weights is important. There is no need to scale them.
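A toy illustration of why the scale drops out, in plain Python. This is not limma's algorithm: it uses exhaustive sign flips as a simple stand-in for roast's rotations, and the statistics and weights are made up. Multiplying every weight by the same constant multiplies the observed statistic and every null statistic by that constant, so the p-value is unchanged.

```python
from itertools import product

# Made-up gene-level statistics for an 8-gene set (e.g. moderated t-values).
t_stats = [2.1, 1.4, -0.3, 0.9, 1.8, -1.1, 0.5, 2.6]
# Made-up membership weights on an arbitrary scale.
weights = [0.9, 0.7, 0.1, 0.4, 0.8, 0.2, 0.3, 1.0]


def weighted_stat(w, t):
    """Weighted sum of gene-level statistics."""
    return sum(wi * ti for wi, ti in zip(w, t))


def sign_flip_pvalue(w, t):
    """Two-sided p-value from all 2^n sign flips of the statistics.

    A simple stand-in for a rotation test, not what roast/fry computes.
    """
    observed = abs(weighted_stat(w, t))
    flips = list(product((1.0, -1.0), repeat=len(t)))
    hits = sum(
        abs(weighted_stat(w, [s * ti for s, ti in zip(signs, t)])) >= observed
        for signs in flips
    )
    return hits / len(flips)


# Rescale by a power of two so the floating-point comparisons stay exact.
scaled = [1024.0 * wi for wi in weights]

p_original = sign_flip_pvalue(weights, t_stats)
p_scaled = sign_flip_pvalue(scaled, t_stats)
print(p_original, p_scaled)  # the two p-values are identical
```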

I'm not sure what gene weights would mean in a competitive test. How could you do a Wilcoxon test with weights? What would it mean? You could compute a correlation between the weights and the contrasts, but this is a hypothetical question.
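For what it's worth, the correlation idea could be sketched as below, in plain Python with made-up numbers. This assumes "contrasts" means one gene-wise contrast estimate (logFC) per gene, and `pearson` is a hypothetical helper. It is only a descriptive statistic: it ignores inter-gene correlation and is not a calibrated test.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up per-gene membership weights and gene-wise contrast estimates,
# in the same gene order.
membership_weights = [0.1, 0.2, 0.5, 0.8, 1.0]
contrast_logfc = [-0.2, 0.1, 0.4, 1.1, 1.5]

r = pearson(membership_weights, contrast_logfc)
print(r)
```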