Question

The same statistical test for all multi-omic features

0

chris86 ▴ 420

@chris86-8408

Last seen 4.4 years ago

UCL, United Kingdom

Hi

I am about to analyse a large amount of omic data from multiple platforms, and for feature selection it seems sensible to use the same statistical test for all data types. Our response variable is continuous in one analysis and discrete in another. What is the best way of doing this? I usually use limma for microarray and RNA-seq data, but analysing the other data types probably requires a different approach.

Our analysis includes flow cytometry, RNA-seq, microarray, proteomics, and metabolomics.

Thanks,

Chris

limma microarray rnaseq deseq2 • 1.0k views

ADD COMMENT • link 8.2 years ago chris86 ▴ 420

0

Saying you're going to use a generalized linear model is only slightly more specific than saying you're going to use "a statistical test".

ADD REPLY • link 8.2 years ago Ryan C. Thompson ★ 7.9k

score 0 · Answer 1 · 2016-02-17

0

Aaron Lun ★ 28k

@alun

Last seen 2 hours ago

The city by the bay

Your question is pretty vague, so I'll give an equally vague answer. In general, if I have data sets of many different types, I start off by analyzing each one on its own merits. Only after I obtain results from each analysis do I think about how to integrate them, e.g., by overlapping differentially bound sites from ChIP-seq with promoters of DE genes or by intersecting DE lists from multiple transcriptomic experiments. In all cases, I pick what I think is the best analysis and/or statistical method for each data set; this ensures that I get reliable results for integration. I don't concern myself with the fact that the analyses might be different between the different data types. Why would that matter, as long as power is good and FDR control is maintained in each analysis?
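As a rough illustration of that workflow (not part of the original answer): the sketch below assumes each platform has already been analysed with whatever method suits it, controls the FDR separately per platform with a Benjamini-Hochberg adjustment, and only then intersects the resulting lists. The gene names, p-values, and the helper functions `bh_adjust` and `significant_features` are all hypothetical; in practice the p-values would come from limma or another platform-appropriate tool.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_features(pvalues_by_feature, fdr=0.05):
    """Features whose BH-adjusted p-value is at or below the FDR threshold."""
    names = list(pvalues_by_feature)
    adjusted = bh_adjust([pvalues_by_feature[n] for n in names])
    return {n for n, q in zip(names, adjusted) if q <= fdr}


# Hypothetical per-gene p-values; each set is assumed to come from the
# method judged best for that platform, not from one shared test.
rnaseq_p = {"GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.60}
array_p = {"GENE_A": 0.002, "GENE_B": 0.20, "GENE_C": 0.01, "GENE_D": 0.75}

# FDR is controlled within each analysis, then the lists are intersected.
rnaseq_hits = significant_features(rnaseq_p)
array_hits = significant_features(array_p)
shared_hits = rnaseq_hits & array_hits
print(sorted(shared_hits))
```

The integration step only ever sees lists of significant features, so it does not matter whether the upstream tests were the same across platforms.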

ADD COMMENT • link 8.2 years ago Aaron Lun ★ 28k

0

Thanks for your reply, Aaron. I was thinking of a test that is comparable between platforms or technologies, insofar as I want to rank features (e.g. for an SVM or regression) and compare p-values across platforms. Using the best available method for each technology would be fine if this were not my objective. Perhaps I need to look into feature selection methods in more detail.

ADD REPLY • link 8.2 years ago chris86 ▴ 420

1

Well, using the same statistical test isn't a solution. For example, if RNA-seq is inherently more powerful than microarrays (e.g., due to lower technical noise, or differences in sample size), the use of the same test would always result in more rejections in the former than in the latter. As a result, you'll always end up with more significant features in one technology than in the other. And this would be entirely sensible; why would you want to handicap one analysis to make it "comparable" to another? Different technologies will inevitably have different power levels, no matter what test you use. If your downstream analyses can't handle that, then maybe it's time to change your downstream analyses.
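A small simulation can make the power argument concrete. This is only a sketch under simplifying assumptions that are not from the original reply: two hypothetical platforms measure the same true effect in the same number of features, differ only in noise level, and are tested with the same two-sided z-test (noise standard deviation treated as known), followed by Benjamini-Hochberg at 5% FDR. The function names, noise levels, and sample sizes are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def bh_reject_count(pvalues, fdr=0.05):
    """Number of rejections from the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    count = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * rank / m:
            count = rank  # keep the largest rank that passes its threshold
    return count


def simulate_platform(noise_sd, rng, n_features=2000, n_true=200,
                      n_per_group=6, effect=1.0):
    """Two-sided z-test p-values for one simulated platform.

    The first n_true features carry a real group difference of `effect`;
    the rest are null. noise_sd is treated as known (a simplification).
    """
    std_normal = NormalDist()
    se = noise_sd * (2 / n_per_group) ** 0.5
    pvalues = []
    for j in range(n_features):
        shift = effect if j < n_true else 0.0
        group_a = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(shift, noise_sd) for _ in range(n_per_group)]
        z = (mean(group_b) - mean(group_a)) / se
        pvalues.append(2 * (1 - std_normal.cdf(abs(z))))
    return pvalues


rng = random.Random(1)  # fixed seed so the run is reproducible
# Same test and same true effects; only the technical noise differs.
quiet_rejections = bh_reject_count(simulate_platform(noise_sd=0.5, rng=rng))
noisy_rejections = bh_reject_count(simulate_platform(noise_sd=1.5, rng=rng))
print("low-noise platform rejections:", quiet_rejections)
print("high-noise platform rejections:", noisy_rejections)
```

Under these assumptions the low-noise platform rejects far more hypotheses even though the test is identical, so sharing a test does not put p-values from different technologies on an equal footing.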

ADD REPLY • link 8.2 years ago Aaron Lun ★ 28k

0

I think that is a good comment; I will think about this.

Another group I am in contact with has a similar number of technologies, and they seem to have used ANOVA for everything.

ADD REPLY • link 8.2 years ago chris86 ▴ 420

Gordon Smyth · Answer 2 · 2016-02-18

0

Gordon Smyth 50k

@gordon-smyth

Last seen 51 minutes ago

WEHI, Melbourne, Australia

You mention four platforms. You're already using limma for the microarray and for the RNA-seq, and limma is also a common choice for proteomics. So what platform is causing you a problem?

I don't know what you mean by a "flow cytometry" platform, because flow cytometry is a method for sorting cells rather than a genomic profiling technology.

ADD COMMENT • link 8.2 years ago Gordon Smyth 50k

0

Thanks. Yes, you make a good semantic point. The proteomics is actually autoantibody profiling, so the flow cytometry and autoantibody assays will generate a kind of data to which it is not sensible to apply limma. I can apply limma to the RNA-seq and array data; that is fine. However, if I wanted to select the most relevant features across all technologies by using, say, the p-value for further tests, I could not do this in a fair way if I applied limma to two types of data and, say, a regular t-test to the other types. Perhaps I am being too demanding.

ADD REPLY • link updated 8.2 years ago by Gordon Smyth 50k • written 8.2 years ago by chris86 ▴ 420

0

Actually there is metabolomics as well.

ADD REPLY • link 8.2 years ago chris86 ▴ 420