Question: Test whether expression levels of two gene sets are significantly different
0
gravatar for Jon Bråte
2.9 years ago by
Jon Bråte160
Norway
Jon Bråte160 wrote:

Hi,

I have a set of non coding genes and a set of coding genes expressed across a number of samples. Based on the boxplots it is obvious that the lncRNAs are expressed lower than coding genes (log2 non-normallized counts). But what is the appropriate test for this kind of data, and is such a test implemented in e.g. DESeq2?

http://imgur.com/a/Xd9jT


Thanks, Jon

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Jon Bråte160
Answer: Test whether expression levels of two gene sets are significantly different
3
gravatar for Gordon Smyth
2.9 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

In my opinion, there is no rigorous statistical test available. Any obvious test, such a t-test or Wilcoxon test, assumes that the genes are independent whereas they are not. The dependence structure between the genes is unknown and, if it could be incorporated into the test, might have a dramatic effect on the computed p-value.

Since the result is so obvious, do you need a test?

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Gordon Smyth38k

Thanks for this point Gordon, I hadn't thought of that.

ADD REPLYlink written 2.9 years ago by chris86380
Answer: Test whether expression levels of two gene sets are significantly different
1
gravatar for Steve Lianoglou
2.9 years ago by
Denali
Steve Lianoglou12k wrote:

I don't know — whenever someone asks me to say something about the relative expression of different genes within the same samples, I never feel comfortable doing it ... in fact, I'll fight tooth and nail against it for the following reasons.

First, and perhaps most importantly: what does this comparison tell you? In the scenario, if you have fewer amounts (in TPM(!)) of lncRNAs than mRNAs, does that mean the lncRNAs have less impact/activity than mRNAs? Or if you have less mRNA for gene A than gene B, are "you" going to then assert that gene A has less functional influence on the cell than gene B?

If I offered to give you a "pinch" of uranium or a pound of salt, I bet you'd take the pound of salt.

Second: do we really think length normalization is all we need? Perhaps the results are out there (so please point me to those references if so) but are there convincing publications out there that show accuracy of TPMs to something like molar quantities of transcripts for a large number of transcripts?

 

If you want to know how the relative abundance of genes change across conditions (ie. ratio of gene A:gene B in condition vs condition 2), then fine: we're good ;-)

 

ADD COMMENTlink written 2.9 years ago by Steve Lianoglou12k
2

I don't disagree with you, but I did want to point out modern methods do much more than just length normalization to produce TPMs, e.g. almost all correct for size selection, and many correct for biases from degradation, random hexamer priming, amplification, and sequencing errors.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Michael Love25k

Thanks for pointing that out, Mike ... could you point towards a good rabbit hole to start digging into this topic "a new"?

I reckon starting with your alpine paper and digging through the refs from there eventually will get me to see the light?

 

ADD REPLYlink written 2.9 years ago by Steve Lianoglou12k

If you want to know more about the bias-correction methods, I think you might have to read the docs & source code of Kallisto, Salmon, et. al. They have both added lots of bias-correction features fairly recently, so I don't think there's a publication covering all or even most of them.

ADD REPLYlink written 2.9 years ago by Ryan C. Thompson7.4k

I can speak to the Salmon part. The latest bias correction methods for Salmon are described in the updated bioRxiv print:

http://biorxiv.org/content/early/2016/08/30/021592

This includes a fragment sequence bias correction similar to that described in alpine. The Salmon implementation then is preferred over alpine for producing transcript quantifications, as it is substantially faster and probabilistically allocates reads which map to multiple genomics locations.

ADD REPLYlink written 2.9 years ago by Michael Love25k

The alpine pre-print is now published in NBT, I wrote up a blog post describing the overall point: 

Also see the updated Salmon manuscript linked below as to how this is implemented in the latest version of Salmon.

ADD REPLYlink written 2.9 years ago by Michael Love25k
Answer: Test whether expression levels of two gene sets are significantly different
0
gravatar for chris86
2.9 years ago by
chris86380
UCL, United Kingdom
chris86380 wrote:

I'd take TPM values from RSEM or EXPRESS (accurate quantification of transcript abundance) then simply do a t-test or non parametric equivalent on the TPMs between the lncRNAs and mRNAs.

DESEQ does not deal with changes in transcript abundance, but in fold changes between conditions.

ADD COMMENTlink written 2.9 years ago by chris86380
1

Yes, I agree with Chris. I would do a non parametric test (e.g. wilcox.test) of TPMs from RSEM, eXpress, or the faster methods: Salmon, Sailfish, kallisto.

For others reading the thread who don't see why not to use counts here: the reason is that this question concerns testing average expression across *genes*, not expression across samples in different conditions.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Michael Love25k

Thanks! Could I also do a t-test using normalized counts from Deseq? These should take into account differences in sequencing depth, so should be similar to TPMs? The two gene sets are from the same sequencing libraries by the way.

ADD REPLYlink written 2.9 years ago by Jon Bråte160

No because they don't take into account length

ADD REPLYlink written 2.9 years ago by chris86380
Answer: Test whether expression levels of two gene sets are significantly different
0
gravatar for Jon Bråte
2.9 years ago by
Jon Bråte160
Norway
Jon Bråte160 wrote:

Hi,

Thank you all for valuable and interesting comments. I learned a lot from this.

Jon
 

ADD COMMENTlink written 2.9 years ago by Jon Bråte160
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 264 users visited in the last hour