Question

Group comparison with one sample

kay2023 • 0

United States

I have to analyze an RNA-seq dataset. The goal is to compare two groups, say case and control. The issue is that there is only one sample per group. Normally I would not proceed with the analysis, since n=1 is not really a "group": it is not statistically justifiable, and the results cannot be generalized.

But these data are from a cell line from a real patient with a disease. I will circle back with the investigator to see if it's possible to generate more data. In the event that it's not possible, would voom-limma be a good tool to try (with all the caveats mentioned above)? Thanks.

rnaseqcomp • 1.3k views

updated 2.0 years ago by Gordon Smyth • written 2.0 years ago by kay2023

score 0 · Answer 1 · 2024-01-02

Gordon Smyth 53k

WEHI, Melbourne, Australia

limma-voom cannot analyse data without replicates. The only option is to use edgeR with a preset dispersion parameter. See "What to do if you have no replicates" in the edgeR User's Guide:

https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

It is very easy. If counts is your matrix of read counts with two columns corresponding to case and control, then you can do a DE analysis by:

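library(edgeR)
# counts: matrix of read counts, columns in the same order as the group labels
y <- DGEList(counts, group=c("control","case"))
# Compute normalization factors that scale the library sizes
y <- normLibSizes(y)
# Exact test with a preset dispersion, since none can be estimated without replicates
et <- exactTest(y, dispersion=0.2)
# Show the top-ranked genes
topTags(et)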
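
One detail worth checking when reading the output is the sign of logFC. As far as I know, exactTest() by default compares the first two levels of the group factor and treats the first level as the baseline, and R sorts factor levels alphabetically. The base R sketch below only demonstrates the level ordering. The commented-out call assumes the `y` object from the code above and names the pair explicitly so that logFC would be reported as case relative to control.

```r
# Group labels as passed to DGEList() above
group <- factor(c("control", "case"))

# Factor levels are sorted alphabetically, so "case" becomes level 1
group_levels <- levels(group)
print(group_levels)

# With the default pair (levels 1 and 2) the baseline would therefore be "case".
# To report case relative to control, name the pair with the baseline first.
# This needs edgeR and the DGEList `y`, so it is left commented out here:
# et <- exactTest(y, pair = c("control", "case"), dispersion = 0.2)
```

The p-values and gene ranking are the same either way; only the sign of logFC flips.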

Of course, the dispersion setting here is arbitrary and having replicates would be infinitely better. Nevertheless, the above analysis with any positive value for the dispersion is vastly better than assuming Poisson variation, as very many papers in the literature have done in similar situations.

The value of 0.2 that I have chosen is fairly conservative. Good quality RNA-seq data on a cell line should be less variable than that.
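
To give a feel for what the dispersion setting means, here is a small base R sketch. It assumes the usual negative binomial variance function, variance = mu + dispersion * mu^2, where the square root of the dispersion is what the edgeR User's Guide calls the biological coefficient of variation (BCV). The expected count of 100 and the helper name `nb_sd` are made up for illustration.

```r
# Negative binomial standard deviation for a gene with expected count mu:
#   var = mu + dispersion * mu^2   (dispersion = 0 is the Poisson case)
nb_sd <- function(mu, dispersion) sqrt(mu + dispersion * mu^2)

mu <- 100            # hypothetical expected read count
dispersion <- 0.2    # preset value used in the analysis above

bcv <- sqrt(dispersion)          # biological coefficient of variation
sd_poisson <- nb_sd(mu, 0)       # Poisson: sd = sqrt(mu)
sd_nb <- nb_sd(mu, dispersion)   # negative binomial with dispersion 0.2

print(c(bcv = bcv, sd_poisson = sd_poisson, sd_nb = sd_nb))
```

For an expected count of 100, Poisson variation implies a standard deviation of 10, whereas a dispersion of 0.2 (a BCV of about 0.45) implies a standard deviation of about 46. This is why a Poisson analysis treats modest differences between two samples as far more significant than they are.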

Answered 2.0 years ago by Gordon Smyth

Thank you so much for the feedback, Gordon Smyth! Since the results from one replicate may not be generalizable, I decided to calculate the case/control ratio as an equivalent of fold change, which I then used as the ranking criterion for a GSEA analysis.
I first pre-filtered the RNA-seq data to remove lowly expressed genes, then input the remaining genes, along with their ratios, into GSEA.

Reply • 2.0 years ago by kay2023

In my opinion, the edgeR code above will give a better ranking of genes in terms of likely biological significance than simply ranking by fold-change, depending on how you compute the fold-changes.
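
To illustrate why the way fold-changes are computed matters, here is a base R sketch with made-up counts. A raw ratio treats a change from 1 to 8 reads the same as a change from 100 to 800 reads, although the former rests on far less evidence. Adding a small prior count before taking logs damps the low-count genes. The helper `log_fc` and the prior of 5 are hypothetical and simplified; this is not edgeR's exact calculation, although edgeR's cpm() has a `prior.count` argument that serves a similar purpose.

```r
# Made-up counts for three genes (not real data)
counts <- matrix(c(   1,    8,
                    100,  800,
                   1000, 1100),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("control", "case")))

# Hypothetical helper: log2 fold change of case vs control after scaling
# by library size, with an optional prior count added to every count
log_fc <- function(counts, prior = 0) {
  scaled <- sweep(counts + prior, 2, colSums(counts), "/")
  log2(scaled[, "case"] / scaled[, "control"])
}

raw_fc <- log_fc(counts)                # no prior count
shrunk_fc <- log_fc(counts, prior = 5)  # prior count of 5 (arbitrary choice)

print(round(cbind(raw_fc, shrunk_fc), 2))
```

Without a prior count, geneA and geneB get the same fold change. With a prior count, geneA's fold change shrinks toward zero while geneB's is nearly unchanged, so geneB ranks above geneA.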

Also beware that pre-ranked GSEA gives highly inflated significance because it doesn't take into account inter-gene correlations.

Reply • 2.0 years ago by Gordon Smyth