Question

getting sample size for RNAseq experiment

roser.navarro • 0

@rosernavarro-9660

Last seen 6.5 years ago

Dear all,

I've read some papers and vignettes on calculating the sample size for RNAseq experiments.

I've also tried some tools such as RNAseqPS, RNASeqPower, and SSPA, but all of them work with only 2 groups.

I have a problem because in our experimental design we are comparing more than 2 groups.

We take samples from patients at 5 different time points.

From each patient we take 1 biopsy and 1 liquid sample. From the biopsy we take 3 different samples (different parts of the biopsy which should have different transcriptomic profiles). So, in summary: from each patient we have 4 samples (3 from the biopsy and 1 from the liquid).

We want to compare samples within each time point and also samples between different time points (see the scheme).

T1  T2  T3  T4  T5   (time point)
S1  S1  S1  S1  S1   (sample type 1)
S2  S2  S2  S2  S2
S3  S3  S3  S3  S3
S4  S4  S4  S4  S4
N?  N?  N?  N?  N?   (sample size for each group)

Differential expression analysis will be performed both horizontally and vertically in this scheme (between time points and between sample types).

I've seen that the approaches above compare only 2 groups (A vs B).

How can we deal with this problem?

Could we calculate sample size for 2 groups and multiply the N by 5? Or should we increase the sample size?

After all, we are not comparing only A vs B: we compare A vs B, A vs C, A vs D, A vs E, B vs C, etc. So I guess this is a more complex problem (multiple comparisons, FDR) that I don't know how to solve.

Any help/advice will be welcome.

Best regards, and thanks in advance

RNAseqPS: A Web Tool for Estimating Sample Size and Power for RNAseq Experiment

www.ncbi.nlm.nih.gov

Sample size and power determination is the first step in the experimental design of a successful study. Sample size and power calculation is required for applications for National Institutes of Health (NIH) funding. Sample size and power calculation is ...

rnaseqpower sspa sample size sequencing complex-design • 2.2k views

ADD COMMENT • link updated 8.2 years ago by m.van_iterson ▴ 20 • written 8.2 years ago by roser.navarro • 0

score 0 · Answer 1 · 2016-02-16

m.van_iterson ▴ 20

@mvan_iterson-7879

Last seen 6.5 years ago

Netherlands

Dear Roser,

The SSPA package can do power and sample size analysis for RNAseq data using test statistics other than the t-statistic for a two-group comparison. For example, see the vignette for a chi-square example, or our paper for an F-statistic example.

It's not completely clear to me which test you want to perform: an ANOVA-like test for any difference among samples at a specific time point?

Furthermore, do you want to investigate the power of your test with an increasing number of technical replicates (if I understand correctly, the three biopsies per sample) or with more biological replicates? The latter is generally recommended. Or is it related to the time points?

Kind regards,

Maarten

ADD COMMENT • link 8.2 years ago m.van_iterson ▴ 20

Dear Maarten,

First of all thanks for your answer :-)

We want to compare more than 2 groups. We would like to compare 5 groups/conditions.

Our goal is to calculate the number of patients per condition that we should include in our study.

We don't have technical replicates, because the 3 biopsy samples are taken from different areas of the biopsy; that's why we expect different transcriptomic profiles.

The complexity of our design is not covered in any of the vignettes I've been reading. That's why I don't know whether we can calculate N for a two-group comparison and then multiply the resulting N by 5, or whether we should account for multiple comparisons and adjust by FDR.

ADD REPLY • link 8.2 years ago roser.navarro • 0

score 0 · Answer 2 · 2016-02-17

m.van_iterson ▴ 20

@mvan_iterson-7879

Last seen 6.5 years ago

Netherlands

Could you formulate exactly which hypotheses you want to test? For example: for biopsy I, is there any difference between the samples at time point t1 compared to time point t2?

Did you look at section 9.6, Time Course Experiments, of the limma User's Guide?

Cheers,

Maarten

ADD COMMENT • link 8.2 years ago m.van_iterson ▴ 20

Dear Maarten,

I hope this scheme helps you :-)

Stage 1 (time point 1)

Patient:

Biopsy:
- Zone A: RNA-Seq
- Zone B: RNA-Seq
- Zone C: RNA-Seq
Liquid:
- Cells: RNA-Seq

And the same scheme for 5 different stages.

I want to know the number of patients per stage that should be included in the study. Our goal is to detect differentially expressed genes:

1 - within each stage

ZoneA vs ZoneB on stage 1, ZoneA vs ZoneC on stage 1, ZoneA vs Cells on stage 1, ZoneA vs ZoneB on stage 2, and so on.

2 - between stages (time points)

ZoneA stage 1 vs ZoneA stage 2, ZoneA stage 1 vs ZoneA stage 3, and so on.

ZoneB stage 1 vs ZoneB stage 2, ZoneB stage 1 vs ZoneB stage 3, and so on.
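To get a feel for how many tests this design implies, the comparisons can be enumerated. This is a minimal sketch that assumes every pair of sample types is compared within each stage and every pair of stages is compared within each sample type; the post lists only the first few pairs, so the full pairing is an assumption, and the labels are illustrative.

```python
from itertools import combinations

stages = [f"stage{i}" for i in range(1, 6)]          # 5 time points
sample_types = ["ZoneA", "ZoneB", "ZoneC", "Cells"]  # 3 biopsy zones + liquid

# 1 - within each stage: every pair of sample types
within = [(f"{a}@{s}", f"{b}@{s}")
          for s in stages
          for a, b in combinations(sample_types, 2)]

# 2 - between stages: every pair of stages, per sample type
between = [(f"{t}@{s1}", f"{t}@{s2}")
           for t in sample_types
           for s1, s2 in combinations(stages, 2)]

print(len(within), len(between), len(within) + len(between))  # 30 40 70
```

Under that assumption there are 70 pairwise contrasts, each of which is tested across all genes, which is why the multiple-testing question matters here.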

The power of the test will be set to 0.9, alpha to 0.05, and sequencing depth to 5M reads.
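For a single two-group comparison, settings like these can be plugged into a closed-form approximation. The sketch below uses the normal-approximation formula described by Hart et al. (2013), the paper behind RNASeqPower: n = 2 (z_{1-alpha/2} + z_{power})^2 (1/depth + cv^2) / ln(effect)^2. Note that `depth` here means average read coverage per gene, not total library size, so 5M reads does not map onto it directly. The values depth=20, cv=0.4 and effect=2 are placeholder assumptions, not values from this study, and the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def samples_per_group(depth, cv, effect, alpha=0.05, power=0.9):
    """Approximate samples per group for ONE two-group comparison.

    depth  : average read coverage per gene (assumed value)
    cv     : biological coefficient of variation (assumed value)
    effect : fold change to detect (assumed value)
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided test
    z_power = z(power)
    n = 2 * (z_alpha + z_power) ** 2 * (1 / depth + cv ** 2) / log(effect) ** 2
    return ceil(n)               # round up to whole samples

print(samples_per_group(depth=20, cv=0.4, effect=2))  # 10
```

This covers only one A vs B contrast at an unadjusted alpha; it does not by itself answer the multi-group question.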

I've read in different Bioconductor vignettes how you can get the number of samples that you need to include in a study. But these approaches only compare two groups, for example control vs cancer samples.

Our design is more complex because we have 5 time points or stages, and within each stage we have 4 types of samples. That's why we don't know the number of patients we should include in our design to have enough power to detect DEGs for all the comparisons I have described.

Thank you very much

ADD REPLY • link 8.2 years ago roser.navarro • 0

score 0 · Answer 3 · 2016-02-21

Dear Roser,

You are planning to do many tests (some of which are two-sample tests), and each test will have its own power. If I understand correctly, you want to do a power and sample size analysis for all tests at once? I do not know a proper way to do this. I can imagine that you do a power and sample size analysis for those hypotheses where you expect the smallest effect (number of DEGs and effect sizes), because these will require the largest number of samples. You could then keep your design balanced, as it is now, with the same number of samples in each time/biopsy combination, which should yield sufficient power for the other tests as well.

The SSPA package requires pilot data to perform sample size analysis, so you should look at one of the other packages for sample size analysis. Since you want to do so many tests, maybe you should use a more conservative significance level, e.g. 0.001 instead of 0.05 for FDR.
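The two suggestions above (size the study for the comparison with the smallest expected effect, and use a stricter significance level) can be combined in a rough calculation. This is only a sketch: it uses the two-group normal approximation of Hart et al. (2013), treats every comparison as an independent two-group test (ignoring that zones from the same patient are paired), and the fold changes, per-gene coverage, and CV are hypothetical placeholders rather than estimates for this study.

```python
from math import ceil, log
from statistics import NormalDist

def samples_per_group(effect, alpha, power=0.9, depth=20, cv=0.4):
    """Two-group approximation; depth and cv are assumed placeholder values."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (1 / depth + cv ** 2) / log(effect) ** 2
    return ceil(n)

# Hypothetical expected fold changes per family of comparisons
expected_effect = {"zone vs zone": 1.5, "zone vs cells": 3.0, "stage vs stage": 2.0}

for alpha in (0.05, 0.001):
    per_contrast = {k: samples_per_group(e, alpha) for k, e in expected_effect.items()}
    # The contrast with the smallest effect needs the most samples;
    # a balanced design would use that n for every time/sample-type cell.
    hardest = max(per_contrast, key=per_contrast.get)
    print(alpha, per_contrast, "-> plan for", per_contrast[hardest], f"per group ({hardest})")
```

The point of the comparison is qualitative: the weakest expected effect drives the required n, and tightening alpha from 0.05 to 0.001 raises it further.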

Good luck,

Maarten