Search
Question: DESeq2 Multi factor, multi time point comparison design formula
0
gravatar for arun.thirumaran
10 weeks ago by
arun.thirumaran0 wrote:

Hi,

I am new to RNA seq and seq data analysis.

I want to use DESeq2 to extract differential expression data for cells (from three donors) seeded on two different scaffolds (scaffold 1 and 2) across three different time points and compare them with an industry standard control (scaffold 3)  and positive control (TCP).

Experimental design is as follows,

  1. scaffold 1 (POC1.1) + cells in normal media (NM)
  2. scaffold 2 (POC1.3) + cells in normal media (NM)
  3. scaffold 3 (PLDLA) + cells in normal media (NM) 
  4. cells in tissue culture plastic in osteogenic induction media (OM) (control)

-Three time points day 7, day 14 and day 21

-all samples in biological triplicates (3 donors)

Total of 36 samples

I would like to find out deferentially expressed genes in cells cultured on POC 1.1 and POC 1.3 in normal media against PLDLA and osteogenic differentiated cells in tissue culture plastic (TCP).

My sample table looks like the following

 

sample name

file name

condition

Group

time

1

POC1.1

7_1_A.bam

NM

POC1.1

7

2

POC1.3

7_1_B.bam

NM

POC1.3

7

3

PLDLA

7_1_C.bam

NM

PLDLA

7

4

TCP

7_1_D.bam

OM

TCP

7

5

POC1.1

7_2_A.bam

NM

POC1.1

7

6

POC1.3

7_2_B.bam

NM

POC1.3

7

7

PLDLA

7_2_C.bam

NM

PLDLA

7

8

TCP

7_2_D.bam

OM

TCP

7

9

POC1.1

7_3_A.bam

NM

POC1.1

7

10

POC1.3

7_3_B.bam

NM

POC1.3

7

11

PLDLA

7_3_C.bam

NM

PLDLA

7

12

TCP

7_3_D.bam

OM

TCP

7

13

POC1.1

14_1_A.bam

NM

POC1.1

14

14

POC1.3

14_1_B.bam

NM

POC1.3

14

15

PLDLA

14_1_C.bam

NM

PLDLA

14

16

TCP

14_1_D.bam

OM

TCP

14

17

POC1.1

14_2_A.bam

NM

POC1.1

14

18

POC1.3

14_2_B.bam

NM

POC1.3

14

19

PLDLA

14_2_C.bam

NM

PLDLA

14

20

TCP

14_2_D.bam

OM

TCP

14

21

POC1.1

14_3_A.bam

NM

POC1.1

14

22

POC1.3

14_3_B.bam

NM

POC1.3

14

23

PLDLA

14_3_C.bam

NM

PLDLA

14

24

TCP

14_3_D.bam

OM

TCP

14

25

POC1.1

21_1_A.bam

NM

POC1.1

21

26

POC1.3

21_1_B.bam

NM

POC1.3

21

27

PLDLA

21_1_C.bam

NM

PLDLA

21

28

TCP

21_1_D.bam

OM

TCP

21

29

POC1.1

21_2_A.bam

NM

POC1.1

21

30

POC1.3

21_2_B.bam

NM

POC1.3

21

31

PLDLA

21_2_C.bam

NM

PLDLA

21

32

TCP

21_2_D.bam

OM

TCP

21

33

POC1.1

21_3_A.bam

NM

POC1.1

21

34

POC1.3

21_3_B.bam

NM

POC1.3

21

35

PLDLA

21_3_C.bam

NM

PLDLA

21

36

TCP

21_3_D.bam

OM

TCP

21

I want to test for

  1. POC1.1 Vs PLDLA for day 7, day 14 and day 21
  2. POC1.3 Vs PLDLA for day 7, day 14 and day 21
  3. PLDLA Vs TCP for day 7, day 14 and day 21
  4. POC1.1 Vs TCP for day 7, day 14 and day 21
  5. POC 1.3 Vs TCP for day 7, day 14 and day 21
  6. POC 1.1 Day 7 Vs Day 14 Vs Day 21
  7. POC 1.3 Day 7 Vs Day 14 Vs Day 21
  8. PLDLA Day 7 Vs Day 14 Vs Day 21
  9. TCP Day 7 Vs Day 14 Vs Day 21

Does this comparison make sense? Any suggestions are appreciated. 

Can anyone please help with design formulas for differential expression using DESeq2 for the above-mentioned comparisons.

 

Thank you very much in advance

Best Regards

Arun

ADD COMMENTlink modified 10 weeks ago by Michael Love19k • written 10 weeks ago by arun.thirumaran0

If you want to know whether your comparisons make sense, you need to tell us more about the biology of your experiment. What is a "scafffold" and what do you want to find out about it?

ADD REPLYlink written 10 weeks ago by Simon Anders3.5k

Hi Simon,

Thanks for the reply. 

I have edited the question above. Here is the brief experimental details.

The scaffold here is a hydroxyapatite dopped polymer. I am looking at osteoinductive properties of two different polymers (scaffold 1 and 2 (POC1.1 AND POC 1.3)) in comparison with an industry standard polymer (scaffold 3 (PLDLA))  and cells that are induced to differentiate in tissue culture plastic (TCP). 

I want to find out differential gene expression in scaffold 1 and 2 compared to the control material (scaffold 3) and positive control (TCP) across three time points. This will be followed by IPA analysis. Can I do this in one run in DESeq2 or Do I have to run each comparison separately. 

Please note cells are human origin and are in biological triplicates and I have single end sequences not paired end. 

 

Thank you in advance

Arun

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by arun.thirumaran0
0
gravatar for Michael Love
10 weeks ago by
Michael Love19k
United States
Michael Love19k wrote:

This statistical analysis question is beyond the amount of software support that I can provide at the moment. Maybe Simon or another of the support site answerers can help, but otherwise, since you have a quite complex experimental design, I'd recommend working with a local statistician to form the design and contrasts (note that anyone familiar with R design formula and linear model contrasts can help you, they don't need to know details about DESeq2 specifically, because we leverage the basic R design formula to compute coefficients). 

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Michael Love19k

Thank You Michael, 

Unfortunately, I was not able to get any local help with R designs. Could you please comment on my analysis to compare between two polymers at three-time points? For example, I want to compare POC1.1 (treatment)  Vs PLDLA (control). 

my sample table looks like, 

 

Groups

time

1

POC1.1

07d

2

POC1.1

07d

3

POC1.1

07d

4

POC1.1

14d

5

POC1.1

14d

6

POC1.1

14d

7

POC1.1

21d

8

POC1.1

21d

9

POC1.1

21d

10

PLDLA

07d

11

PLDLA

07d

12

PLDLA

07d

13

PLDLA

14d

14

PLDLA

14d

15

PLDLA

14d

16

PLDLA

21d

17

PLDLA

21d

18

PLDLA

21d

I followed your guide on using contrasts to get results table and the script I used is below

counts <- read.table("POC1.1VPLDLA_UCSCtable.csv", header=TRUE, sep=",", row.names=1)

countdata <- as.matrix(counts)
head(countdata, 3)

groups <- factor(c("POC1.1", "POC1.1", "POC1.1", "POC1.1", "POC1.1", "POC1.1", "POC1.1", "POC1.1", "POC1.1", "PLDLA", "PLDLA", "PLDLA", "PLDLA", "PLDLA", "PLDLA", "PLDLA", "PLDLA", "PLDLA"))

time <- factor(c( "07d", "07d", "07d","14d", "14d", "14d", "21d", "21d", "21d", "07d", "07d", "07d","14d", "14d", "14d", "21d", "21d", "21d" ))

coldata <- data.frame(row.names=colnames(countdata), groupstime)

#generation of a DESeqDataSet from count matrix
dds = DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ groups + time + groups:time)
dds
#making factor levels (is this required?)
dds$groups <- relevel(dds$groups, ref = "PLDLA")
dds$time<- relevel(dds$time, ref = "07d")
#Interactions

dds$combined <- factor(paste0(dds$groups, dds$time))
design(dds) <- ~ combined
dds <- DESeq(dds)
resultsNames(dds)

# I get the following results
= [1] "Intercept"                    

"combined_PLDLA14d_vs_PLDLA07d"

"combined_PLDLA21d_vs_PLDLA07d"

"combined_POC1.107d_vs_PLDLA07d"

"combined_POC1.114d_vs_PLDLA07d"

"combined_POC1.121d_vs_PLDLA07d"

My questions here are, 

  1. Is my design formula ~groups+time+groups:time correct for this analysis?
  2. How can I model the design formula or use contrasts or names to get the results, for POC 1.1 Vs PLDLA on the following time points?
    1. d07
    2. d14
    3. d21
    4. d14 v d07
    5. d21 v d14
    6. d21 v d07

Any suggestion will be very helpful. 

Thank you in advance. 

Arun

ADD REPLYlink written 6 weeks ago by arun.thirumaran0

Sorry Arun but I don’t have time to provide this level of statistical support here. 

ADD REPLYlink written 6 weeks ago by Michael Love19k

Thank you Michael,

Could you please suggest for one of my time point, ie,

POC1.1 V PLDLA for day7 alone?

When I am reordering the results based on padj values for this singkle time points I am getting 0 genes 

> sum(res05$padj < 0.05, na.rm=TRUE)
[1] 0

But if i do p value instead of padj I am getting 231 genes

> sum(res05$pvalue < 0.05, na.rm=TRUE)
[1] 231

 

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by arun.thirumaran0

You simply would use the contrast argument to compare POC1.1 and PLDLA for day7 alone. You can compare any two levels of the combined variables using contrast.

You need to use the adjusted p-values not the p-values. Otherwise you will not control the false discovery rate. This is explained in the workflow, which you should read over:

https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#multiple-testing

ADD REPLYlink written 6 weeks ago by Michael Love19k

Thank you very much for the reply Michael,

I will follow these instructions. 

Best regards

Arun

 

ADD REPLYlink written 6 weeks ago by arun.thirumaran0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 202 users visited in the last hour