Question

Paired analysis DESeq2 problems

georgina.fqw ▴ 10

@georginafqw-23788

Hi,

I have 50 samples from 25 individuals. These are paired samples (tumour and matched normal), and I want to see the difference between tumour and normal while taking the individuals into account.

My colData looks like this:

Sample Condition
1 N
1 T
2 N
2 T
3 N
3 T
.
.
25 N
25 T

dds_1 <- DESeqDataSetFromMatrix(countData = count_matrix, colData=colData, design = ~ Condition)
dds_2 <- DESeqDataSetFromMatrix(countData = count_matrix, colData = colData, design = ~ Condition + Sample)
converting counts to integer mode
  the design formula contains one or more numeric variables with integer values,
  specifying a model with increasing fold change for higher values.
  did you mean for this to be a factor? if so, first convert
  this variable to a factor using the factor() function
  the design formula contains one or more numeric variables that have mean or
  standard deviation larger than 5 (an arbitrary threshold to trigger this message).
  it is generally a good idea to center and scale numeric variables in the design
  to improve GLM convergence.
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

# perform DEA
dea_1 <- DESeq(dds_1)
dea_2 <- DESeq(dds_2)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
1 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

# result analysis

res_1 <- results(dea_1)
res_2 <- results(dea_2)

Results from the first run look OK:

> res_1
log2 fold change (MLE): Condition T vs N 
Wald test p-value: Condition T vs N 
DataFrame with 30161 rows and 6 columns

However, I am not sure whether the second analysis ran correctly, as I can only see "Sample":

> res_2
log2 fold change (MLE): Sample 
Wald test p-value: Sample 
DataFrame with 30161 rows and 6 columns

Thank you!

DESeq2 • 780 views

ADD COMMENT • link updated 4.2 years ago by Michael Love 43k • written 4.2 years ago by georgina.fqw ▴ 10

score 0 · Answer 1 · 2020-10-05

Michael Love 43k

@mikelove

United States

The note printed when the dataset was built above says clearly what to do, in my opinion.

You've coded Sample as numeric, which is not what you want: sample 5 is not sample 2 + sample 3.

Did the note not make sense?
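
As a minimal sketch of what the note asks for, assuming a colData shaped like the one in the question (the toy data frame below is made up for illustration and is not the poster's data), the difference between a numeric and a factor Sample shows up directly in the number of model-matrix columns:

```r
# Toy colData mirroring the layout in the question: 25 individuals, N and T each.
# Hypothetical data for illustration only.
colData <- data.frame(
  Sample    = rep(1:25, each = 2),
  Condition = rep(c("N", "T"), times = 25)
)

# Numeric Sample: intercept plus a single slope column for "Sample".
n_numeric <- ncol(model.matrix(~ Sample, data = colData))

# Convert both design variables to factors; N is made the reference level.
colData$Sample    <- factor(colData$Sample)
colData$Condition <- factor(colData$Condition, levels = c("N", "T"))

# Factor Sample: intercept plus one indicator column per non-reference individual.
n_factor <- ncol(model.matrix(~ Sample, data = colData))

n_numeric  # 2
n_factor   # 25
```

With the factor coding, each individual gets its own baseline, which is what a paired design needs.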

ADD COMMENT • link 4.2 years ago Michael Love 43k

Thank you, Michael, for the help.

I have now converted Sample and Condition to factors (with as.factor) and rerun the analysis. For the second analysis I got:

> res_2
log2 fold change (MLE): Sample 29 vs 1 
Wald test p-value: Sample 29 vs 1 
DataFrame with 30161 rows and 6 columns

So it looks like the analysis compared sample 29 vs sample 1. How can I compare T vs N while taking Sample into account? Thanks again for your help!

ADD REPLY • link 4.2 years ago georgina.fqw ▴ 10

This is covered in the documentation; see the vignette section on designs with multiple factors.
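
For readers following along, here is a hedged sketch of the usual paired setup. By default, results() reports the last variable in the design formula, so the individual goes first and the variable of interest last; the contrast argument names the comparison explicitly. The toy colData and the simulated counts from makeExampleDESeqDataSet() are assumptions for illustration, not the poster's data, and the DESeq2 part only runs if the package is installed.

```r
# Toy colData (hypothetical), with both variables already coded as factors.
colData <- data.frame(
  Sample    = factor(rep(1:25, each = 2)),
  Condition = factor(rep(c("N", "T"), times = 25), levels = c("N", "T"))
)

# Individual first, variable of interest last.
paired_design <- ~ Sample + Condition

# The last model-matrix column is the T vs N effect within individuals.
last_coef <- tail(colnames(model.matrix(paired_design, data = colData)), 1)
last_coef

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated counts for 50 samples, standing in for count_matrix.
  dds <- makeExampleDESeqDataSet(n = 500, m = 50)
  dds$Sample    <- colData$Sample
  dds$Condition <- colData$Condition
  design(dds)   <- paired_design
  dds <- DESeq(dds)

  # Default: last variable in the design, last level vs reference level.
  res_default <- results(dds)
  # Explicit: name the factor, the numerator level, and the denominator level.
  res_explicit <- results(dds, contrast = c("Condition", "T", "N"))
}
```

With this ordering both calls should describe the same Condition T vs N comparison; the explicit contrast simply does not depend on where Condition sits in the formula.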

ADD REPLY • link 4.2 years ago Michael Love 43k

Thank you for the suggestion. I have read the vignette, but my design is not as complex as the one in "Group-specific condition effects, individuals nested within groups", so I am not sure whether I have to create another column like "ind.n" in that example. My "Sample" column already identifies the individuals (one individual, two samples). Maybe I misread it. Your help would be very much appreciated. Thank you!

ADD REPLY • link 4.2 years ago georgina.fqw ▴ 10