Hi,
I have 50 samples from 25 individuals. These are paired samples (tumour and matched normal), and I want to test for differences between tumour and normal while taking the individual into account.
My colData looks like this:
Sample Condition
1 N
1 T
2 N
2 T
3 N
3 T
.
.
25 N
25 T
dds_1 <- DESeqDataSetFromMatrix(countData = count_matrix, colData=colData, design = ~ Condition)
dds_2 <- DESeqDataSetFromMatrix(countData = count_matrix, colData = colData, design = ~ Condition + Sample)
converting counts to integer mode
the design formula contains one or more numeric variables with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function
the design formula contains one or more numeric variables that have mean or
standard deviation larger than 5 (an arbitrary threshold to trigger this message).
it is generally a good idea to center and scale numeric variables in the design
to improve GLM convergence.
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
# perform DEA
dea_1 <- DESeq(dds_1)
dea_2 <- DESeq(dds_2)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
1 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
# result analysis
res_1 <- results(dea_1)
res_2 <- results(dea_2)
Results from the first run look OK:
> res_1
log2 fold change (MLE): Condition T vs N
Wald test p-value: Condition T vs N
DataFrame with 30161 rows and 6 columns
However, I am not sure whether the second analysis ran correctly, as I can only see "Sample":
> res_2
log2 fold change (MLE): Sample
Wald test p-value: Sample
DataFrame with 30161 rows and 6 columns
Thank you!
Thank you, Michael, for your help.
I have now converted Sample and Condition to factors (as.factor), rerun the analysis, and got this for the second analysis:
So it looks like the analysis compared sample 29 vs sample 1. How do I compare T vs N while taking the sample into account? Thanks again for your help!
This is covered in the documentation, see the vignette on designs with multiple factors.
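For a paired tumour/normal setup like the one described, a minimal sketch could look like the following. By default `results()` reports the last variable in the design formula, which is why `~ Condition + Sample` printed a Sample comparison. Putting `Condition` last, or asking for it explicitly with `contrast`, gives T vs N adjusted for the individual. The counts here are simulated placeholders and the row names are made up for illustration; substitute your own `count_matrix` and `colData`.

```r
library(DESeq2)

set.seed(1)
n_genes <- 200

# Hypothetical colData: 25 individuals, each with a normal (N) and a tumour (T) sample.
# Both columns are factors; N is set as the reference level.
colData <- data.frame(
  Sample    = factor(rep(1:25, each = 2)),
  Condition = factor(rep(c("N", "T"), times = 25), levels = c("N", "T"))
)
rownames(colData) <- paste0("S", colData$Sample, "_", colData$Condition)

# Simulated counts standing in for the real count_matrix (assumption, not real data).
# gene_means is recycled down the columns, so each gene (row) keeps its own mean.
gene_means <- 2^runif(n_genes, min = 4, max = 10)
count_matrix <- matrix(
  rnbinom(n_genes * nrow(colData), mu = gene_means, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(colData))
)

# Paired design: individual as a blocking factor, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = count_matrix,
                              colData   = colData,
                              design    = ~ Sample + Condition)
dds <- DESeq(dds)

# Lists the fitted coefficients; the condition effect is named Condition_T_vs_N.
resultsNames(dds)

# Explicitly request tumour vs normal, regardless of the order in the design.
res <- results(dds, contrast = c("Condition", "T", "N"))
```

The explicit `contrast` also works on an object fitted with `~ Condition + Sample`, provided both variables are factors, so refitting is optional.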
Thank you for the suggestions. I have read the vignette, but my design is not as complex as the one in "Group-specific condition effects, individuals nested within groups", and I am not sure whether I have to create another column like "ind.n" in that example. My "Sample" column already identifies the individuals (one individual, two samples). Maybe I have misread it. Your help would be very much appreciated. Thank you!
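As far as I understand the vignette, the extra `ind.n` column is only needed when individuals are nested within groups and you want group-specific condition effects. With a single paired comparison, the existing `Sample` factor should be enough. A quick base-R check, assuming a colData laid out like the one in the question, shows that `~ Sample + Condition` gives a full-rank design matrix with one coefficient for the tumour effect:

```r
# Hypothetical colData mirroring the question: 25 individuals, N and T for each.
colData <- data.frame(
  Sample    = factor(rep(1:25, each = 2)),
  Condition = factor(rep(c("N", "T"), times = 25), levels = c("N", "T"))
)

# Design matrix for the paired model: intercept, 24 individual terms, 1 condition term.
design_matrix <- model.matrix(~ Sample + Condition, data = colData)

dim(design_matrix)                 # 50 rows, 26 columns
tail(colnames(design_matrix), 3)   # "Sample24" "Sample25" "ConditionT"

# Full rank means every coefficient, including ConditionT, is estimable.
is_full_rank <- qr(design_matrix)$rank == ncol(design_matrix)
is_full_rank                       # TRUE
```

The `ConditionT` column is the within-individual tumour vs normal effect, which DESeq2 reports as `Condition_T_vs_N`.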