Dear all,
I am working with DESeq2 (version 1.22.2) on R (version 3.5.3) to analyze my RNA sequencing data.
I ran some tests last week without any trouble, but this week, for the real analysis, I have a problem.
My colData table looks like this:
> target_OMvsYM
   label condition taille
1  OM_E1        OM    409
2  OM_E2        OM    519
3  OM_E3        OM    906
4  OM_E4        OM   1836
5  OM_E5        OM   1922
6  OM_E6        OM   2063
7  YM_E1        YM    603
8  YM_E2        YM    834
9  YM_E3        YM   1094
10 YM_E4        YM   1313
11 YM_E5        YM   2643
I have 6 replicates for the OM group and 5 replicates for the YM group. I am using a second variable, taille, which is numeric. I checked that the colData labels and the column names of countData are the same:
> all(rownames(target_OMvsYM$label) == colnames(counts_ICM_OMvsYM))
[1] TRUE
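A note on the check above: `target_OMvsYM$label` is a plain vector, so `rownames()` on it returns NULL, and comparing NULL to anything gives an empty logical vector, for which `all()` is TRUE. As written, the check therefore returns TRUE whatever the column order is. Below is a minimal base R sketch with made-up toy data (the objects `target` and `counts` are hypothetical stand-ins, not the real tables) showing the difference from a direct comparison of the labels:

```r
# Toy stand-ins for the colData table and the count matrix (hypothetical data).
target <- data.frame(label = c("OM_E1", "OM_E2", "YM_E1"),
                     condition = c("OM", "OM", "YM"),
                     stringsAsFactors = FALSE)
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("YM_E1", "OM_E1", "OM_E2")))

# rownames() of a vector is NULL, so this comparison is empty and all() is TRUE
# even though the sample order clearly differs.
vacuous_check <- all(rownames(target$label) == colnames(counts))

# Comparing the label column itself detects the mismatch.
direct_check <- all(target$label == colnames(counts))

print(vacuous_check)
print(direct_check)
```

On the real objects, the equivalent direct comparison would be `all(target_OMvsYM$label == colnames(counts_ICM_OMvsYM))`.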
Then I run my analysis:
dds_ICM_OMvsYM <- DESeqDataSetFromMatrix(countData = counts_ICM_OMvsYM,
                                         colData = target_OMvsYM,
                                         design = ~ condition + taille)
dds_ICM_OMvsYM$condition <- relevel(dds_ICM_OMvsYM$condition, ref = "YM")
dds_ICM_OMvsYM <- DESeq(dds_ICM_OMvsYM)
My problem is at this step.
Last week, resultsNames() returned the name I expected, "condition_OM_vs_YM",
but this week I get these names:
> resultsNames(dds_ICM_OMvsYM)
[1] "Intercept" "condition1" "taille"
Why don't I get the same name as last week? Is my DESeq2 analysis still valid? I tried to find the answer, but I do not know where the problem is.
Thank you for your help!
Emilie
Thank you for your answer! I just quit R and re-ran the analysis without changing my code, and the name is as expected again.
Yes, I get a message about the numeric covariate, but it only says that it might be preferable to convert it to a factor. As I can't do that, I am still using a numeric covariate.
Could you explain why you advise centering and scaling the numeric covariate? I hadn't thought about it before, and I wonder how it would improve the analysis.
Thank you!
Emilie
If the names are not what you expected after you've run DESeq(), can you print dds$condition and target$condition to see what those variables look like?
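One possible explanation, which is an assumption and not something confirmed in this thread, is that the R session had a non-default `contrasts` option set (for example by another loaded package). With sum-to-zero contrasts, base R's `model.matrix()` names the coefficient "condition1" instead of naming it after the non-reference level, and a fresh R session would restore the default. A minimal base R sketch with toy values:

```r
# Toy factor and covariate (hypothetical values, not the real colData).
condition <- relevel(factor(c("OM", "OM", "OM", "YM", "YM")), ref = "YM")
taille <- c(409, 519, 906, 603, 834)

# Default treatment contrasts: the coefficient is named after the non-reference level.
old_opts <- options(contrasts = c("contr.treatment", "contr.poly"))
treatment_names <- colnames(model.matrix(~ condition + taille))

# Sum-to-zero contrasts: the coefficient is numbered instead.
options(contrasts = c("contr.sum", "contr.poly"))
sum_names <- colnames(model.matrix(~ condition + taille))

# Restore whatever was set before.
options(old_opts)

print(treatment_names)
print(sum_names)
```

Printing `getOption("contrasts")` in the affected session, alongside `levels(dds$condition)`, would show whether this is what happened.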
Re: centering and scaling continuous covariates, this is just standard practice for models that have to be solved with gradient ascent, as it helps with model fitting. A GLM is such a model. Here's a link I found just now with a Google search:
https://stats.stackexchange.com/a/29820
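As a minimal sketch of what centering and scaling means here, base R's `scale()` subtracts the mean and divides by the standard deviation. The values below are the taille column from the question; the variable name `taille_scaled` is just for illustration.

```r
# taille values copied from the colData table in the question.
taille <- c(409, 519, 906, 1836, 1922, 2063, 603, 834, 1094, 1313, 2643)

# scale() returns a one-column matrix; as.numeric() drops it back to a vector.
taille_scaled <- as.numeric(scale(taille))

# After scaling, the covariate has mean 0 and standard deviation 1.
print(round(mean(taille_scaled), 10))
print(sd(taille_scaled))

# Possible usage before calling DESeq() (assumes the dds object from the question):
# dds_ICM_OMvsYM$taille <- as.numeric(scale(dds_ICM_OMvsYM$taille))
```

Note that the taille coefficient is then a log2 fold change per standard deviation of taille, not per original unit.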
Thank you very much for your answer. I did not know about scaling and centering covariates.
I am wondering about something I found this week. I also have metabolomic, lipidomic and proteomic data, which give count datasets similar to those obtained with RNA-seq. I found a paper which compares different methods for statistical analysis. It compares DESeq2 with other methods, and DESeq2 seems to work pretty well. What do you think about using DESeq2 for this kind of data?
Thank you
Emilie
If you have count data without missing values, it's probably a decent approach, but don't hold me to it, given that I don't know what your particular dataset looks like. If you have missingness or uncertainty in the measurements, I think the statistics should take that into account. One way to approach this is multiple imputation. We looked into this recently for isoform-level differential testing, extending the SAMseq model:
https://academic.oup.com/nar/article/47/18/e105/5542870
I actually have a lot of missing values in my LC-MS/MS data, which are represented by the value 0. A value of 0 is ambiguous: maybe the compound is not present, or maybe it was not detected by the spectrometer.
I will do more research on this and on the SAMseq model.
Thank you again
Emilie
Just to be clear, we do not have ready-to-use software in place for these other assays. It's more a suggestion of an approach we used in a different area (transcriptomics).
It might be worth looking through the biocViews for these areas to find packages designed specifically for them.