Question

DESeq2 resultsNames problem

emilie.derisoud • 0

@emiliederisoud-22403

Dear all

I am working with DESeq2 (version 1.22.2) in R (version 3.5.3) to analyze my RNA sequencing data.

I did some tests last week and I had no trouble but this week, for the real analysis, I have a problem.

The colData table looks like this:

> target_OMvsYM
   label condition taille
1  OM_E1        OM    409
2  OM_E2        OM    519
3  OM_E3        OM    906
4  OM_E4        OM   1836
5  OM_E5        OM   1922
6  OM_E6        OM   2063
7  YM_E1        YM    603
8  YM_E2        YM    834
9  YM_E3        YM   1094
10 YM_E4        YM   1313
11 YM_E5        YM   2643

I have 6 replicates for the OM group and 5 replicates for the YM group. I am using a second, numeric factor (taille). I checked that the colData labels and the column names of countData are the same:

> all(rownames(target_OMvsYM$label) == colnames(counts_ICM_OMvsYM))
[1] TRUE
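A caveat about the check above, sketched with made-up toy data (the `target` and `counts` objects below are hypothetical stand-ins, not the real tables): `rownames()` applied to a single column returns NULL, so the comparison has length zero and `all()` returns TRUE regardless of the sample order. Comparing the label column itself against the column names is a stricter test.

```r
# Toy stand-ins for the colData and the count matrix (hypothetical values).
target <- data.frame(
  label = c("OM_E1", "OM_E2", "YM_E1"),
  condition = c("OM", "OM", "YM"),
  stringsAsFactors = FALSE
)

# Columns are deliberately in a different order from target$label.
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("gene1", "gene2"),
                                 c("YM_E1", "OM_E1", "OM_E2")))

# rownames() of a plain vector is NULL, so this compares nothing
# and all() of an empty logical vector is TRUE.
vacuous_check <- all(rownames(target$label) == colnames(counts))

# Comparing the labels themselves detects the mismatch.
strict_check <- identical(as.character(target$label), colnames(counts))

print(vacuous_check)
print(strict_check)
```

If the stricter check fails on real data, the count matrix columns can be reordered to match the labels before building the dataset.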

Then I run my analysis:

dds_ICM_OMvsYM <- DESeqDataSetFromMatrix(countData = counts_ICM_OMvsYM,
                              colData = target_OMvsYM,
                              design= ~ condition + taille)

dds_ICM_OMvsYM$condition <- relevel(dds_ICM_OMvsYM$condition, ref = "YM")

dds_ICM_OMvsYM<-DESeq(dds_ICM_OMvsYM)

My problem is at this step. Last week, resultsNames() returned the name I expected, "condition_OM_vs_YM", but this week I get these names:

> resultsNames(dds_ICM_OMvsYM)
[1] "Intercept"  "condition1" "taille"

Why don't I get the same name as last week? Is my DESeq2 analysis still valid? I tried to find the answer but I cannot work out where the problem is.

Thank you for your help!

Emilie

ADD COMMENT • link updated 4.5 years ago by Michael Love 42k • written 4.5 years ago by emilie.derisoud • 0

score 0 · Answer 1 · 2019-11-20

Michael Love 42k

@mikelove

United States

We haven't changed DESeq2 and how it takes in the colData. Can you check over your code and maybe re-run in a fresh session?

Note that I'd recommend centering and scaling numeric covariates with values this large. The latest version of DESeq2 actually prints a message about this.

ADD COMMENT • link 4.5 years ago Michael Love 42k

Thank you for your answer! I just quit R and re-ran the code without changing it. The name is the same again.

Yes, I get a message about the numeric covariate, but it only says that it would be preferable to transform it into a factor. As I can't do that, I am still using a numeric covariate.

Could you explain why you advise centering and scaling the numeric covariate? I hadn't thought about it before and I wonder how it would improve the analysis.

Thank you !

Emilie

ADD REPLY • link 4.5 years ago emilie.derisoud • 0

After you’ve run DESeq() and the names are not what you expected, can you print dds$condition and target$condition to see what those variables look like?
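One more thing that may be worth checking alongside the factor itself, offered as an assumption rather than a confirmed diagnosis for this thread: coefficient names ending in a number, such as "condition1", are what base R's model.matrix() produces when the session's `contrasts` option is set to sum contrasts instead of the default treatment contrasts. A minimal base R sketch with a toy factor (not the real data):

```r
# Toy factor with YM as the reference level, mirroring the relevel() call.
condition <- relevel(factor(c("OM", "OM", "OM", "YM", "YM")), ref = "YM")

# Default treatment contrasts: the column is named after the non-reference level.
old <- options(contrasts = c("contr.treatment", "contr.poly"))
default_names <- colnames(model.matrix(~ condition))

# Sum contrasts: the column is named with a numeric suffix instead.
options(contrasts = c("contr.sum", "contr.poly"))
sum_names <- colnames(model.matrix(~ condition))

# Restore whatever the session had before.
options(old)

print(default_names)
print(sum_names)
print(getOption("contrasts"))
```

Printing `getOption("contrasts")` in the problematic session shows whether something (a loaded package, a startup file, an earlier command) changed it from the default.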

Re: centering and scaling continuous covariates, this is just standard practice for models that have to be solved with gradient ascent as it helps with model fitting. GLM is such a model. Here’s a link I found just now with a google search:

https://stats.stackexchange.com/a/29820
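As a minimal sketch of the centering and scaling step, using the taille values from the colData table in the question and base R's scale(). The name `taille_scaled` is illustrative and does not appear in the original code.

```r
# taille values copied from the colData table in the question.
taille <- c(409, 519, 906, 1836, 1922, 2063, 603, 834, 1094, 1313, 2643)

# scale() subtracts the mean and divides by the standard deviation;
# as.numeric() drops the matrix attributes it returns.
taille_scaled <- as.numeric(scale(taille))

# After the transformation the covariate has mean 0 and standard deviation 1.
print(round(mean(taille_scaled), 10))
print(sd(taille_scaled))

# The scaled column could then be stored in the colData and used in the
# design in place of the raw values, e.g. design = ~ condition + taille_scaled
```

With this transformation the taille coefficient is a log2 fold change per standard deviation of taille rather than per raw unit.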

ADD REPLY • link 4.5 years ago Michael Love 42k

Thank you very much for your answer. I did not know about scaling and centering of covariates.

I am wondering about something I found this week. I also have metabolomic, lipidomic and proteomic data, which give count datasets similar to those obtained with RNA-seq. I found a paper that compares different methods for statistical analysis. It compares DESeq2 with other methods and DESeq2 seems to work pretty well. What do you think about using DESeq2 for this kind of data?

Thank you

Emilie

ADD REPLY • link 4.5 years ago emilie.derisoud • 0

If you have count data without missing values it’s probably a decent approach, but don’t hold me to it given I don’t know what your particular dataset looks like. If you have missing or uncertain measurements, I think the statistics should take that missingness or uncertainty into account. One way to approach this is multiple imputation. We looked into this recently with isoform-level differential testing, extending the SAMseq model:

https://academic.oup.com/nar/article/47/18/e105/5542870

ADD REPLY • link 4.5 years ago Michael Love 42k

I actually have a lot of missing values in my LC-MS/MS data, which are represented by the value 0. The meaning of a 0 is ambiguous: either the compound is not present or it was not detected by the spectrometer.

I will read more about this and about the SAMseq model.

Thank you again

Emilie

ADD REPLY • link 4.5 years ago emilie.derisoud • 0

Just to be clear, we do not have ready-to-use software in place for these other assays. It’s more of a suggestion of an approach we used in a different area (transcriptomics).

It might be worth browsing the biocViews for these areas to find packages designed specifically for them.

ADD REPLY • link 4.5 years ago Michael Love 42k