DESeq2 doubts about transformations + nbinomWaldTest warning
andreia • 0
@andreia-23745
Last seen 5 months ago
Portugal

Dear all (or more specifically Michael Love, if I get lucky) :)

Hope that you are doing well.

I am contacting you regarding the R package DESeq2. I have been using this package for some years now, but a question came up this week while I was brainstorming with a biostatistics technician at my institute. :)

For exploratory analysis in an RNA-seq study, we have the option of using either the rlog or the vst transformation.

In your paper "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2" you have this statement:

while the VST is also effective at stabilizing variance, it does not directly take into account differences in size factors; and in datasets with large variation in sequencing depth (dynamic range of size factors ≳4) we observed undesirable artifacts in the performance of the VST.

But the help page for the R vst function says:

The rlog is less sensitive to size factors, which can be an issue when size factors vary widely.

https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/varianceStabilizingTransformation

Reading these two statements, I am confused about which transformation handles differences in size factors better. Can you help me?

> dds <- DESeqDataSetFromMatrix(countData = table, colData = data, design = ~RIN+group)
> the design formula contains one or more numeric variables that have mean or
> standard deviation larger than 5 (an arbitrary threshold to trigger this message).
> it is generally a good idea to center and scale numeric variables in the design
> to improve GLM convergence.


Is this a problem when I am doing the exploratory analysis? And what about the DEG analysis?

Also, after that I get:

> dds <- estimateSizeFactors(dds, controlGenes=index)
> dds <- DESeq(dds)
> using pre-existing size factors
> estimating dispersions
> gene-wise dispersion estimates
> mean-dispersion relationship
> final dispersion estimates
> fitting model and testing
> 1 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

I searched on Google and saw that you already replied to someone: https://github.com/mikelove/DESeq2/issues/3 ... I did:

> dds <- estimateSizeFactors(dds, controlGenes=index)
> dds <- estimateDispersions(dds)
> dds <- nbinomWaldTest(dds, maxit=5000)

And I am still getting the same warning/error. If I keep increasing maxit, will I probably end up with the same error anyway? Or can I remove the row that is causing trouble in this analysis step?

Thanks in advance,

Andreia

@mikelove
Last seen 3 days ago
United States

I prefer vst now. I think if the size factors had a range of 5-10x from smallest to largest then maybe I would prefer rlog, but generally I recommend vst.

That comment about the rlog being less sensitive is supposed to be in line with the paper's finding: when size factors vary a lot, rlog does better (it is less sensitive to the technical artifact of size factor differences, which is good).

The warning recommends that you scale the continuous values to improve convergence. You should follow the guidance of the warning.

You can just do:

dds$x <- dds$x / sd(dds$x)


To scale a variable by its SD.
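
As a minimal sketch of that advice in base R, assuming a sample table shaped like the `data` object in the question with a numeric `RIN` column: the RIN values and the `RIN_scaled` column name below are made up for illustration. `scale()` centers and scales in one step, which is what the warning message suggests, while dividing by `sd()` only rescales.

```r
# Hypothetical sample table; the RIN values are invented for illustration
coldata <- data.frame(
  RIN   = c(6.1, 7.4, 8.0, 8.8, 9.2, 7.9),
  group = factor(rep(c("control", "treated"), each = 3))
)

# Divide by the SD only (same idea as dds$x / sd(dds$x))
rin_sd_only <- coldata$RIN / sd(coldata$RIN)

# Center and scale: subtract the mean, then divide by the SD.
# scale() returns a one-column matrix, so convert back to a plain vector.
coldata$RIN_scaled <- as.numeric(scale(coldata$RIN))

print(round(coldata$RIN_scaled, 3))
```

The scaled column would then stand in for the raw one in the design, for example `design = ~ RIN_scaled + group`. This changes the units of the RIN coefficient but not the comparison between groups.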

You can also remove very lowly expressed genes to avoid some convergence problems:

keep <- rowSums(counts(dds) >= 10) >= z
dds <- dds[keep,]


Here z is a minimal number of samples; you might pick something like 3 or 5 for a small-scale study.
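
To see what that filter does, here is a small sketch on a toy count matrix in base R. The matrix, gene names, and the choice `z <- 3` are assumptions for illustration; with a real DESeqDataSet, `counts(dds)` would play the role of `counts` here.

```r
# Toy count matrix: 4 genes x 6 samples (values invented for illustration)
counts <- matrix(
  c(  0,  2,   1,   0,  3,   1,
     15, 22,   9,  30, 12,  18,
     10, 11,   0,   0,  0,   2,
    120, 98, 143, 110, 87, 101),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6))
)

z <- 3  # minimal number of samples that must reach the count threshold

# counts >= 10 is a logical matrix; rowSums() counts the TRUEs per gene
keep <- rowSums(counts >= 10) >= z

# Keep only genes with at least 10 counts in at least z samples
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

In this toy example only the two genes that reach 10 counts in at least three samples are kept; the mostly-zero gene and the gene expressed in only two samples are dropped.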
