I am currently using the DESeq2 package to normalize my data, and afterwards I will use it to calculate p-values between my conditions. I have a question about the step after normalization. I normalized my RNA-seq data and used the normalized data to find outliers with PCA. Some samples will be removed. But when I checked the code, I saw that the normalization step is inside the DESeq function, so I cannot pass my new data (already normalized, with the outliers removed) to the DESeq function. So it seems that I will have to remove the outliers and re-normalize my data to calculate p-values.
When I checked some manuals, I saw this:
"NOTE: DESeq2 doesn’t actually use normalized counts, rather it uses the raw counts and models the normalization inside the Generalized Linear Model (GLM). These normalized counts will be useful for downstream visualization of results, but cannot be used as input to DESeq2 or any other tools that perform differential expression analysis which use the negative binomial model." Source: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html
I want to know whether there is anything I can do to put my modified data back into the DESeq2 object. If there is a way to do it, could you please explain it to me?
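To illustrate why normalized values cannot simply be carried over after samples are dropped, here is a minimal base-R sketch of the median-of-ratios idea behind size factors. It is a simplified illustration, not DESeq2's own code; the function name `size_factors` and the toy matrix `raw` are hypothetical.

```r
# Simplified median-of-ratios size factors (illustration only, not DESeq2's code).
size_factors <- function(counts) {
  # Per-gene log geometric mean across all samples in the matrix;
  # this is -Inf for any gene with a zero count, and such genes are skipped.
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

set.seed(1)
# Hypothetical toy data: 200 genes x 6 samples of raw counts.
raw <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))
raw[, 6] <- raw[, 6] * 3  # make sample6 much deeper than the others

sf_all <- size_factors(raw)        # size factors with all six samples
sf_sub <- size_factors(raw[, -6])  # size factors after dropping sample6

print(round(sf_all, 3))
print(round(sf_sub, 3))
```

Because the per-gene reference is computed across whichever samples are present, the size factors of the remaining samples change once a sample is removed, so the raw counts need to be re-normalized rather than the old normalized values reused.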
Hi Michael, thank you. I will check it asap. But I think I did not explain my problem clearly. I wonder how we can detect and remove outliers with the DESeq2 package? It seems to take only normalized data and does not let me rearrange it.
How to detect outliers? We recommend looking at PCA plots in the workflow.
I don't follow what you are trying to do that isn't possible from the workflow.
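A minimal sketch of the PCA check, assuming a simulated dataset from `makeExampleDESeqDataSet` as a stand-in for real data (the sizes and the `condition` column come from that simulation, not from the question):

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real dataset (assumption: 1000 genes, 12 samples).
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Variance-stabilize the raw counts; this is for visualization only,
# not an input to the differential expression test.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Per-sample PCA coordinates, useful for spotting samples far from their group.
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
print(pca[, c("name", "PC1", "PC2", "condition")])

# The same PCA as a plot.
plotPCA(vsd, intgroup = "condition")
```

The sample names in the `name` column are the column names of the object, which is what you would use to drop samples later.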
Okay, let's say I check the PCA plots inside or outside the workflow (in MATLAB, for example) and find that some samples, not genes, are outliers, and I want to remove the "normalized values" of these outliers before calculating p-values with the DESeq2 package. I wonder if I can change the normalized values within DESeq2? I can extract them with "normalized = counts(ddssize, normalized = TRUE)", but can I insert the changed normalized values back into the ddssize object?
If you identify samples to remove you can do, e.g.:
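A minimal sketch of that kind of subsetting, assuming a simulated dataset and hypothetical outlier names (`sample3` and `sample10` are placeholders, not samples from the question):

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real dataset (assumption: 1000 genes, 12 samples).
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical outliers, e.g. picked from a PCA plot.
outliers <- c("sample3", "sample10")

# Drop the outlier columns from the object that holds the raw counts.
keep <- !(colnames(dds) %in% outliers)
dds_clean <- dds[, keep]

# Remove factor levels that no longer have any samples.
dds_clean$condition <- droplevels(dds_clean$condition)

# Re-run the pipeline: size factors and dispersions are re-estimated
# from the raw counts of the remaining samples.
dds_clean <- DESeq(dds_clean)
res <- results(dds_clean)

# Normalized counts for the remaining samples, for plots or export.
norm_clean <- counts(dds_clean, normalized = TRUE)
```

The samples are removed from the raw counts and everything downstream is recomputed, so there is no need to write edited normalized values back into the object.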
Okay, thank you very much. I will try it.