Hi edgeR Users,
I have employed edgeR to analyze RNAseq data before. Those data had two groups, one control and one treatment. Now, however, I need to investigate the effect of chemical exposure on gene expression, where chemical exposure is a continuous variable ranging, for example, from 30 ng/ml to 1000 ng/ml. I have several questions and hope to get help from you!
1. For the code y <- DGEList(counts=x, group=group), can I just use y <- DGEList(counts=x), since there is no group in my data set?
2. There are 320 samples, sequenced at four different times because I submitted them in four different months. After filtering lowly expressed genes and doing TMM normalization, I will use PCA to check for a batch effect related to the four sample submissions; if an obvious batch effect is observed, I will use ComBat in the sva package to adjust for it. In my experience, some count values become negative after batch removal. Do I need to add the absolute value of the most negative value to the whole data set before running edgeR?
3. For the design matrix, I plan to use design <- model.matrix(~ chemical_Concentration + age + race). Both chemical concentration and age are continuous variables; race is a categorical variable. Is this design code correct?
4. Then I plan to use robust = TRUE in disp <- estimateDisp(y, design, robust = TRUE) and fit <- glmFit(disp, design, robust = TRUE). I am analyzing miRNA data, and some miRNAs are expressed highly enough to be outliers, so robust = TRUE could reduce the dominant influence of those outlying expression values. Is my understanding correct?
5. Lastly, I would use lrt <- glmLRT(fit, coef = 2) and topTags(lrt, n=Inf, adjust.method = "BH", p.value = 0.05).
In my design, I am interested in the coefficient for chemical concentration (log2 transformed), so coef = 2 is correct, right?
6. Also, I am not very sure about the workflow: should I first filter lowly expressed miRNAs, then do TMM, then adjust for the batch effect, or do I need to adjust for the batch effect before filtering and TMM?
Thank you very much in advance!!
Ding
I agree that sva or RUVSeq could be used to search for hidden batch effects, but ycding is only asking about a known batch factor (month) in Question 2. To handle a known factor like that, I think just including it in the design matrix would be the simplest and most effective approach.
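To illustrate what including the known batch factor in the design matrix might look like, here is a minimal sketch in base R. The sample table is made up for the example: the column names (month, conc, age, race), the 16 samples, and all values are assumptions, not ycding's actual data.

```r
# Hypothetical sample annotation; column names and values are illustrative only.
set.seed(1)
n <- 16
targets <- data.frame(
  month = factor(rep(paste0("M", 1:4), each = 4)),  # known batch: submission month
  conc  = runif(n, min = 30, max = 1000),           # chemical concentration, ng/ml
  age   = round(runif(n, min = 20, max = 60)),      # continuous covariate
  race  = factor(rep(c("A", "B"), times = 8))       # categorical covariate
)

# The batch factor enters the model as an ordinary covariate,
# so the count matrix itself is left unmodified.
design <- model.matrix(~ month + log2(conc) + age + race, data = targets)

# Look up the concentration coefficient by column name rather than
# assuming a fixed position such as 2.
conc_coef <- which(colnames(design) == "log2(conc)")

print(colnames(design))
print(conc_coef)
```

With the month terms placed first, the concentration column is no longer column 2 of the design, so looking it up by name is safer than hard-coding a position. The resulting index (or the column name itself) could then be passed as the coef argument of glmLRT.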