I am very new to RNA-seq analysis and have a very specific question about using voomWithQualityWeights
For my analysis, I am using the expected counts generated from rsem tool to run my differential expression analysis.
I am using edgeR for TMM normalization and then voomWithQualityWeights before fitting my model. I have also removed genes with a zero counts from the expected count dataframe.
Overall I see ~600 genes differentially expressed between my two groups. I want to run hierarchal clustering on the 600 DE genes. It seems that using expected counts values for these 600 genes doesn't really produce two separate clusters, which I assume I should see since these genes are DE.
I would like to know, what variable(v$E, v$weights) from the voomWithQualityWeights actually goes into the lmFit, so that I could use same variable to run hierarchal clustering.
Apologies if this post is confusing, the following is my code.
## DataFrame with expected-counts from rsem. CountFrame <- (fread(CountMatrix, header=TRUE,sep=",",data.table=FALSE)) rownames(CountFrame) <- CountFrame[,1] CountFrame[,1] <- NULL keep <- CountFrame[rowSums(CountFrame==0) <= 25,] ## I have 86 samples CountFrame <- keep ## DataFrame describing the grouping along with other covariates(age,sex) phenoFrame <-read.table(phenoMatrix,header=T,sep="\t") group <-factor(phenoFrame$CATEGORY) sex <- phenoFrame$SEX age <-phenoFrame$AGE ##edgeR myDGElist <-DGEList(counts=CountFrame,group=group) myNormalized <- calcNormFactors(myDGElist) design <- model.matrix(~group) rownames(design) <- colnames(myNormalized) ## voom-limma with weight v <- voomWithQualityWeights(myNormalized, design=design, normalization="none", plot=FALSE) fit <- lmFit(v,design) fit <- eBayes(fit) top<-topTable(fit, coef=2, number=nrow(fit), adjust.method="BH", sort.by="P") ### ~600 significantly genes with adjusted p-value <0.05