Question

Trying to use a NanoString expression set with batch effects to do GSVA

0

Entering edit mode

Zhijie • 0

@a58cc9f4

Last seen 2.1 years ago

United States

Hi, everyone!

I have a NanoString expression set and I want to do GSVA. I followed the procedures of An approach for normalization and quality control for NanoString RNA expression data by Dr. Michael I. Love:

Quality control.
Identify housekeeping genes.
Pre-normalization visualization via RLE plot and PCA plot. I found batch effects in the expression set.
I used RUVSeq, DESeq2 and limma packages to remove batch effect and normalize the data. The codes mainly come from the Github provided by the article mentioned above. I changed slightly according to my data.

library(RUVSeq)
set <- newSeqExpressionSet(as.matrix(raw), phenoData = pData, featureData = fData)
cIdx <- rownames(set)[fData(set)$Class == "Housekeeping"]
set <- betweenLaneNormalization(set, which = "upper")
set <- RUVg(set, cIdx, k = 1)

library(DESeq2)
library(limma)
dds <- DESeqDataSetFromMatrix(counts(set), colData = pData(set), design = ~1)
rowData(dds) <- fData
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds)
dds <- estimateDispersions(dds, fitType = "mean")
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
covars <- as.matrix(colData(dds)[,grep("W",colnames(colData(dds))),drop = FALSE])
mat <- removeBatchEffect(mat, covariates = covars)
vsd.after <- vsd
assay(vsd.after) <- mat

I used dataset in vsd.after to draw RLE and PCA plot again to check if there were still batch effects. It showed the batch effects were removed.

According to the protocol of GSVA package, the input of gsva() should be a normalized gene expression dataset. I think the dataset in vsd.after is the one after being removed batch effect and normalization. But I remembered the protocol of DESeq2 said that it is only good for visualization and could not be used for downstream analysis.

I also found there is a dataset in set, which is different from my raw expression set and not being normalized by DESeq2 and limma package. It could be extracted by the code set@assayData[["normalizedCounts"]]. I am not sure what it is. Is it the counts without batch effects? Could it be used for GSVA directly?

Could anyone tell me which one (vsd.after or set) could be used for GSVA? If they are not the proper data, how can I get the normalized dataset for GSVA?

Thank you!

GSVA Normalization BatchEffect GSVAdata NanoString • 1.2k views

ADD COMMENT • link 2.2 years ago Zhijie • 0