I have TCGA RNA-seq data comprising matched normal and tumor samples for 100 patients. I am trying to identify DE genes between tumor and normal using a paired design, and I am applying SVA to remove potential unknown batch effects. I would highly appreciate some help with the following questions!
1) The design is ~ patient + Tumor, so that the test is paired. Should I include the patient variable in the null model, or should the null model be just an intercept?
2) How many SVs should I use in the DE analysis? I am currently using only the first two.
3) Should I explicitly tell svaseq to return 2 SVs, or should I leave n.sv unset so that svaseq estimates the number itself?
My current code is below. Many thanks!
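library(sva)
library(DESeq2)

# Full model: patient (pairing) + tumor status
design <- model.matrix(~ patientFactor + tumorFactor)
# Null model: intercept only (question 1: should patient be included here?)
mod0 <- model.matrix(~ 1, svaInfo)
svseq <- svaseq(as.matrix(norm_EDA[use, ]), design, mod0)
# Keep only the first two surrogate variables (questions 2 and 3)
SV1 <- svseq$sv[, 1]
SV2 <- svseq$sv[, 2]
dds_sva <- DESeq2::DESeqDataSetFromMatrix(countData = round(dat, 0),
                                          colData = cbind(info, SV1, SV2),
                                          design = ~ SV1 + SV2 + patient + status)
dds_sva <- DESeq(dds_sva)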
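To make questions 1 and 3 concrete, here is a minimal sketch of the alternatives I am weighing, run on simulated toy counts rather than my TCGA data. All object names (`toy_info`, `toy_counts`, `mod0_patient`, `mod0_intercept`, `sv_a`, `sv_b`) are made up for illustration, and the simulated hidden factor is only an assumption so that there is something for SVA to find. I am not claiming either variant is the right one; that is exactly what I am asking.

```r
library(sva)

set.seed(1)
n_pat   <- 10    # toy number of patients (assumption; my real data has 100)
n_genes <- 500   # toy number of genes

# Paired layout: one normal and one tumor sample per patient
toy_info <- data.frame(
  patient = factor(rep(seq_len(n_pat), times = 2)),
  status  = factor(rep(c("normal", "tumor"), each = n_pat),
                   levels = c("normal", "tumor"))
)

# Simulated counts with a hidden per-sample factor acting on the first 100 genes
hidden     <- rbinom(2 * n_pat, 1, 0.5)
loading    <- c(rep(1, 100), rep(0, n_genes - 100))
mu         <- 100 * exp(outer(loading, hidden))
toy_counts <- matrix(rnbinom(length(mu), mu = mu, size = 5), nrow = n_genes)

# Full model is the same in both variants
mod <- model.matrix(~ patient + status, data = toy_info)

# Variant A: patient kept in the null model, number of SVs estimated by svaseq
mod0_patient <- model.matrix(~ patient, data = toy_info)
sv_a <- svaseq(toy_counts, mod, mod0_patient)

# Variant B: intercept-only null model, number of SVs fixed at 2 (my current setup)
mod0_intercept <- model.matrix(~ 1, data = toy_info)
sv_b <- svaseq(toy_counts, mod, mod0_intercept, n.sv = 2)

# Compare how many SVs each variant ends up with
sv_a$n.sv
sv_b$n.sv
```

In variant A only the tumor/normal term differs between the full and null models, whereas in variant B the patient terms differ as well, so the two calls treat the patient effect differently when estimating the SVs.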