Hi,
I'm doing transcription level analysis with my RNA-seq data while seeing nature protocol(https://www.nature.com/articles/nprot.2016.095).
I did alignment, assembly and quantification expressed genes and transcripts with HISAT2 and Stringtie. However, I found my sample sizes was small (n < 4 per group) as mentioned in nature protocol.
Note that Ballgown’s statistical test is a standard linear model based comparison. For small sample sizes (n < 4 per group) it is often better to perform regularization. This can be done using the limma33 package in Bioconductor.
Therefore, I did regularization on Limma, but I'm not sure if this is correct. So, I want to ask everyone to confirm my command below.
#use data in nature protocal
#create ballgown instance
pheno_data = read.csv("chrX_data/geuvadis_phenodata.csv")
bg_chrX = ballgown(dataDir = "ballgown", samplePattern = "ERR", pData=pheno_data)
bg_chrX_filt = subset(bg_chrX,"rowVars(texpr(bg_chrX)) >1",genomesubset=TRUE)
#make design for limma
design <- model.matrix(~0+sex+population, data=pheno_data)
colnames(design) <- gsub("sex", "", colnames(design))
#regularization on limma
v <- voom(texpr(bg_chrX_filt), design, plot=TRUE)
vfit <- lmFit(v, design)
efit <- eBayes(vfit)
plotSA(efit, main="Final model: Mean-variance trend")
contr <- makeContrasts(female - male, levels = colnames(coef(efit)))
tmp <- contrasts.fit(efit, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)
#see DEGs
head(top.table, 10)
I referred to those sites; https://github.com/alyssafrazee/ballgown
https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html
Thanks!
Thanks Gordon Smyth! I try to use limma based on counts.