Hi, I am new to bioinformatics and am learning how to process RNA-Seq data downloaded from GEO. People post different types of data on GEO: some post DESeq2-normalized data, others post TMM-normalized data, CPM, or FPKM. I am confused because I have read conflicting advice on how to start an analysis from each of these data types.
My questions:
1. Is there a function I can use to filter out lowly expressed genes when the data are already DESeq2- or TMM-normalized? Or should I just draw a plot to check whether the authors already filtered? What should I do if they did not filter out lowly expressed genes?
2. When I filter out lowly expressed genes from FPKM or CPM data, which is better to use, rowMeans or rowSums?
3. Is there anything wrong with my current RNA-Seq pipelines below? Any advice?
1) Start from FPKM: limma-trend
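library(limma)
keep = rowMeans(FPKM) >= 2  # filter on the FPKM matrix, before the log transform
FPKM = FPKM[keep, ]
v = log2(FPKM + 0.1)  # limma expects log2-scale expression values
fit = lmFit(v, design)
cont.matrix = makeContrasts(contrasts = "Disease-Control", levels = design)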
fit2 = contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2, trend = TRUE)
2) Start from CPM: limma-trend
keep = rowSums(CPM > 1) >= 2  # CPM > 1 in at least 2 samples
v = log2(CPM[keep, ] + 0.1)
fit = lmFit(v, design)
cont.matrix = makeContrasts(contrasts = "Disease-Control", levels = design)
fit2 = contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2, trend = TRUE)
3) Start from DESeq2 normalized data: limma-voom
v = voom(data, design, plot = TRUE)
fit = lmFit(v, design)
cont.matrix = makeContrasts(contrasts = "Disease-Control", levels = design)
fit2 = contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2)
4) Start from TMM normalized data: limma-voom (how should I filter out lowly expressed genes here?)
v = voom(data, design, plot = TRUE)
fit = lmFit(v, design)
cont.matrix = makeContrasts(contrasts = "Disease-Control", levels = design)
fit2 = contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2)
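5) Start from raw counts: limma-voom
library(edgeR)  # also loads limma
dge = DGEList(counts = data)
keep = rowSums(cpm(dge) > 1) >= 2  # CPM > 1 in at least 2 samples
dge = dge[keep, , keep.lib.sizes = FALSE]  # apply the filter and recompute library sizes
dge = calcNormFactors(dge)
v = voom(dge, design, plot = TRUE)
fit = lmFit(v, design)
cont.matrix = makeContrasts(contrasts = "Disease-Control", levels = design)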
fit2 = contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2)
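To make question 2 concrete, here is a minimal toy sketch of how the two kinds of filter can disagree. Everything in it is made up for illustration: the matrix cpm_toy, its values, the gene and sample names, and the thresholds (mean CPM >= 2, and CPM > 1 in at least 3 samples, chosen here as the size of the smaller group). It uses base R only and is not meant as a recommendation for either rule.

```r
# Hypothetical CPM matrix: 4 genes x 6 samples (3 Control, 3 Disease)
cpm_toy = matrix(
  c(0.2, 0.1, 0.3, 0.2, 0.1, 0.2,    # geneA: low in every sample
    0.0, 0.0, 0.0, 5.0, 6.0, 4.0,    # geneB: expressed only in Disease
    0.0, 0.0, 0.0, 0.0, 0.0, 30.0,   # geneC: driven by a single sample
    8.0, 9.0, 7.0, 8.0, 9.0, 7.0),   # geneD: expressed in every sample
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("C1", "C2", "C3", "D1", "D2", "D3")))

# Filter on the average expression across all samples
keep_mean = rowMeans(cpm_toy) >= 2

# Filter on the number of samples above a threshold
keep_n = rowSums(cpm_toy > 1) >= 3

# Side-by-side comparison of the two filters
print(cbind(keep_mean, keep_n))
```

In this toy case the two rules differ only for geneC: its mean is pulled above the cutoff by one sample, so the rowMeans rule keeps it, while the rowSums rule drops it because only one sample exceeds the threshold.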
I appreciate your valuable help!!