Question

EdgeR: sRNA analysis check

0

Entering edit mode

Kenlee Nakasugi ▴ 30

@kenlee-nakasugi-6076

Last seen 9.6 years ago

Hi, I am hoping someone can comment on whether what I have done is correct as I do not have anyone here to confide in and am relatively new to this. Experiment: This is a sRNA deep sequencing experiment. I have two samples and two controls. The samples are a cellular sub compartment of the controls, and the controls are whole cell extract which also contain sequences from the sub compartment. The goal is to see whether there have been some transfer of sRNAs from the total extract into the sub-compartment. Due to the nature of the sample prep, and there will inevitably be some contamination of total extract sequences in the sub-compartment sequences. There are also some sRNA sequences that are natively present in the samples and are highly concentrated relative to the total extract (which also contain them). The focus is *not* on these sequences. The intention is to normalise first against the abundance of these sequences (create a 'base-level abundance'), and then input the new abundances into edgeR to identify sRNA sequences that have been enriched in the samples (i.e. have been transferred rather than contamination). Analysis design: The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR +spring2013 1. Get the total counts of sample specific sRNA sequences in all samples and controls. 2. Normalise raw counts against the counts from step 1 for each sample/control 3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR +spring2013 (The R code is at the end of this email). I am attaching some plots I generated. The BCV plot looks weird to me, but I'm not sure how to interpret. I am worried that I have done something wrong by first normalising against those sample specific sRNA sequences (getting the 'normCounts' variable below) Any advice greatly appreciated!! Cheers, Ken Here is the R code: ################# > sampleInfo=read.delim("sampleInfo", header =T, sep="\t") > group = sampleInfo[,2] > group [1] sample sample total total Levels: sample total > rawcounts <- read.delim("abun.tab", row.names="Sequence") > head(rawcounts) S1 S2 C1 C2 AAAAAAAAAATAGCCGTTGGACCC 1 2 1 3 AAAAAAAAAGAACCTAATGGATCGAACT 15 40 8 9 AAAAAAAAATGGCCGTTGGGAACC 1 1 2 2 AAAAAAAACCTGAATAACCCGAAA 5 8 129 34 AAAAAAAAGAACCTAATGGATCGAACT 44 86 15 23 AAAAAAAATCTGAATAACCCGAAA 1 1 38 4 > housekeep <- c(7966975,7644095,715816,724099) ## calculated these abundances outside of R > normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep)) ## Not sure if this is right thing to do > filter <- means >=10 > normCountsfilt <- normCounts[filter,] > cds2 <- DGEList(normCountsfilt, group=group) > cds2 = calcNormFactors(cds2) > cds2 = estimateCommonDisp(cds2, verbose=T) Disp = 0.52689 , BCV = 0.7259 > cds2 = estimateTagwiseDisp(cds2) #BCVplot > plotBCV(cds2) #MeanVariance Plot > meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE, show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE, show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab ="Mean Expression (Log10 Scale)" , ylab = "VaPlot" ) > et <- exactTest(cds2, pair=levels(group)) > top <- topTags(et, n=nrow(cds2$counts))$table > de <- rownames(top[top$FDR<0.05,]) #SmearPlot > plotSmear(cds2 , de.tags=de) #VolcanoPlot > plot(top$logFC, -log10(top$PValue), pch=20, cex=.5, ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in% de)+1) ############################# -------------- next part -------------- A non-text attachment was scrubbed... Name: MeanVariancePlotfilterlowreads Type: image/png Size: 33391 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: plotBCVfilterlowreads Type: image/png Size: 18635 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0001.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: SmearPlotfilterlowreads Type: image/png Size: 56590 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0002.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: VolcanoPlotfilterlowreads Type: image/png Size: 22130 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0003.png="">

Sequencing edgeR Sequencing edgeR • 1.2k views

ADD COMMENT • link updated 10.7 years ago by Gordon Smyth 50k • written 10.7 years ago by Kenlee Nakasugi ▴ 30

score 0 · Answer 1 · 2013-08-21

Dear Kenlee, There does look to be something seriously wrong, but I suspect the problem is with the counts in the first place rather than the analysis. Hard to diagnose by email. You seem to be relying third party web sites rather than with the documentation that comes with edgeR. Have you looked at the edgeR User's Guide? It recommends against normalized counts, see Section 2.5. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth > Date: Tue, 20 Aug 2013 17:49:02 +1000 > From: Kenlee Nakasugi <kenlee.nakasugi at="" sydney.edu.au=""> > To: bioconductor at r-project.org > Subject: [BioC] EdgeR: sRNA analysis check > > Hi, > > I am hoping someone can comment on whether what I have done is correct > as I do not have anyone here to confide in and am relatively new to > this. > > Experiment: > This is a sRNA deep sequencing experiment. > I have two samples and two controls. The samples are a cellular sub > compartment of the controls, and the controls are whole cell extract > which also contain sequences from the sub compartment. > The goal is to see whether there have been some transfer of sRNAs from > the total extract into the sub-compartment. > Due to the nature of the sample prep, and there will inevitably be some > contamination of total extract sequences in the sub-compartment > sequences. There are also some sRNA sequences that are natively present > in the samples and are highly concentrated relative to the total extract > (which also contain them). The focus is *not* on these sequences. The > intention is to normalise first against the abundance of these sequences > (create a 'base-level abundance'), and then input the new abundances > into edgeR to identify sRNA sequences that have been enriched in the > samples (i.e. have been transferred rather than contamination). > > Analysis design: > The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR > +spring2013 > > 1. Get the total counts of sample specific sRNA sequences in all samples > and controls. > 2. Normalise raw counts against the counts from step 1 for each > sample/control > 3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR > +spring2013 (The R code is at the end of this email). > > I am attaching some plots I generated. The BCV plot looks weird to me, > but I'm not sure how to interpret. > > I am worried that I have done something wrong by first normalising > against those sample specific sRNA sequences (getting the 'normCounts' > variable below) > > Any advice greatly appreciated!! > > Cheers, > Ken > > > > Here is the R code: > > ################# > >> sampleInfo=read.delim("sampleInfo", header =T, sep="\t") >> group = sampleInfo[,2] >> group > [1] sample sample total total > Levels: sample total > >> rawcounts <- read.delim("abun.tab", row.names="Sequence") >> head(rawcounts) > S1 S2 C1 C2 > AAAAAAAAAATAGCCGTTGGACCC 1 2 1 3 > AAAAAAAAAGAACCTAATGGATCGAACT 15 40 8 9 > AAAAAAAAATGGCCGTTGGGAACC 1 1 2 2 > AAAAAAAACCTGAATAACCCGAAA 5 8 129 34 > AAAAAAAAGAACCTAATGGATCGAACT 44 86 15 23 > AAAAAAAATCTGAATAACCCGAAA 1 1 38 4 > >> housekeep <- c(7966975,7644095,715816,724099) ## calculated these > abundances outside of R >> normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep)) ## Not > sure if this is right thing to do > >> filter <- means >=10 >> normCountsfilt <- normCounts[filter,] > >> cds2 <- DGEList(normCountsfilt, group=group) >> cds2 = calcNormFactors(cds2) >> cds2 = estimateCommonDisp(cds2, verbose=T) > Disp = 0.52689 , BCV = 0.7259 >> cds2 = estimateTagwiseDisp(cds2) > > #BCVplot >> plotBCV(cds2) > #MeanVariance Plot >> meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE, > show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE, > show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab > ="Mean Expression (Log10 Scale)" , ylab = "VaPlot" ) > >> et <- exactTest(cds2, pair=levels(group)) >> top <- topTags(et, n=nrow(cds2$counts))$table >> de <- rownames(top[top$FDR<0.05,]) > > #SmearPlot >> plotSmear(cds2 , de.tags=de) > #VolcanoPlot >> plot(top$logFC, -log10(top$PValue), pch=20, cex=.5, > ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in% > de)+1) > > ############################# > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: MeanVariancePlotfilterlowreads > Type: image/png > Size: 33391 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0004.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: plotBCVfilterlowreads > Type: image/png > Size: 18635 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0005.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: SmearPlotfilterlowreads > Type: image/png > Size: 56590 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0006.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: VolcanoPlotfilterlowreads > Type: image/png > Size: 22130 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0007.png=""> ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}