EdgeR: sRNA analysis check
1
0
Entering edit mode
@kenlee-nakasugi-6076
Last seen 9.6 years ago
Hi, I am hoping someone can comment on whether what I have done is correct as I do not have anyone here to confide in and am relatively new to this. Experiment: This is a sRNA deep sequencing experiment. I have two samples and two controls. The samples are a cellular sub compartment of the controls, and the controls are whole cell extract which also contain sequences from the sub compartment. The goal is to see whether there have been some transfer of sRNAs from the total extract into the sub-compartment. Due to the nature of the sample prep, and there will inevitably be some contamination of total extract sequences in the sub-compartment sequences. There are also some sRNA sequences that are natively present in the samples and are highly concentrated relative to the total extract (which also contain them). The focus is *not* on these sequences. The intention is to normalise first against the abundance of these sequences (create a 'base-level abundance'), and then input the new abundances into edgeR to identify sRNA sequences that have been enriched in the samples (i.e. have been transferred rather than contamination). Analysis design: The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR +spring2013 1. Get the total counts of sample specific sRNA sequences in all samples and controls. 2. Normalise raw counts against the counts from step 1 for each sample/control 3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR +spring2013 (The R code is at the end of this email). I am attaching some plots I generated. The BCV plot looks weird to me, but I'm not sure how to interpret. I am worried that I have done something wrong by first normalising against those sample specific sRNA sequences (getting the 'normCounts' variable below) Any advice greatly appreciated!! Cheers, Ken Here is the R code: ################# > sampleInfo=read.delim("sampleInfo", header =T, sep="\t") > group = sampleInfo[,2] > group [1] sample sample total total Levels: sample total > rawcounts <- read.delim("abun.tab", row.names="Sequence") > head(rawcounts) S1 S2 C1 C2 AAAAAAAAAATAGCCGTTGGACCC 1 2 1 3 AAAAAAAAAGAACCTAATGGATCGAACT 15 40 8 9 AAAAAAAAATGGCCGTTGGGAACC 1 1 2 2 AAAAAAAACCTGAATAACCCGAAA 5 8 129 34 AAAAAAAAGAACCTAATGGATCGAACT 44 86 15 23 AAAAAAAATCTGAATAACCCGAAA 1 1 38 4 > housekeep <- c(7966975,7644095,715816,724099) ## calculated these abundances outside of R > normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep)) ## Not sure if this is right thing to do > filter <- means >=10 > normCountsfilt <- normCounts[filter,] > cds2 <- DGEList(normCountsfilt, group=group) > cds2 = calcNormFactors(cds2) > cds2 = estimateCommonDisp(cds2, verbose=T) Disp = 0.52689 , BCV = 0.7259 > cds2 = estimateTagwiseDisp(cds2) #BCVplot > plotBCV(cds2) #MeanVariance Plot > meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE, show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE, show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab ="Mean Expression (Log10 Scale)" , ylab = "VaPlot" ) > et <- exactTest(cds2, pair=levels(group)) > top <- topTags(et, n=nrow(cds2$counts))$table > de <- rownames(top[top$FDR<0.05,]) #SmearPlot > plotSmear(cds2 , de.tags=de) #VolcanoPlot > plot(top$logFC, -log10(top$PValue), pch=20, cex=.5, ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in% de)+1) ############################# -------------- next part -------------- A non-text attachment was scrubbed... Name: MeanVariancePlotfilterlowreads Type: image/png Size: 33391 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: plotBCVfilterlowreads Type: image/png Size: 18635 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0001.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: SmearPlotfilterlowreads Type: image/png Size: 56590 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0002.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: VolcanoPlotfilterlowreads Type: image/png Size: 22130 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130820="" 4e2e0d70="" attachment-0003.png="">
Sequencing edgeR Sequencing edgeR • 1.2k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 51 minutes ago
WEHI, Melbourne, Australia
Dear Kenlee, There does look to be something seriously wrong, but I suspect the problem is with the counts in the first place rather than the analysis. Hard to diagnose by email. You seem to be relying third party web sites rather than with the documentation that comes with edgeR. Have you looked at the edgeR User's Guide? It recommends against normalized counts, see Section 2.5. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth > Date: Tue, 20 Aug 2013 17:49:02 +1000 > From: Kenlee Nakasugi <kenlee.nakasugi at="" sydney.edu.au=""> > To: bioconductor at r-project.org > Subject: [BioC] EdgeR: sRNA analysis check > > Hi, > > I am hoping someone can comment on whether what I have done is correct > as I do not have anyone here to confide in and am relatively new to > this. > > Experiment: > This is a sRNA deep sequencing experiment. > I have two samples and two controls. The samples are a cellular sub > compartment of the controls, and the controls are whole cell extract > which also contain sequences from the sub compartment. > The goal is to see whether there have been some transfer of sRNAs from > the total extract into the sub-compartment. > Due to the nature of the sample prep, and there will inevitably be some > contamination of total extract sequences in the sub-compartment > sequences. There are also some sRNA sequences that are natively present > in the samples and are highly concentrated relative to the total extract > (which also contain them). The focus is *not* on these sequences. The > intention is to normalise first against the abundance of these sequences > (create a 'base-level abundance'), and then input the new abundances > into edgeR to identify sRNA sequences that have been enriched in the > samples (i.e. have been transferred rather than contamination). > > Analysis design: > The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR > +spring2013 > > 1. Get the total counts of sample specific sRNA sequences in all samples > and controls. > 2. Normalise raw counts against the counts from step 1 for each > sample/control > 3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR > +spring2013 (The R code is at the end of this email). > > I am attaching some plots I generated. The BCV plot looks weird to me, > but I'm not sure how to interpret. > > I am worried that I have done something wrong by first normalising > against those sample specific sRNA sequences (getting the 'normCounts' > variable below) > > Any advice greatly appreciated!! > > Cheers, > Ken > > > > Here is the R code: > > ################# > >> sampleInfo=read.delim("sampleInfo", header =T, sep="\t") >> group = sampleInfo[,2] >> group > [1] sample sample total total > Levels: sample total > >> rawcounts <- read.delim("abun.tab", row.names="Sequence") >> head(rawcounts) > S1 S2 C1 C2 > AAAAAAAAAATAGCCGTTGGACCC 1 2 1 3 > AAAAAAAAAGAACCTAATGGATCGAACT 15 40 8 9 > AAAAAAAAATGGCCGTTGGGAACC 1 1 2 2 > AAAAAAAACCTGAATAACCCGAAA 5 8 129 34 > AAAAAAAAGAACCTAATGGATCGAACT 44 86 15 23 > AAAAAAAATCTGAATAACCCGAAA 1 1 38 4 > >> housekeep <- c(7966975,7644095,715816,724099) ## calculated these > abundances outside of R >> normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep)) ## Not > sure if this is right thing to do > >> filter <- means >=10 >> normCountsfilt <- normCounts[filter,] > >> cds2 <- DGEList(normCountsfilt, group=group) >> cds2 = calcNormFactors(cds2) >> cds2 = estimateCommonDisp(cds2, verbose=T) > Disp = 0.52689 , BCV = 0.7259 >> cds2 = estimateTagwiseDisp(cds2) > > #BCVplot >> plotBCV(cds2) > #MeanVariance Plot >> meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE, > show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE, > show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab > ="Mean Expression (Log10 Scale)" , ylab = "VaPlot" ) > >> et <- exactTest(cds2, pair=levels(group)) >> top <- topTags(et, n=nrow(cds2$counts))$table >> de <- rownames(top[top$FDR<0.05,]) > > #SmearPlot >> plotSmear(cds2 , de.tags=de) > #VolcanoPlot >> plot(top$logFC, -log10(top$PValue), pch=20, cex=.5, > ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in% > de)+1) > > ############################# > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: MeanVariancePlotfilterlowreads > Type: image/png > Size: 33391 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0004.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: plotBCVfilterlowreads > Type: image/png > Size: 18635 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0005.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: SmearPlotfilterlowreads > Type: image/png > Size: 56590 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0006.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: VolcanoPlotfilterlowreads > Type: image/png > Size: 22130 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201308="" 20="" 4e2e0d70="" attachment-0007.png=""> ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 781 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6