edgeR: artifacts on BCV plot


Adriaan Sticker ▴ 90

@adriaan-sticker-6368

Last seen 9.7 years ago

Hi all,

I made some BCV plots of my data after the tagwise estimation step. I sometimes notice genes with identical, very low BCV values. They appear as a horizontal line below the rest of my data, but always above zero. I have attached an example. They disappear when I raise my filter cutoff (from cpm(counts)>1 to cpm(counts)>2), but then I also lose a fraction of my genes.

How should I interpret these values? What exactly are they? My guess is that these genes have very low counts and that, due to the discreteness of count data, their BCV is zero.

If I don't raise my filter cutoff and thus leave them in the data, how harmful are they? Can they strongly influence the BCV estimates of the other genes? (I use prior.df = 20.) I can see the trended dispersion line move a bit when I raise the filter for the lower counts.

Attached are the BCV plot with the artifacts (cpm(counts)>1) and a BCV plot without them (cpm(counts)>2).

Best regards
Adriaan Sticker

[Attachments: bcv1.png, bcv2.png]
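To make the trade-off between the two cutoffs concrete, here is a minimal base R sketch. It computes counts per million by hand instead of calling edgeR's cpm(), and it assumes a gene is kept when its CPM exceeds the cutoff in at least `min_samples` samples. That rule, the toy counts, and the helper name `keep_by_cpm` are illustrative assumptions; the post does not say how the cpm(counts)>1 condition was turned into a per-gene filter.

```r
# Toy counts for 4 low-abundance genes in 4 samples (made-up numbers).
small <- rbind(
  gene1 = c(  5,   8,   6,   7),
  gene2 = c( 12,  15,  11,  14),
  gene3 = c( 25,  30,  22,  28),
  gene4 = c(150, 160, 140, 155)
)
colnames(small) <- paste0("S", 1:4)

# Pad every library to exactly 10 million reads with one filler row,
# so that a count of 10 corresponds to a CPM of 1.
counts <- rbind(small, rest = 1e7 - colSums(small))

# Counts per million: divide each column by its library size.
lib_size <- colSums(counts)
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Hypothetical helper: keep genes whose CPM exceeds `cutoff`
# in at least `min_samples` samples (an assumed filtering rule).
keep_by_cpm <- function(cpm_mat, cutoff, min_samples = 2) {
  rowSums(cpm_mat > cutoff) >= min_samples
}

keep1 <- keep_by_cpm(cpm_manual, cutoff = 1)
keep2 <- keep_by_cpm(cpm_manual, cutoff = 2)

# Genes that pass the looser filter but are lost with the stricter one.
lost <- names(which(keep1 & !keep2))
print(lost)
```

Listing the genes in `lost` on real data shows exactly which genes the stricter cutoff removes, which helps in judging whether the cleaner BCV plot is worth the loss.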


updated 10.3 years ago by Ryan C. Thompson ★ 7.9k • written 10.3 years ago by Adriaan Sticker ▴ 90


Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 9 months ago

Scripps Research, La Jolla, CA

Hi Adriaan,

Perhaps the best plan is to inspect these genes and their counts directly to see whether they have anything obvious in common. You can access the tagwise dispersions in the DGEList object using dge$tagwise.dispersion. It's not clear exactly how many of the genes show this artifact, but try finding the 10 or so genes with the lowest dispersions and look at their counts to see if anything is amiss.

For what it's worth, though, it doesn't look like these genes are having a noticeable effect on your dispersion trend, and I don't think you have much to worry about.

-Ryan

On 1/30/14, 10:23 AM, Adriaan Sticker wrote:
> [...]
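The inspection suggested above could look roughly like the following sketch. To keep it runnable without edgeR, `dge` here is a plain list standing in for a DGEList that has already been through tagwise dispersion estimation; the simulated counts and dispersions are made up for illustration. With a real DGEList the same two components, `dge$counts` and `dge$tagwise.dispersion`, would be used.

```r
# Stand-in for a DGEList after tagwise dispersion estimation.
# A real object would come from edgeR; this list only mimics the
# two components used below, filled with simulated values.
set.seed(1)
n_genes <- 50
dge <- list(
  counts = matrix(
    rpois(n_genes * 4, lambda = 20),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)), paste0("S", 1:4))
  ),
  tagwise.dispersion = runif(n_genes, min = 0.01, max = 0.5)
)

# Row indices of the 10 genes with the smallest tagwise dispersion.
lowest <- order(dge$tagwise.dispersion)[1:10]

# Put the dispersions next to the raw counts for a quick visual check.
low_table <- data.frame(
  dispersion = dge$tagwise.dispersion[lowest],
  dge$counts[lowest, ]
)
print(low_table)
```

Using order() rather than a threshold returns exactly 10 rows, sorted from the lowest dispersion upward, even when several genes share the same value.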

written 10.3 years ago by Ryan C. Thompson ★ 7.9k


Hi,

Thanks for your input. I manually checked the counts of the genes with the lowest BCV values (see below) and I see nothing strange, except that the counts are all on the low side. So I think I will keep them in.

Is it correct to think that they appear on one horizontal line because of the discreteness of the counts?

Greetings
Adriaan

> ids = (y.tagwise$tagwise.dispersion %in% sort(y.tagwise$tagwise.dispersion)[1:10])
> counts.filtered[ids,]
                 X1  X2  X8  X7 X14 X13 X20  X3  X4  X9 X10 X15 X16 X21 X22  X5  X6 X11 X12 X17 X18 X23 X24
ENSG00000111215  18  13  16  15  20  14  13  11  17  42  13  30  35  17  26  12  14  35  15  50  20  13  27
ENSG00000134061   7  11  11   9   8   4  14   0  11  24  21  10  23   6  31   4  11  27  17  15   9   8  30
ENSG00000162927  26  28  15  25  19  18  19  23  24  77  18  44  40  19  27  21  17  75  22  67  18  21  37
ENSG00000197748  19  14  15  28  23  25  18  12  17  75  29  46  53  21  38  10  14  75  22  86  18  18  30
ENSG00000204822   8   8   9  12  12  12  12   4   7  32   7  18  16  14  17   3   8  30  13  27   9   7  32
ENSG00000213070  16  13   6   8  16  12  14   8  12  27   6  24  22  17  24   7  12  22  11  39  10  17  33
ENSG00000228343  13  14   7   9  17  15  20   5  11  22   6  30  39  19  33   4  12  26   8  63  22  16  45
ENSG00000229359  25  20  12  14  14  19  20   9  15  32   9  19  20  10  22   8  16  28  15  33  13   9  27
ENSG00000233597  21  18  14  14  18  16  14  10  18  41  13  23  25  18  21  12  13  35  15  42  12  15  31
ENSG00000248932  27  25  19  27  15  21  19  18  18  71  19  28  29  17  31  19  13  68  24  49  12  15  30
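Because BCV is the square root of the dispersion, genes that share one dispersion value land on one horizontal line in the plot. The sketch below, which uses a made-up dispersion vector `disp` rather than real estimates, shows one way to count how many genes sit exactly at the minimum. It also shows that the %in% idiom used in the post can select more than 10 genes when the lowest values are tied, whereas order() always returns exactly 10.

```r
# Made-up tagwise dispersions: 12 genes tied at the smallest value,
# followed by 20 genes with larger, distinct values.
disp <- c(rep(0.004, 12), seq(0.02, 0.4, length.out = 20))

# BCV is the square root of the dispersion.
bcv <- sqrt(disp)

# Number of genes sitting exactly at the minimum dispersion.
n_at_min <- sum(disp == min(disp))

# The %in% idiom keeps every gene whose value matches one of the
# 10 smallest values, so ties can push the selection past 10 genes.
ids_in <- disp %in% sort(disp)[1:10]

# order() returns exactly 10 row indices regardless of ties.
ids_order <- order(disp)[1:10]

cat("genes at minimum:", n_at_min, "\n")
cat("selected by %in%:", sum(ids_in), "\n")
cat("selected by order():", length(ids_order), "\n")
```

On real data, comparing `n_at_min` with the number of points on the horizontal line is a quick check that the line really consists of genes with one identical dispersion estimate.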

written 10.3 years ago by Adriaan Sticker ▴ 90
