Slightly different count plots from R and spreadsheet
boczniak767 ▴ 740
@maciej-jonczyk-3945
Poland

Hi,

I'm analysing RNA-seq data in DESeq2. To make sure the data were loaded properly, I compared the count plot from ggplot with a plot made from exported values after accounting for size factors (I used the product of count and size factor). The plot from the exported values was made in LibreOffice.

The plots are slightly different. Is this normal? Please find the plots below.

Plot from ggplot:

Plot from spreadsheet:

Here is the code that generated the plot in R from the DESeqDataSet:


library(DESeq2)
library(ggplot2)
prb1 <- plotCounts(dds.leaf1.f, "Zm00001eb038060", intgroup = c("ln", "timepoint"), returnData = TRUE)
ggplot(prb1, aes(x = timepoint, y = count, color = ln, group = ln)) + geom_point() + stat_summary(fun = mean, geom = "line")

I don't think it is the purpose of the support site to debug Excel behaviour. The issue might be how it combines replicates into a single value for this curve, but this is not on-topic here. plotCounts is the official method; otherwise, use counts(dds, normalized=TRUE) for more generic data extraction.
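A minimal sketch of how the values returned by plotCounts relate to counts(dds, normalized=TRUE), assuming DESeq2 is installed. It uses simulated data from makeExampleDESeqDataSet, not the poster's dataset. DESeq2's normalized counts are the raw counts divided by each sample's size factor, and plotCounts adds a pseudocount of 0.5 by default (its pc argument), which is why zero counts show up as 0.5 in the prb1 output later in the thread.

```r
library(DESeq2)

# Simulated example data (not the poster's data)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 12)
dds <- estimateSizeFactors(dds)
gene <- rownames(dds)[1]

# Normalized counts: raw counts divided by the per-sample size factors
manual <- counts(dds)[gene, ] / sizeFactors(dds)
builtin <- counts(dds, normalized = TRUE)[gene, ]

# plotCounts() output, which includes the default pseudocount of 0.5
pdat <- plotCounts(dds, gene, intgroup = "condition", returnData = TRUE)

all.equal(manual, builtin)                            # expected: TRUE
all.equal(unname(pdat$count), unname(manual) + 0.5)   # expected: TRUE
```

Because the normalization divides by the size factor, multiplying the counts by it would give different values. Passing pc = 0 to plotCounts together with returnData = TRUE should return the normalized counts without the pseudocount.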


Thank you for the feedback. As for the spreadsheet plot, I computed the averages myself and used them for plotting.
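To compare against hand-computed spreadsheet averages, the per-group means that stat_summary(fun = mean) joins with lines can also be computed directly in R. This sketch uses only the timepoint 1 and 2 rows of the prb1 output posted later in the thread; prb1_sub is an illustrative name, and with the full object the same aggregate() call would be applied to prb1 itself.

```r
# Subset of the posted prb1 rows (timepoints 1 and 2 only)
prb1_sub <- data.frame(
  count = c(441.361468, 138.574833, 1.942328, 13.780888,
            4274.347987, 312.819019, 0.500000, 1.715645),
  ln = factor(c("a554", "a554", "s018", "s018",
                "a554", "a554", "s018", "s018")),
  timepoint = factor(c(1, 2, 1, 2, 1, 2, 1, 2)),
  row.names = c("la5.1.11", "la5.2.11", "ls0.1.11", "ls0.2.11",
                "la5.1.12", "la5.2.12", "ls0.1.12", "ls0.2.12")
)

# Mean count per ln x timepoint combination, i.e. one value per line vertex
means <- aggregate(count ~ ln + timepoint, data = prb1_sub, FUN = mean)
print(means)
```

Each row of means corresponds to one point on the ggplot lines, so the table can be checked value by value against the spreadsheet averages.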

@mikelove
United States

I'm gonna guess something went wrong in LibreOffice.

Also try:

with(prb1, plot(timepoint, count, col=as.integer(ln)))

Thank you. The code you provided draws boxplots; does each box pool the samples for one data point?


I added col so you can see the different groups. I just want you to be able to see the raw data so you can infer which plot is correct.


Thank you, but now I get six black and six red boxes (3 black, 3 red, 3 black, 3 red), whereas for each timepoint I have two experimental variants, so my original plots show two lines. Or does this coloring mean that my data are improperly organized?

Edit: Okay, I looked at the prb1 object, and I think the boxplots look strange because the program draws each boxplot from all data for a given timepoint, irrespective of the ln value. The alternating coloring arises because the samples in my data are in a different order from the timepoints.

I hope this ordering of the data is okay: the sample order is the same in the counts and in coldata, and I assume DESeq2 uses the information from coldata.

Here is the plot:

And this is prb1:

               count   ln timepoint
la5.1.11  441.361468 a554         1
la5.2.11  138.574833 a554         2
la5.3.11   57.996669 a554         3
ls0.1.11    1.942328 s018         1
ls0.2.11   13.780888 s018         2
ls0.3.11    0.500000 s018         3
la5.1.21 1504.380872 a554         4
la5.2.21  712.860575 a554         5
la5.3.21 1138.077203 a554         6
ls0.1.21    0.500000 s018         4
ls0.2.21  724.976535 s018         5
ls0.3.21    0.500000 s018         6
la5.1.31  180.447510 a554         7
la5.2.31 2475.837204 a554         8
la5.3.31 1457.985065 a554         9
ls0.1.31    0.500000 s018         7
ls0.2.31    0.500000 s018         8
ls0.3.31    0.500000 s018         9
la5.1.41  137.997900 a554        10
la5.2.41 3059.768215 a554        11
la5.3.41   98.165640 a554        12
ls0.1.41    0.500000 s018        10
ls0.2.41    0.500000 s018        11
ls0.3.41    0.500000 s018        12
la5.1.12 4274.347987 a554         1
la5.2.12  312.819019 a554         2
la5.3.12 4993.082550 a554         3
ls0.1.12    0.500000 s018         1
ls0.2.12    1.715645 s018         2
ls0.3.12    0.500000 s018         3
la5.1.22 2828.562610 a554         4
la5.2.22 4252.970975 a554         5
la5.3.22 3343.736668 a554         6
ls0.1.22 1399.004144 s018         4
ls0.2.22    0.500000 s018         5
ls0.3.22    0.500000 s018         6
la5.1.32  238.625400 a554         7
la5.2.32  492.283816 a554         8
la5.3.32   72.458053 a554         9
ls0.1.32    8.359874 s018         7
ls0.2.32    0.500000 s018         8
ls0.3.32    0.500000 s018         9
la5.1.42 1899.530347 a554        10
la5.2.42 1142.974241 a554        11
la5.3.42  100.404578 a554        12
ls0.1.42    0.500000 s018        10
ls0.2.42    0.500000 s018        11
ls0.3.42    0.500000 s018        12
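On the ordering question: the DESeq2 vignette stresses that the columns of the count matrix and the rows of the column data must be in the same order, while the order itself (for example, whether samples are sorted by timepoint) does not matter. A quick check in the style of the vignette is sketched below; cts, coldata and shuffled are hypothetical stand-ins built from a few of the sample names above.

```r
# Hypothetical stand-ins: a tiny count matrix and a coldata table sharing sample names
samples <- c("la5.1.11", "la5.2.11", "ls0.1.11", "ls0.2.11")
cts <- matrix(0L, nrow = 2, ncol = length(samples),
              dimnames = list(c("geneA", "geneB"), samples))
coldata <- data.frame(ln = factor(c("a554", "a554", "s018", "s018")),
                      timepoint = factor(c(1, 2, 1, 2)),
                      row.names = samples)

# TRUE only if each count column lines up with the coldata row of the same sample
all(rownames(coldata) == colnames(cts))

# A coldata with the same samples in another order fails the check...
shuffled <- coldata[c(2, 1, 3, 4), ]
all(rownames(shuffled) == colnames(cts))

# ...and indexing by the count-matrix column names restores the alignment
fixed <- shuffled[colnames(cts), ]
all(rownames(fixed) == colnames(cts))
```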

Change timepoint to an integer in the data frame:

prb1$timepoint <- as.numeric(as.character(prb1$timepoint))

Then plot() should draw points instead of boxes.
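A small illustration of why the as.character() step matters, using four rows from the prb1 output above (prb1_sub is an illustrative name): calling as.numeric() directly on a factor returns its internal level codes, not the labels.

```r
# Four rows of the posted prb1 output, with timepoint stored as a factor
prb1_sub <- data.frame(
  count = c(441.361468, 1.942328, 1504.380872, 0.500000),
  ln = factor(c("a554", "s018", "a554", "s018")),
  timepoint = factor(c("1", "1", "4", "4"))
)

# as.numeric() on a factor gives the level codes: 1 1 2 2
codes <- as.numeric(prb1_sub$timepoint)

# Going through as.character() recovers the labels as numbers: 1 1 4 4
prb1_sub$timepoint <- as.numeric(as.character(prb1_sub$timepoint))

# With a numeric x, plot() draws one point per sample, colored by ln level
with(prb1_sub, plot(timepoint, count, col = as.integer(ln)))
```

With a factor on the x axis, plot() dispatches to a boxplot per level instead, which matches the boxes described earlier in the thread.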


Thank you, Michael Love, for your patience and kind advice. So I assume the plots from R are valid. Best wishes.
