Question

deseq on a nucleotide resolution

0

Entering edit mode

yaelal • 0

@yaelal-11842

Last seen 7.8 years ago

Hi,

I have an RNA-seq experiment results tailored to counts read starts. I have 3 wt and 4 mutant replicates (e.coli).

and the total number sof read starts per library are:

mut_1-both-strands 42094635
mut_2-both-strands 29605303
mut_3-both-strands 109837482
mut_4-both-strands 67809225
wt_1-both-strands. 47932727
wt_2-both-strands. 122863776
wt_3-both-strands. 80151076

1. Can I apply deseq2 on the level of individual genomic positions (not on genes)?

Does this require adjustments?

2. In a trial run that I made I got ~1000 positions that were enriched in the wt (0.01) and about as twice enriched in the mutant.

When I looked at two specific positions which I expected to see enriched in the wt I saw that indeed there are a lot more reads in the wt and consequencly high log2FC However only one is considered statistically significant and the other gets NA. I guess it is related to the variability, but I would very much appreciate your advic, as it looks to me that such a position should be discovered as enriched in the wt.

position 1

library #Number of read starting at this position (not normalized to library size)

mut_1-both-strands 507
mut_2-both-strands 223
mut_3-both-strands. 1335
mut_4-both-strands 1114
wt_1-both-strands 469180
wt_2-both-strands 719622
wt_3-both-strands 509314

Deseq2 results for this position:

4166257_plus 190295.968432297 9.0463953858598 9.22337015312029 0.573478869853235 15.7745923370724 NA NA

position 2

library #Number of read starting at this position (not normalized to library size)

mut_1-both-strands 0
mut_2-both-strands 0
mut_3-both-strands 0
mut_4-both-strands 0
wt_1-both-strands 23
wt_2-both-strands 78
wt_3-both-strands 127

Deseq2 results for this position:

4164567_plus 24.621061195164 8.04442634920043 10.4938677111955 1.56855170426622 5.12856944869641 2.91952195960602e-07 0.000124043973785603

Thanks a lot

Yael Altuvia

deseq2 • 825 views

ADD COMMENT • link updated 8.4 years ago by James W. MacDonald 68k • written 8.4 years ago by yaelal • 0

0

Entering edit mode

Thanks a lot for your answer. I think that I might have not explained my analysis clearly.

You wrote:

"Whereas at the basepair level, you have reads/fragments contributing to many rows, as determined by the read or fragment length, and so there is strong correlation of counts across nearby positions".

However, in my analysis every read is counted only once. I am checking where a read 1st position maps, and the count of that position is increased by 1. This read does not contribute to any other position in my analysis. In other words, the counts I have per position refer to "how many reads started at that position". In a way it is as if I have N genes each of size 1 and N is the chromosome length X 2 (for both strands) Of note: most of the positions do not have any count as no read starts there.

Under this description, Is it OK to apply DESeq to these counts to compare between two conditions?

thanks

yael

ADD REPLY • link 8.4 years ago yaelal • 0

1

Entering edit mode

It seems you could *technically* use DESeq2 to these counts. See the vignette which discusses the NA's.

Given random fragmentation that occurs in RNA-seq though, the meaning of individual basepair results seems to me biologically meaningless.

ADD REPLY • link 8.4 years ago Michael Love 43k

score 0 · Answer 1 · 2016-11-14

It's outside of the normal pipeline for DESeq2, so without looking deep into the data myself, I can't recommend your approach at basepair level. Some of the steps which I would worry about are the size factor estimation and the dispersion prior estimation. In the typical pipeline, each gene is providing information into these two steps. While gene expression is to some degree correlated across genes, with the normal DESeq2 pipeline, you don't have the *same* reads or fragments contributing to counts from different genes. Whereas at the basepair level, you have reads/fragments contributing to many rows, as determined by the read or fragment length, and so there is strong correlation of counts across nearby positions.

If you want to look into using a count-based method with sliding windows I'd recommend reading Aaron Lun's work here:

http://nar.oxfordjournals.org/content/42/11/e95.full

http://nar.oxfordjournals.org/content/44/5/e45.full

And this workflow:

https://www.bioconductor.org/help/workflows/chipseqDB/

score 0 · Answer 2 · 2016-11-14

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Another alternative would be to use the derfinder package.

ADD COMMENT • link 8.4 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you James for pointing this out! I should have mentioned this as well.

ADD REPLY • link 8.4 years ago Michael Love 43k