Hi,
I have results from an RNA-seq experiment tailored to counting read starts. I have 3 wt and 4 mutant replicates (E. coli).
The total numbers of read starts per library are:
mut_1-both-strands 42094635
mut_2-both-strands 29605303
mut_3-both-strands 109837482
mut_4-both-strands 67809225
wt_1-both-strands 47932727
wt_2-both-strands 122863776
wt_3-both-strands 80151076
1. Can I apply DESeq2 at the level of individual genomic positions (rather than genes)?
Does this require adjustments?
2. In a trial run I got ~1000 positions that were enriched in the wt (0.01) and about twice as many enriched in the mutant.
When I looked at two specific positions that I expected to be enriched in the wt, I saw that there are indeed many more reads in the wt and consequently a high log2FC. However, only one is considered statistically significant and the other gets NA. I guess this is related to the variability, but I would very much appreciate your advice, as it looks to me that such a position should be discovered as enriched in the wt.
position 1
library #Number of reads starting at this position (not normalized to library size)
mut_1-both-strands 507
mut_2-both-strands 223
mut_3-both-strands 1335
mut_4-both-strands 1114
wt_1-both-strands 469180
wt_2-both-strands 719622
wt_3-both-strands 509314
DESeq2 results for this position:
4166257_plus 190295.968432297 9.0463953858598 9.22337015312029 0.573478869853235 15.7745923370724 NA NA
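Since the counts above are not normalized to library size, a rough sanity check is to scale them by the library totals listed earlier and compare the group means. The sketch below does this with plain counts-per-million scaling; this is a simplification and not DESeq2's own size-factor estimation, and the helper names (`counts_per_million`, `group_mean`) are made up for illustration.

```python
import math

# Total read starts per library, copied from the table earlier in this post.
library_totals = {
    "mut_1": 42094635,
    "mut_2": 29605303,
    "mut_3": 109837482,
    "mut_4": 67809225,
    "wt_1": 47932727,
    "wt_2": 122863776,
    "wt_3": 80151076,
}

# Raw read-start counts at position 1 (4166257_plus).
position_1 = {
    "mut_1": 507,
    "mut_2": 223,
    "mut_3": 1335,
    "mut_4": 1114,
    "wt_1": 469180,
    "wt_2": 719622,
    "wt_3": 509314,
}


def counts_per_million(counts, totals):
    """Scale each library's count to read starts per million (hypothetical helper)."""
    return {lib: counts[lib] / totals[lib] * 1e6 for lib in counts}


def group_mean(values, prefix):
    """Average the values of all libraries whose name starts with `prefix`."""
    selected = [v for lib, v in values.items() if lib.startswith(prefix)]
    return sum(selected) / len(selected)


cpm = counts_per_million(position_1, library_totals)
wt_mean = group_mean(cpm, "wt_")
mut_mean = group_mean(cpm, "mut_")

# log2 ratio of the library-size-scaled group means (wt over mutant).
log2_fc = math.log2(wt_mean / mut_mean)

for lib, value in cpm.items():
    print(f"{lib}\t{value:.1f}")
print(f"log2(wt / mut) of mean CPM: {log2_fc:.2f}")
```

The same scaling cannot be applied directly to a position where one group is all zeros, because the ratio is undefined without some form of pseudocount or shrinkage.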
position 2
library #Number of reads starting at this position (not normalized to library size)
mut_1-both-strands 0
mut_2-both-strands 0
mut_3-both-strands 0
mut_4-both-strands 0
wt_1-both-strands 23
wt_2-both-strands 78
wt_3-both-strands 127
DESeq2 results for this position:
4164567_plus 24.621061195164 8.04442634920043 10.4938677111955 1.56855170426622 5.12856944869641 2.91952195960602e-07 0.000124043973785603
Thanks a lot
Yael Altuvia
Thanks a lot for your answer. I think that I might not have explained my analysis clearly.
You wrote:
"Whereas at the basepair level, you have reads/fragments contributing to many rows, as determined by the read or fragment length, and so there is strong correlation of counts across nearby positions".
However, in my analysis every read is counted only once. I check where the first position of each read maps, and the count of that position is increased by 1. The read does not contribute to any other position in my analysis. In other words, the counts I have per position answer "how many reads started at that position". In a way it is as if I have N genes, each of size 1, where N is the chromosome length x 2 (for both strands). Of note: most of the positions have no counts at all, as no read starts there.
Under this description, is it OK to apply DESeq2 to these counts to compare between two conditions?
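To make the counting scheme concrete, here is a minimal sketch of it. The input layout (tuples of start, end, strand with 1-based inclusive coordinates) and the function name `count_read_starts` are assumptions for illustration, as is the choice to take the rightmost coordinate as the first base of a minus-strand read.

```python
from collections import Counter


def count_read_starts(alignments):
    """Count how many reads start at each (position, strand).

    `alignments` is an iterable of (start, end, strand) tuples with
    1-based inclusive coordinates (an assumed layout, not a real file format).
    """
    counts = Counter()
    for start, end, strand in alignments:
        # Assumption: for a minus-strand read, the first sequenced base
        # is the rightmost aligned coordinate.
        first_base = start if strand == "+" else end
        # Each read adds exactly 1 to exactly one (position, strand) key.
        counts[(first_base, strand)] += 1
    return counts


# Toy alignments, invented purely to exercise the function.
reads = [
    (100, 150, "+"),
    (100, 140, "+"),
    (120, 170, "-"),
    (101, 151, "+"),
]
starts = count_read_starts(reads)

for (position, strand), n in sorted(starts.items()):
    print(position, strand, n)
```

Because every read contributes to a single key, the counts sum to the number of reads, and positions where no read starts are simply absent (a `Counter` returns 0 for them).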
thanks
yael
It seems you could *technically* apply DESeq2 to these counts. See the vignette, which discusses the NAs.
Given the random fragmentation that occurs in RNA-seq, though, results for individual basepairs do not seem biologically meaningful to me.