deseq on a nucleotide resolution
2
0
Entering edit mode
yaelal • 0
@yaelal-11842
Last seen 7.8 years ago

Hi,

I have an RNA-seq experiment results tailored to counts read starts. I have 3 wt and 4 mutant replicates (e.coli).

and the total number sof read starts per library are:

mut_1-both-strands 42094635
mut_2-both-strands 29605303
mut_3-both-strands 109837482
mut_4-both-strands 67809225
wt_1-both-strands. 47932727
wt_2-both-strands. 122863776
wt_3-both-strands. 80151076

 

1. Can I apply deseq2 on the level of individual genomic positions (not on genes)?

    Does this require adjustments?

2. In a trial run that I made I got ~1000 positions that were enriched in the wt  (0.01) and about as twice enriched in the mutant.

When I looked at two specific positions which I expected to see enriched in the wt I saw that indeed there are a lot more reads in the wt and consequencly high log2FC  However only one is considered statistically significant and the other gets NA. I guess it is related to the variability, but I would very much appreciate your advic, as it looks to me that such a position should be discovered as enriched in the wt.    

position 1 

library                       #Number of read starting at this position (not normalized to library size)

mut_1-both-strands    507
mut_2-both-strands    223
mut_3-both-strands.    1335
mut_4-both-strands    1114
wt_1-both-strands     469180
wt_2-both-strands     719622
wt_3-both-strands    509314

Deseq2 results for this position:

4166257_plus 190295.968432297 9.0463953858598 9.22337015312029 0.573478869853235 15.7745923370724 NA NA

 

position 2 

library                       #Number of read starting at this position (not normalized to library size)

mut_1-both-strands    0
mut_2-both-strands    0
mut_3-both-strands    0
mut_4-both-strands    0
wt_1-both-strands     23
wt_2-both-strands     78
wt_3-both-strands     127

Deseq2 results for this position: 

4164567_plus 24.621061195164 8.04442634920043 10.4938677111955 1.56855170426622 5.12856944869641 2.91952195960602e-07 0.000124043973785603

 

Thanks a lot

Yael Altuvia

 

 

 

 

 

 

deseq2 • 825 views
ADD COMMENT
0
Entering edit mode

Thanks a lot for your answer.   I think that I might have not explained my analysis clearly.

You wrote:

"Whereas at the basepair level, you have reads/fragments contributing to many rows, as determined by the read or fragment length, and so there is strong correlation of counts across nearby positions".

However, in my analysis every read is counted only once. I am checking where a read 1st position maps, and the  count of that position is increased by  1. This read does not contribute to any other position in my analysis. In other words, the counts I have per position refer to "how many reads started at that position".    In a way it is as if I have N genes each of size 1 and N is the chromosome length X 2 (for both strands) Of note: most of the positions do not have any count as no read starts there.

Under this description, Is it OK to apply DESeq to these counts to compare between two conditions?

thanks

yael

 

 

 

ADD REPLY
1
Entering edit mode

It seems you could *technically* use DESeq2 to these counts. See the vignette which discusses the NA's.

Given random fragmentation that occurs in RNA-seq though, the meaning of individual basepair results seems to me biologically meaningless.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

It's outside of the normal pipeline for DESeq2, so without looking deep into the data myself, I can't recommend your approach at basepair level. Some of the steps which I would worry about are the size factor estimation and the dispersion prior estimation. In the typical pipeline, each gene is providing information into these two steps. While gene expression is to some degree correlated across genes, with the normal DESeq2 pipeline, you don't have the *same* reads or fragments contributing to counts from different genes. Whereas at the basepair level, you have reads/fragments contributing to many rows, as determined by the read or fragment length, and so there is strong correlation of counts across nearby positions.

If you want to look into using a count-based method with sliding windows I'd recommend reading Aaron Lun's work here:

http://nar.oxfordjournals.org/content/42/11/e95.full

http://nar.oxfordjournals.org/content/44/5/e45.full

And this workflow:

https://www.bioconductor.org/help/workflows/chipseqDB/

ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Another alternative would be to use the derfinder package.
 

ADD COMMENT
0
Entering edit mode

Thank you James for pointing this out! I should have mentioned this as well.

ADD REPLY

Login before adding your answer.

Traffic: 766 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6