Can DESeq2 be applied on gene coverages, rather than read counts?
1
1
Entering edit mode
Rui ▴ 10
@15ef12dc
Last seen 3.2 years ago
United Kingdom

Hello everyone,

I have 18 soil metagenomes sampled from three different conditions (6 samples each), for which I have assembled. I’d like to do a comparative study of the genes that could potentially be expressed in these samples. What I did was that I first estimated the presence of KOs using KEGG’s kofam database on the contigs, and I also got the coverage of each KO in my samples. Next, I calculated the total coverage of each KO (i.e. by summing the coverage of the same KOs on different contigs) in every individual sample. I am wondering if I can use these coverage values as input for DESeq2 as opposed to read counts mapped on each gene?

An example of my input data would be

sample A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6 C1 C2 C3 C4 C5 C6
K00001 36.3978 0 22.4602 74.0771 37.8024 27.4386 23.3456 41.8246 25.8649 51.336 27.8538 22.8763 26.036 49.5105 24.3947 34.8723 52.0351 33.0193
K00003 747.666 828.64 509.787 950.148 993.473 777.476 717.928 732.745 633.716 894.339 821.97 874.446 850.249 1226.15 1222.42 1011.75 1087.76 1050.81
K00004 10.3978 7.96612 0 9.25023 19.4712 7.21569 5.16952 11.9648 11.1345 29.6271 34.3835 23.7866 26.3798 34.4165 15.7022 29.2394 22.2378 16.9822
K00005 48.0351 66.329 14.0175 70.9348 12.1143 99.1202 46.5701 91.1882 61.639 65.1754 93.5285 135.657 290.069 315.314 279.189 211.366 318.911 200.309
K00009 8.69086 0 9.73114 0 19.1342 8.89908 0 0 0 0 0 0 15.148 27.9347 18.6585 22.022 26.684 31.8226
K00010 126.77 119.827 172.367 203.431 297.6 155.051 40.8878 98.7166 24.2119 63.5848 141.801 83.471 74.2136 144.451 76.1991 36.0218 72.0287 61.0815
K00012 1530.33 1601.31 1312.18 2208.2 2488.36 1669.64 992.256 1011.06 924.257 1240.32 1182.85 915.699 719.04 1180.61 1060.9 695.96 1201.49 1044.33
K00013 256.634 363.695 150.823 575.562 442.416 416.417 312.265 409.185 363.236 512.297 424.308 323.398 219.852 616.609 573.806 436.281 548.883 483.997

..... the list continues

Thank you for any insights!

Rui

Coverage DESeq2 • 1.3k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 5 days ago
United States

It's not ideal. Basically, DESeq2 is attempting to model the data given that it consists of observations of reads which have a certain variance property. Normalized counts, or coverage data, do not have that property exactly and so the method won't perform as well as it would with the proper input. I've seen this when people provide normalized counts to DESeq2 instead of original counts -- the method does not have the ability to associate higher variance on the estimated LFC from low counts, when those low counts have actually been scaled up to an artificially high value.

ADD COMMENT
0
Entering edit mode

Hello Dr.Love,

Thank you for your prompt reply! I suspect that DESeq2 would work if I can get the actual number of reads mapped to each gene, and use them as input?

Also, would you recommend some other tools to deal with coverage data, for example, I know there is a tool called MaAsLin2 that uses general linear models, but I am not sure if it’s appropriate for coverage data either. Or is it a bad idea, after all, to deal with coverage for gene abundance comparisons? Thanks again!

ADD REPLY
1
Entering edit mode

Yes, the recommended input to DESeq2 is the number of reads mapping to each gene.

I would use limma-voom if you have some normalized data, as it will assess the empirical relationship between mean and variance.

ADD REPLY
0
Entering edit mode

Thanks for the recommendation, I will get it a try!

ADD REPLY

Login before adding your answer.

Traffic: 549 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6