Question

Using DESeq2 to identify changing transcriptional activity in a metagenome

0

Entering edit mode

Tom • 0

@fb00ea2e

Last seen 3.4 years ago

Germany

Hello Bioconductor community,

I have used DESeq2 to test for differences in the overall transcriptional activity of species in a microbiome sample. However, i am not quite sure whether what I'm doing is correct. Thus, I would be very thankful for somebody confirming that this solution makes sense or pointing me to an error.

I am comparing two different biological conditions. For each condition, i have sequencing data from DNA and RNA libraries. This coldata-table describes the data:

sample	condition	type
l_1_dna	low	dna
l_2_dna	low	dna
l_1_rna	low	rna
l_2_rna	low	rna
h_1_dna	high	dna
h_2_dna	high	dna
h_1_rna	high	rna
h_2_rna	high	rna

I use metagenome assembled genomes representing the microbial species which are present in the sample. To generate count data, i map DNA and RNA reads to these genomes. The DNA and RNA libraries have vastly different sequencing depth and read length, but from what i understand about the DESeq2 normalization I assume that this won't be an issue. So i end up with a count table that looks like this (counts made up):

genome	dna_l1	dna_l2	dna_h1	dna_h2	rna_l1	rna_l2	rna_h1	rna_h2
species_A	20000	20000	20000	2000	10000	10000	2000000	2000000
species_B	1000	1000	1000	1000	5000000	5000000	10000	10000
...	...	...	...	...	...	...	...	...

What i want to test is whether individual species change their transcriptional activity. So something like: "Is species A pumping out more mRNA per individual cell under condition high?". To that end, I'm using the likelihood ratio test.

dds_raw <- DESeqDataSetFromMatrix(countData=count, colData=col, design = ~ condition + type + condition:type)
dds <- DESeq(dds_raw, test="LRT", reduced= ~ condition + type)
results_lrt <- results(dds)

I then interpret low p-values to mean that the activity of the respective species has changed, since the differences between samples cannot be sufficiently explained without taking into account whether we are looking at genome- or transcriptome-abundance.
Is this the correct way of using DESeq2 to find out what i want to know? Is it even valid to use the package like that? It seems like I have strayed pretty far from the original purpose of the software..

Thank you for any input,
Tom

Metagenomics DESeq2 Metatranscriptomics • 1.4k views

ADD COMMENT • link updated 3.6 years ago by Michael Love 43k • written 3.6 years ago by Tom • 0

score 2 · Accepted Answer · 2021-04-13

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

The design looks fine to me, similar to other posts on the support site of testing ratios (e.g. ribosomal profiling etc.). I would just note that the NB distribution alone may not be a good match to metagenomics/metatranscriptomics, which some say requires a zero component in addition to the NB. If you go down the path of adding a zero component, with e.g. zinbwave, then it is possible to perform inference on the positive part of the count, but you may not have sufficient samples.

ADD COMMENT • link 3.6 years ago Michael Love 43k

0

Entering edit mode

Thank you for your answer and especially for pointing me into the direction of zinbwave and the issue of zero inflation.

I will look into how i may take that into account. Regarding my initial question: i don't expect any zeros in the count matrix at all, given that reads are counted for whole genomes instead of individual genes. But it might be very relevant to the other analyses that i work on.

I have found publications about zero inflation in scRNA-seq and have started reading those. However, so far i couldn't find any discussion of zero inflation with regards to metatranscriptomics. If anybody can provide a link that would be appreciated. I assume this is out of the scope of this site, though, so i will mark the thread as answered.

Thanks again and have a nice day,

Tom

ADD REPLY • link 3.6 years ago Tom • 0

1

Entering edit mode

"...reads are counted for whole genomes instead of individual genes"

Oh I see. I didn't catch that. You may have no problem then with the setup above.

ADD REPLY • link 3.6 years ago Michael Love 43k