Using DESeq2 to identify changing transcriptional activity in a metagenome
1
0
Entering edit mode
Tom • 0
@fb00ea2e
Last seen 10 months ago
Germany

Hello Bioconductor community,

I have used DESeq2 to test for differences in the overall transcriptional activity of species in a microbiome sample. However, i am not quite sure whether what I'm doing is correct. Thus, I would be very thankful for somebody confirming that this solution makes sense or pointing me to an error.

I am comparing two different biological conditions. For each condition, i have sequencing data from DNA and RNA libraries. This coldata-table describes the data:

sample condition type
l_1_dna low dna
l_2_dna low dna
l_1_rna low rna
l_2_rna low rna
h_1_dna high dna
h_2_dna high dna
h_1_rna high rna
h_2_rna high rna

I use metagenome assembled genomes representing the microbial species which are present in the sample. To generate count data, i map DNA and RNA reads to these genomes. The DNA and RNA libraries have vastly different sequencing depth and read length, but from what i understand about the DESeq2 normalization I assume that this won't be an issue. So i end up with a count table that looks like this (counts made up):

genome dna_l1 dna_l2 dna_h1 dna_h2 rna_l1 rna_l2 rna_h1 rna_h2
species_A 20000 20000 20000 2000 10000 10000 2000000 2000000
species_B 1000 1000 1000 1000 5000000 5000000 10000 10000
... ... ... ... ... ... ... ... ...

What i want to test is whether individual species change their transcriptional activity. So something like: "Is species A pumping out more mRNA per individual cell under condition high?". To that end, I'm using the likelihood ratio test.

dds_raw <- DESeqDataSetFromMatrix(countData=count, colData=col, design = ~ condition + type + condition:type)
dds <- DESeq(dds_raw, test="LRT", reduced= ~ condition + type)
results_lrt <- results(dds)


I then interpret low p-values to mean that the activity of the respective species has changed, since the differences between samples cannot be sufficiently explained without taking into account whether we are looking at genome- or transcriptome-abundance.
Is this the correct way of using DESeq2 to find out what i want to know? Is it even valid to use the package like that? It seems like I have strayed pretty far from the original purpose of the software..

Thank you for any input,
Tom

Metagenomics DESeq2 Metatranscriptomics • 340 views
2
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

The design looks fine to me, similar to other posts on the support site of testing ratios (e.g. ribosomal profiling etc.). I would just note that the NB distribution alone may not be a good match to metagenomics/metatranscriptomics, which some say requires a zero component in addition to the NB. If you go down the path of adding a zero component, with e.g. zinbwave, then it is possible to perform inference on the positive part of the count, but you may not have sufficient samples.

0
Entering edit mode

Thank you for your answer and especially for pointing me into the direction of zinbwave and the issue of zero inflation.

I will look into how i may take that into account. Regarding my initial question: i don't expect any zeros in the count matrix at all, given that reads are counted for whole genomes instead of individual genes. But it might be very relevant to the other analyses that i work on.

I have found publications about zero inflation in scRNA-seq and have started reading those. However, so far i couldn't find any discussion of zero inflation with regards to metatranscriptomics. If anybody can provide a link that would be appreciated. I assume this is out of the scope of this site, though, so i will mark the thread as answered.

Thanks again and have a nice day,

Tom

1
Entering edit mode

Oh I see. I didn't catch that. You may have no problem then with the setup above.