Question: RNA-seq data analysis with two species using DESeq2
0
gravatar for fl
9 months ago by
fl0
fl0 wrote:

Hello,

I'm working on a gene expression analysis with two non-model species. I followed the same field collecting and RNA preservation protocol for all samples and generated 24 libraries that were pooled and sequenced across three lanes. I'm following the steps below for differential expression analysis:

1. Perform structural annotation with draft genome assemblies (results show similar completeness and quality of annotation)
2. Use transcripts from gene models to build a kallisto index for each species and quantify
3. Identify one-to-one orthologs between the two species
4. Create copies of kallisto's "abundance.tsv" files, replace the "target_id" field with ortholog identifiers across samples, and discard quantification data from genes without orthologs
5. Import kallisto files with tximport and "txOut = TRUE"; the genome annotation does not include isoforms so I don't have a transcript-to-gene map
6. Run DESeq2 pipeline using a single factor (behavior) with four levels. One species has three levels and the other has one level.

Here is the result of a PCA with distances calculated from VTS data. PC1 clearly separates the two species and explains an unusually high percentage of the variance (80%). 

https://ibb.co/cA51UT

I was wondering if there are major problems with the approach that I'm using. I rarely see studies using DESeq2 with more than one species, and I would appreciate any thoughts on this type of analysis or if it's appropriate to use DESeq2 for this purpose.

deseq2 • 268 views
ADD COMMENTlink modified 9 months ago by Simon Anders3.5k • written 9 months ago by fl0

Why are you surprised that species explains the bulk of the variance?

Also: Please always post the sample table with such questions and explain how the samples differ.

ADD REPLYlink written 9 months ago by Simon Anders3.5k

I'm not surprised that species explains most of the variance. However, I was wondering if there are underlying problems with using DESeq2 for this type of cross-species analysis. The system involves two closely related species with females that perform specialized behaviors, and "behavior4" in "sp2" might be considered a modification of "behavior2" in "sp1". I'm using this simple formula
dds <- DESeqDataSetFromTximport(txi_kallisto, colData = coldata, design = ~ condition)
and then evaluating three contrasts between "sp2" and "sp1" to test predictions of gene expression associated with certain behaviors. I have attached the data table below. Thank you!

| sample        | condition |
|---------------|-----------|
| sp1_lvl1_rep1 | behavior1 |
| sp1_lvl1_rep2 | behavior1 |
| sp1_lvl1_rep3 | behavior1 |
| sp1_lvl1_rep4 | behavior1 |
| sp1_lvl1_rep5 | behavior1 |
| sp1_lvl1_rep6 | behavior1 |
| sp1_lvl2_rep1 | behavior2 |
| sp1_lvl2_rep2 | behavior2 |
| sp1_lvl2_rep3 | behavior2 |
| sp1_lvl2_rep4 | behavior2 |
| sp1_lvl3_rep1 | behavior3 |
| sp1_lvl3_rep2 | behavior3 |
| sp1_lvl3_rep3 | behavior3 |
| sp1_lvl3_rep4 | behavior3 |
| sp1_lvl3_rep5 | behavior3 |
| sp1_lvl3_rep6 | behavior3 |
| sp2_lvl4_rep1 | behavior4 |
| sp2_lvl4_rep2 | behavior4 |
| sp2_lvl4_rep3 | behavior4 |
| sp2_lvl4_rep4 | behavior4 |
| sp2_lvl4_rep5 | behavior4 |
| sp2_lvl4_rep6 | behavior4 |
| sp2_lvl4_rep7 | behavior4 |
| sp2_lvl4_rep8 | behavior4 |

ADD REPLYlink modified 9 months ago • written 9 months ago by fl0
Answer: RNA-seq data analysis with two species using DESeq2
0
gravatar for Simon Anders
9 months ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:

If you find significant differences between the samples with behaviour 4 and behaviour 1, you cannot say whether this difference is due to the fact that the samples are from animals exhibiting different behavior or due to the fact that they are different species. Your observation that PCA shows most variance to be due to species makes the latter the much more likely explanation.

So, yes, there are major problems with your approach. 

ADD COMMENTlink written 9 months ago by Simon Anders3.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 308 users visited in the last hour