Question: Handling diploid vs haploid reference genomes in salmon > DESeq2 analysis
0
gravatar for andrewpountain
17 months ago by
andrewpountain0 wrote:

Hi there,

I'm using DESeq2 for analysis on salmon-derived counts. It's a bit of a niche question, but advice would be appreciated.

I am trying to compare expression of orthologous genes across multiple closely related species (obviously taking care over the various pitfalls that entails). One issue I have, however, is that one of these species has a reference genome that has been assembled into diploid chromosomes, whereas the others have not. In other words, two separate ORFs are provided for each gene, where there is only one for the other species. I wondered how best to treat the data in order to allow direct comparison.

So far I have either:

1) edited the reference trancriptome to give me only one type of ORF (they are all denoted "_A" or "_B")

or

2) aligned to the full transcriptome and then summed the counts across the two ORF versions

Whilst the counts are pretty similar for the most part, option 1 seems to underestimate counts for a small subset of genes. On the other hand, I feel like option 2 is more likely to flout the expectations of subsequent analysis e.g. how features with very low read counts are handled. After this step I am comparing by using shared gene IDs. If anyone had any advice on which of these two options (or any other approaches) might be better, that would be very helpful.

Kind regards,

Andrew

P.S. it's worth noting that since the other species' transcriptomes are not resolved into two haplotypes, one can assume this scenario is actually pretty similar to my edited transcriptome with only one ORF sequence per gene.

P.P.S. Alternative splicing isn't a consideration for this organism.

deseq2 salmon • 274 views
ADD COMMENTlink modified 17 months ago by Martin Morgan ♦♦ 23k • written 17 months ago by andrewpountain0
Answer: Handling diploid vs haploid reference genomes in salmon > DESeq2 analysis
0
gravatar for Ryan C. Thompson
17 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.4k wrote:

You could probably treat the two haplotypes of each gene as if they were alternative splice forms of a single gene, and Salmon should give you a sane answer for the gene counts and effective lengths.

ADD COMMENTlink written 17 months ago by Ryan C. Thompson7.4k

Rob has tested this and it works reasonably well.

I think I'd go with (2), just to have higher counts. You may have lower counts for genes in the other species if there were so many SNPs that it trips up the read mapping, but do you have a rough idea how many SNPs are in the genes?

ADD REPLYlink written 17 months ago by Michael Love25k

Hi Mike,

Thanks for getting back to me. I haven't looked at SNP frequency, it's my first time working with these genomes. I think there's a reasonable amount of divergence between haplotypes but haven't quantified. Something to look into. My feeling is that the genes that are substantially underestimated in (1) vs (2) number in the dozens out of several thousand, just by eyeballing a plot of the two. As for the reference genomes for the other species, I have no idea about the degree of heterozygosity.

Where possible I am trying to avoid directly comparing counts for individual between species though, and focusing on condition effects and species/condition interactions - hopefully this at least partially avoids issues arising from differing mapping efficiences/biases.

Thanks,

Andrew

ADD REPLYlink written 17 months ago by andrewpountain0

Hi Ryan,

Thanks for your response. Yes that's definitely worth a try to compare. I'm assuming the tx2gene functionality in tximport just sums alternative transcripts for the final gene count, I'll check up on this.

Thanks,

Andrew

ADD REPLYlink written 17 months ago by andrewpountain0

Yes it does.

ADD REPLYlink written 17 months ago by Michael Love25k

Thanks for clarifying!

Andrew

ADD REPLYlink written 17 months ago by andrewpountain0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 266 users visited in the last hour