Question

Specifying tx2gene with tximeta

aimee.hanson • 0

@aimeehanson-23841

Hi!

I'm working with RNA-seq data quantified with Salmon (hg38 index) and have successfully read my quant.sf files into R using the recommended approach:

## Read quant.sf files into summarised experiment (se) using tximeta
se <- tximeta(coldata)
gse <- summarizeToGene(se)

Subsequently, I've been trying to decide what to do, prior to DESeq2 analysis, with genes that have multiple Ensembl gene IDs because they also appear on alternate haplotypes/patches in the hg38 reference. My attempted workaround has been to generate a tx2gene table in which every transcript of a gene, including transcripts on alternate haplotypes that would otherwise map to a different gene ID, is assigned the Ensembl gene ID of the gene on the reference chromosome. Counts from all of these transcripts are then summarised under that single ID.
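To make the remapping concrete, here is a minimal base-R sketch of how such a table might be built. The annotation data frame, its column names (tx_id, gene_id, gene_name, seq_name), the example IDs, and the helper build_tx2gene_switch() are all hypothetical. Matching an alternate-haplotype gene to its reference copy by gene symbol is also an assumption and will not hold for every gene.

```r
# Hypothetical transcript annotation: one row per transcript.
# Column names and IDs are invented for illustration only.
annot <- data.frame(
  tx_id     = c("TX1", "TX2", "TX3", "TX4"),
  gene_id   = c("GENE_A_REF", "GENE_A_REF", "GENE_A_ALT", "GENE_B_REF"),
  gene_name = c("A", "A", "A", "B"),
  seq_name  = c("6", "6", "HSCHR6_MHC_COX_CTG1", "1"),
  stringsAsFactors = FALSE
)

# Sequences treated as the primary assembly
# (assumption: Ensembl-style names without a "chr" prefix).
primary <- c(as.character(1:22), "X", "Y", "MT")

build_tx2gene_switch <- function(annot, primary) {
  on_primary <- annot$seq_name %in% primary

  # One reference gene ID per gene symbol; the first one found wins.
  ref <- unique(annot[on_primary, c("gene_name", "gene_id")])
  ref <- ref[!duplicated(ref$gene_name), ]

  # Look up the reference ID for every transcript by gene symbol.
  ref_id <- ref$gene_id[match(annot$gene_name, ref$gene_name)]

  # Keep the original gene ID when no reference copy exists.
  new_id <- ifelse(is.na(ref_id), annot$gene_id, ref_id)

  data.frame(tx_id = annot$tx_id, gene_id = new_id, stringsAsFactors = FALSE)
}

tx2gene.switch <- build_tx2gene_switch(annot, primary)
```

In this toy example, TX3 sits on an alternate sequence and is reassigned from GENE_A_ALT to GENE_A_REF; the other transcripts keep their gene IDs.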

After generating a tx2gene reference (tx2gene.switch), I've had success with: gse.noalt <- tximport::tximport(coldata$files, type = "salmon", tx2gene = tx2gene.switch, ignoreTxVersion = TRUE). However, I would still like to import the very useful metadata that is accessible when using tximeta.

Simply running this fails:

> gse.noalt <- summarizeToGene(se, tx2gene = tx2gene.switch)
loading existing EnsDb created: 2020-06-09 08:29:00
obtaining transcript-to-gene mapping from database
loading existing gene ranges created: 2020-06-10 00:50:52
Error in .local(object, ...) : 
  formal argument "tx2gene" matched by multiple actual arguments

Presumably this is because the default tx2gene information is sourced from the Salmon index. Is there a way for me to obtain data in the format produced by tximeta followed by summarizeToGene, but with my own transcript-to-gene mapping? More generally, is it wise to collapse transcript counts derived from genes on alternate haplotypes in the same way that counts from alternate isoforms are collapsed for gene-level analyses?
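For reference, the kind of collapsing I'm asking about can be sketched in base R on a toy count matrix. This only illustrates summing transcript-level counts under a custom mapping. It is not how tximeta or tximport is implemented, the matrix, IDs, and mapping are invented, and it carries over none of the tximeta metadata.

```r
# Toy transcript-level counts (rows = transcripts, columns = samples).
# matrix() fills column by column, so s1 = 10, 5, 3, 7 and s2 = 20, 0, 4, 1.
tx_counts <- matrix(
  c(10, 5, 3, 7,
    20, 0, 4, 1),
  nrow = 4,
  dimnames = list(c("TX1", "TX2", "TX3", "TX4"), c("s1", "s2"))
)

# Hypothetical custom mapping: TX3 (alternate haplotype) is assigned
# to the same gene as TX1 and TX2.
tx2gene <- data.frame(
  tx_id   = c("TX1", "TX2", "TX3", "TX4"),
  gene_id = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Align the mapping to the row order of the count matrix.
group <- tx2gene$gene_id[match(rownames(tx_counts), tx2gene$tx_id)]

# Sum the counts of all transcripts assigned to the same gene ID.
gene_counts <- rowsum(tx_counts, group = group)
```

Here gene_counts has one row per gene ID, with the counts from TX1, TX2, and TX3 added together under GENE_A.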

Thanks in advance for any help!!

Aimee

tximeta tximport • 1.7k views

score 1 · Answer 1 · 2020-07-10

Michael Love 43k

@mikelove

United States

If you are working with human data, I'd just recommend using GENCODE, which is a standard annotation set and does not have the issue of duplicate genes on haplotype chromosomes. I switched over to primarily using GENCODE a few years ago for human data.
