I'm working with RNA-seq data from Salmon (Hg38 index) and have had success reading my quant.sf files into R using the recommended approach:
## Read quant.sf files into summarised experiment (se) using tximeta
se <- tximeta(coldata)
gse <- summarizeToGene(se)
Subsequently, I've been trying to determine what to do with genes that have multiple Ensembl gene IDs (due to being derived from alternate haplotypes/patches in the Hg38 reference) prior to DESeq2 analysis. My attempted workaround has been to generate a tx2gene table so that counts derived from all transcripts of a gene, including transcripts on alternate haplotypes that are otherwise mapped to different gene IDs, are instead summarised under the Ensembl gene ID corresponding to the gene on the reference chromosome.
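For illustration, the remapping described above could be sketched roughly as follows in base R. This is only a minimal sketch under stated assumptions: the toy annotation table `tx_annot`, its column names, and all IDs are made up, and it assumes that genes on alternate haplotypes share a gene symbol with their counterpart on a primary chromosome. Symbols that map to more than one primary-chromosome gene ID are treated as ambiguous and left untouched.

```r
# Toy transcript annotation; every ID and column name here is hypothetical.
tx_annot <- data.frame(
  tx_id     = c("ENST_A1", "ENST_A2", "ENST_A3", "ENST_B1"),
  gene_id   = c("ENSG_REF_A", "ENSG_REF_A", "ENSG_ALT_A", "ENSG_REF_B"),
  gene_name = c("GENEA", "GENEA", "GENEA", "GENEB"),
  seqname   = c("6", "6", "CHR_HSCHR6_MHC_COX_CTG1", "1"),
  stringsAsFactors = FALSE
)

# Assumption: primary chromosomes are named 1-22, X, Y and MT.
primary    <- c(as.character(1:22), "X", "Y", "MT")
on_primary <- tx_annot$seqname %in% primary

# Lookup table: gene symbol -> gene ID on a primary chromosome.
ref <- unique(tx_annot[on_primary, c("gene_name", "gene_id")])

# Drop symbols with more than one primary gene ID (ambiguous mapping).
ambiguous <- ref$gene_name[duplicated(ref$gene_name)]
ref       <- ref[!ref$gene_name %in% ambiguous, ]
ref_id    <- setNames(ref$gene_id, ref$gene_name)

# Switch each transcript to the reference gene ID where one exists;
# otherwise keep the original gene ID.
switched <- unname(ref_id[tx_annot$gene_name])
missing  <- is.na(switched)
switched[missing] <- tx_annot$gene_id[missing]

tx2gene.switch <- data.frame(
  tx_id   = tx_annot$tx_id,
  gene_id = switched,
  stringsAsFactors = FALSE
)
```

In this toy example the transcript on the alternate scaffold ends up under the same gene ID as the two reference-chromosome transcripts, so their counts would be summed at the gene level.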
After generating a tx2gene reference (tx2gene.switch), I've had success doing this:
gse.noalt <- tximport::tximport(coldata$files, type = "salmon", tx2gene = tx2gene.switch, ignoreTxVersion = TRUE)
However, I would still like to import the very useful metadata that is accessible when using tximeta.
Simply running this fails:
> gse.noalt <- summarizeToGene(se, tx2gene = tx2gene.switch)
loading existing EnsDb created: 2020-06-09 08:29:00
obtaining transcript-to-gene mapping from database
loading existing gene ranges created: 2020-06-10 00:50:52
Error in .local(object, ...) : formal argument "tx2gene" matched by multiple actual arguments
Presumably this is because the default tx2gene information is sourced from the Salmon index. Is there a way for me to obtain data in the format produced by summarizeToGene with my own transcript-to-gene mapping?
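As a general note on the error message itself: in R, "formal argument ... matched by multiple actual arguments" typically appears when a function already supplies an argument by name to an inner call and the caller passes the same name again through `...`. The base R sketch below reproduces that pattern with two hypothetical functions, `inner_fn` and `wrapper_fn`; it is not the tximeta source, only an illustration of how such a clash can arise.

```r
# Hypothetical inner function that expects a tx2gene argument.
inner_fn <- function(x, tx2gene) {
  tx2gene
}

# Hypothetical wrapper that supplies its own tx2gene and also forwards `...`.
wrapper_fn <- function(x, ...) {
  inner_fn(x, tx2gene = "internal mapping", ...)
}

# Passing tx2gene through `...` collides with the one set by the wrapper.
res <- tryCatch(
  wrapper_fn(1, tx2gene = "my mapping"),
  error = function(e) conditionMessage(e)
)
print(res)
```

If the real function behaves like this wrapper, a user-supplied mapping cannot simply be passed through by name.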
More generally, is it wise to collapse transcript counts derived from genes on alternate haplotypes in the same way that counts from alternate isoforms are collapsed for gene-level analyses?
Thanks in advance for any help!!