Problems with oarfish + tximport + swish
1
0
Entering edit mode
@brownannaleigh-22443
Last seen 5 months ago
United Kingdom

I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish

I've aligned to the gencode v42 human fasta file, which has the following header conformation

 head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG

I've got a dataframe of the file paths of the data and the names.

This is fine

k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case

The errors start appearing here:

# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE

Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that

tx_2_gene <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
                                       ,"symbol",'tx_leng','biotype'),
             remove = FALSE) %>% 
    select(gene,geneCol)

And let's see if that works

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    left_join(tx_2_gene) %>% 
    column_to_rownames('gene') 

mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y)) 
[1] TRUE

Let's try the summarizeToGene gene function

gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) : 
  use of this function requires transcriptome metadata which is missing.
  either: (1) the object was not produced by tximeta, or
  (2) tximeta could not recognize the digest of the transcriptome.
  If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
  or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)

Okay following the linkedTxome path

fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/', 
+                 source="localgencode", 
+                 organism="Homo sapiens",
+                 release="42", 
+                 genome="GRCH38", 
+                 fasta=fastaFTP, 
+                 gtf=gtfPath, 
+                 write=FALSE,
+                 jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
                                      /Users/annaleigh/cluster/vyplab/
                     (right here) ------^

So - what am I doing wrong here?

Sessioninfo

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.20.0         fishpond_2.12.0             lubridate_1.9.4             forcats_1.0.0               purrr_1.0.2                
 [6] tidyr_1.3.1                 tibble_3.2.1                tidyverse_2.0.0             tximeta_1.24.0              IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0          dplyr_1.1.4                 stringr_1.5.1               readr_2.1.5                 sva_3.54.0                 
[16] genefilter_1.88.0           mgcv_1.9-1                  nlme_3.1-166                satuRn_1.14.0               DEXSeq_1.52.0              
[21] RColorBrewer_1.1-3          AnnotationDbi_1.68.0        DESeq2_1.46.0               SummarizedExperiment_1.36.0 GenomicRanges_1.58.0       
[26] GenomeInfoDb_1.42.1         IRanges_2.40.1              S4Vectors_0.44.0            MatrixGenerics_1.18.1       matrixStats_1.5.0          
[31] Biobase_2.66.0              BiocGenerics_0.52.0         BiocParallel_1.40.0         limma_3.62.2                ggplot2_3.5.1              
[36] data.table_1.16.4
fishpond tximport tximeta • 1.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 17 hours ago
United States

This is fine

Did tximeta() print that a matching transcriptome was found?

ADD COMMENT
0
Entering edit mode

I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.

I'd use tximport for now.

ADD REPLY
0
Entering edit mode

Success

k = tximport(file = m,type = 'oarfish',
         txOut = TRUE,
         tx2gene = tx2gene)

Failure

    y <- scaleInfReps(k) # scales counts
Error: unable to find an inherited method for function 'metadata' for signature 'x = "list"'




head(tx2gene)
                                                                                                                                        txid           geneCol
                                                                                                                                      <char>            <char>
1:                                               ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA| ENSG00000290825.1
2: ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| ENSG00000223972.6
3:              ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene| ENSG00000227232.5
4:                                                                 ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA| ENSG00000278267.1
5:                     ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-2HG|712|lncRNA| ENSG00000243485.5
6:                     ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|MIR1302-2HG-201|MIR1302-2HG|535|lncRNA| ENSG00000243485.5

Thank you for any advice

ADD REPLY
0
Entering edit mode

Lol I'm not smart

This solution didn't work because....the column name was wrong

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('txid') %>% 
    left_join(tx2gene) %>% 
    column_to_rownames('txid') 
mcols(y) <- DataFrame(updated_m)
iso <- isoformProportions(y,geneCol = 'geneCol')

however, tximport does not produce a format that the fishpond data can read out of the gate

ADD REPLY
0
Entering edit mode

That's true. We haven't yet developed the Oarfish to tximeta pipeline yet.

Can you say what functionality you need? Do you want to do testing on isoform proportions (DTU)?

You could use tximeta with skipMeta=TRUE which would produce the correctly shaped SummarizedExperiment object, and then manually add the gene ID.

ADD REPLY
0
Entering edit mode

Trying to do differential gene expression(for other reasons) and also DTU in the same bucket. I will try the skipMeta=TRUE and update with code for fellow lost Googlers

ADD REPLY
0
Entering edit mode

Thanks Annaleigh! I'll make sure to also reply here once we've worked out Oarfish to tximeta with metadata, or any other analysis pipeline updates wrt Oarfish and Bioc.

ADD REPLY
0
Entering edit mode

More failures encountered

k_skip = tximeta(coldata = data.frame(files = m, names = names(m)), 
                 type = 'oarfish',skipMeta = TRUE)
    > updated_m <- mcols(k_skip) %>% 
    +     as.data.frame() %>% 
    +     rownames_to_column('txid') %>% 
    +     left_join(tx2gene) %>% 
    +     column_to_rownames('txid') 
    Joining with `by = join_by(txid)`
    > mcols(k_skip) <- DataFrame(updated_m)
    > gse <- tximport::summarizeToGene(k_skip,tx2gene = tx2gene)
    Error in missingMetadata(object, summarize = TRUE) : 
      use of this function requires transcriptome metadata which is missing.
      either: (1) the object was not produced by tximeta, or
      (2) tximeta could not recognize the digest of the transcriptome.
      If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
      or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)
ADD REPLY
0
Entering edit mode

Let me see if I can add some code to tximeta to cover this case.

I've added an argument skipRanges in devel branch which should allow:

gse <- summarizeToGene(se, skipRanges=TRUE, tx2gene=tx2gene)

when skipMeta was used. You could then populate your own rowData for genes on the resulting gse.

Can you test this new code with: devtools::install_github("thelovelab/tximeta")

ADD REPLY
0
Entering edit mode

Awesome possum! Update works!!

Thank you so much for the help and guidance!

ADD REPLY
0
Entering edit mode

Great thanks for posting the issue and sticking with it Annaleigh

ADD REPLY

Login before adding your answer.

Traffic: 1048 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6