I am trying to use tximport so that I can eventually apply edgeR's TMM normalization. I want one normalization factor per species. For each species I have multiple tissue samples (e.g. lung and kidney) and need to normalize across tissues. I have gene-to-orthogroup maps and kallisto abundance files for each tissue, and I am treating each tissue as a sample. Because I understood that tximport needs the abundance files and the tx2gene map to be in the same order, I combined the abundance files across tissues: for example, the lung abundance file also lists the kidney transcripts, with 0 for all of their abundance values. I then sorted and filled both abundance files and the tx2gene map so that all of them contain the same transcripts in the same order. I am still getting the error.
I followed the steps from:
Thank you so much for your time!!
> library(readr)
> library(tximport)
> library(tximportData)
> tools:::.BioC_version_associated_with_R_version()
[1] ‘3.11’
> dir <- "/gpfs/scratch/nsipperly/RAPID/kallisto_1Nov2021/ZEROS/Mema"
> samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
> samples
sample run
1 Mema_kidney 1
2 Mema_lung 1
> files <- file.path(dir, samples$sample, "abundance.tsv")
> names(files) <- paste0("sample", 1:2)
> all(file.exists(files))
[1] TRUE
> files
sample1
"/gpfs/scratch/nsipperly/RAPID/kallisto_1Nov2021/ZEROS/Mema/Mema_kidney/abundance.tsv"
sample2
"/gpfs/scratch/nsipperly/RAPID/kallisto_1Nov2021/ZEROS/Mema/Mema_lung/abundance.tsv"
>
> tx2gene <- read_tsv("/gpfs/scratch/nsipperly/RAPID/kallisto_1Nov2021/ZEROS/Mema/MemaAlltissue2_tx2gene.sort.whole.tsv")
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
TranscriptID = col_character(),
Orthogroup = col_character()
)
> txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 Error in tximport(files, type = "kallisto", tx2gene = tx2gene) :
all(txId == raw[[txIdCol]]) is not TRUE
> traceback()
3: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
2: stopifnot(all(txId == raw[[txIdCol]]))
1: tximport(files, type = "kallisto", tx2gene = tx2gene)
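The `stopifnot` in the traceback compares the transcript IDs read from the first file with those read from each later file, so the failing check can be reproduced by hand. The following is a minimal sketch, assuming kallisto-style tab-separated files with a `target_id` column. The helper name `check_tx_ids` and the toy files are hypothetical and exist only for illustration.

```r
# Hypothetical helper: compare the target_id column of each file to the first file.
check_tx_ids <- function(files, id_col = "target_id") {
  ids <- lapply(files, function(f) {
    read.delim(f, stringsAsFactors = FALSE)[[id_col]]
  })
  ref <- ids[[1]]
  # TRUE only when a file has the same IDs, in the same order, as the first file
  vapply(ids, function(x) identical(x, ref), logical(1))
}

# Toy files standing in for real abundance.tsv files (assumed layout)
header <- paste("target_id", "est_counts", sep = "\t")
f1 <- tempfile(fileext = ".tsv")
f2 <- tempfile(fileext = ".tsv")
writeLines(c(header, paste("tx_a", 10, sep = "\t"), paste("tx_b", 0, sep = "\t")), f1)
writeLines(c(header, paste("tx_b", 5, sep = "\t"), paste("tx_a", 0, sep = "\t")), f2)

same_order <- check_tx_ids(c(sample1 = f1, sample2 = f2))
print(same_order)
```

In the session above, the equivalent call would be `check_tx_ids(files)`. Any `FALSE` entry points at a file whose transcript IDs differ from, or are ordered differently than, those in the first file.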
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /gpfs/software/R-4.0.2/lib64/R/lib/libRblas.so
LAPACK: /gpfs/software/R-4.0.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximportData_1.16.0 tximport_1.16.1 readr_1.4.0
loaded via a namespace (and not attached):
[1] crayon_1.4.1 R6_2.5.1 lifecycle_0.2.0 magrittr_2.0.1
[5] pillar_1.4.7 rlang_0.4.11 cli_3.0.1 rstudioapi_0.13
[9] vctrs_0.3.5 ellipsis_0.3.1 glue_1.4.2 hms_0.5.3
[13] compiler_4.0.2 pkgconfig_2.0.3 tibble_3.0.4
#I made some dummy files for the post -- as instructed by the posting guide :)
#Mema_kidney/abundance.tsv
target_id length eff_length est_counts tpm
Mema_kidney_Transcript_1003.p1 2865 2589.74 409 4.35504
Mema_kidney_Transcript_100606.p1 255 39.3322 117433 82331.8
Mema_kidney_Transcript_1006.p2 597 322.318 272.241 23.2913
Mema_kidney_Transcript_100754.p1 258 40.1783 16 10.9813
Mema_kidney_Transcript_1007.p1 3048 2772.74 5229 52.0037
Mema_kidney_Transcript_1011.p2 528 253.775 1012 109.966
Mema_kidney_Transcript_1012.p1 1941 1665.74 919 15.2136
Mema_kidney_Transcript_1015.p1 2769 2493.74 889 9.83049
Mema_kidney_Transcript_1019.p1 534 259.732 1420 150.761
Mema_lung_Transcript_979.p1 0 0 0 0
Mema_lung_Transcript_98263.p1 0 0 0 0
Mema_lung_Transcript_9828.p2 0 0 0 0
Mema_lung_Transcript_983.p1 0 0 0 0
Mema_lung_Transcript_985.p2 0 0 0 0
Mema_lung_Transcript_991.p1 0 0 0 0
Mema_lung_Transcript_9938.p1 0 0 0 0
Mema_lung_Transcript_9959.p2 0 0 0 0
Mema_lung_Transcript_995.p1 0 0 0 0
Mema_lung_Transcript_996.p1 0 0 0 0
#Mema_lung/abundance.tsv
target_id length eff_length est_counts tpm
Mema_kidney_Transcript_1003.p1 0 0 0 0
Mema_kidney_Transcript_100606.p1 0 0 0 0
Mema_kidney_Transcript_1006.p2 0 0 0 0
Mema_kidney_Transcript_100754.p1 0 0 0 0
Mema_kidney_Transcript_1007.p1 0 0 0 0
Mema_kidney_Transcript_1011.p2 0 0 0 0
Mema_kidney_Transcript_1012.p1 0 0 0 0
Mema_kidney_Transcript_1015.p1 0 0 0 0
Mema_kidney_Transcript_1019.p1 0 0 0 0
Mema_lung_Transcript_979.p1 1995 1764.51 819 29.9187
Mema_lung_Transcript_98263.p1 264 84.4105 55 42
Mema_lung_Transcript_9828.p2 609 380.972 1161.26 196.48
Mema_lung_Transcript_983.p1 2904 2673.51 3513 84.6992
Mema_lung_Transcript_985.p2 366 160.345 27 10.8541
Mema_lung_Transcript_991.p1 828 597.667 1179 127.156
Mema_lung_Transcript_9938.p1 414 200.22 149 47.9691
Mema_lung_Transcript_9959.p2 288 100.964 16 10.215
Mema_lung_Transcript_995.p1 1455 1224.51 630 33.1636
Mema_lung_Transcript_996.p1 1395 1164.51 442 24.4659
#/gpfs/scratch/nsipperly/RAPID/kallisto_1Nov2021/ZEROS/Mema/MemaAlltissue2_tx2gene.sort.whole.tsv
TranscriptID Orthogroup
Mema_kidney_Transcript_1003.p1 OG0010091
Mema_kidney_Transcript_100606.p1 OG0014354
Mema_kidney_Transcript_1006.p2 OG0002057
Mema_kidney_Transcript_100754.p1 OG0027137
Mema_kidney_Transcript_1007.p1 OG0009085
Mema_kidney_Transcript_1011.p2 OG0004785
Mema_kidney_Transcript_1012.p1 OG0007052
Mema_kidney_Transcript_1015.p1 OG0002164
Mema_kidney_Transcript_1019.p1 OG0003830
Mema_lung_Transcript_979.p1 OG0002798
Mema_lung_Transcript_98263.p1 OG0002166
Mema_lung_Transcript_9828.p2 OG0006178
Mema_lung_Transcript_983.p1 OG0008817
Mema_lung_Transcript_985.p2 OG0010503
Mema_lung_Transcript_991.p1 OG0006243
Mema_lung_Transcript_9938.p1 OG0013741
Mema_lung_Transcript_9959.p2 OG0001898
Mema_lung_Transcript_995.p1 OG0008495
Mema_lung_Transcript_996.p1 OG0010962
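A second check that may help is whether every transcript ID in the abundance files is present in the first column of the tx2gene table. This sketch assumes that tximport looks up transcripts in tx2gene by ID rather than by row position, which is worth confirming in the tximport documentation. The helper name `missing_from_tx2gene` and the toy table are hypothetical.

```r
# Hypothetical helper: list transcript IDs absent from the first column of tx2gene.
missing_from_tx2gene <- function(tx_ids, tx2gene) {
  setdiff(tx_ids, tx2gene[[1]])
}

# Toy tx2gene table with the same two-column layout as the file above
tx2gene_toy <- data.frame(
  TranscriptID = c("tx_a", "tx_b"),
  Orthogroup = c("OG1", "OG2"),
  stringsAsFactors = FALSE
)

# "tx_c" is deliberately left out of the toy table
missing <- missing_from_tx2gene(c("tx_a", "tx_b", "tx_c"), tx2gene_toy)
print(missing)
```

An empty result means every transcript ID has an entry in the map. Any IDs that are returned have no orthogroup assigned in the table.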
OK, I see. I will see what I can make work with that information! Thank you!
Realized I never updated -- yes that was the issue :)
Hello Nicolette,
Can you please elaborate on how you fixed this issue? I am also getting a similar error.
@mbansal maybe you can post a new issue including all of the code you are using. You can include code in triple backticks, e.g.: