What type of counts data to import for performing Isoform analysis in edgeR
@mohammedtoufiq91-17679

Hi,

I am interested in performing isoform analysis on short read data (150bp) using the edgeR package, following the example from the edgeR User's Guide, section "4.6 Differential transcript expression of human lung adenocarcinoma cell lines."

I processed my data with the nf-core/rnasplice pipeline and obtained the counts and TPM files listed further down.

Following example 4.6 in the edgeR User's Guide, I tried importing the "quant.sf" files but ran into difficulties. Since the example works with scaled counts, I imported the scaled counts file instead:

library(edgeR)

# Define the path to the TSV file
file_path <- "/projects/salmon/tximport/salmon.merged.transcript_counts_scaled.tsv"

# Import the TSV file
scaled.counts <- read.delim(file_path, header = TRUE, sep = "\t", row.names = 1)

# Create DGEList object
y <- DGEList(counts = scaled.counts, samples = Samples_metadata)
dim(y)

Which of the following files makes the most sense to import for "4.6 Differential transcript expression of human lung adenocarcinoma cell lines"?

Counts from nf-core/rnasplice:

  • salmon.merged.transcript_counts.tsv: Matrix of isoform-level raw counts across all samples.
  • salmon.merged.transcript_counts_scaled.tsv: Matrix of isoform-level scaled raw counts across all samples.
  • salmon.merged.transcript_counts_length_scaled.tsv: Matrix of isoform-level length-scaled raw counts across all samples.
  • salmon.merged.transcript_counts_dtu_scaled.tsv: Matrix of isoform-level dtu scaled raw counts across all samples.

TPMs from nf-core/rnasplice:

  • salmon.merged.transcript_tpm.tsv: Matrix of isoform-level TPM values across all samples.
  • salmon.merged.transcript_tpm_scaled.tsv: Matrix of isoform-level scaled TPM values across all samples.
  • salmon.merged.transcript_tpm_length_scaled.tsv: Matrix of isoform-level length-scaled TPM values across all samples.
  • salmon.merged.transcript_tpm_dtu_scaled.tsv: Matrix of isoform-level dtu scaled TPM values across all samples.

limma isoform edgeR salmon R
ATpoint ★ 4.5k
@atpoint-13662

You don't need any of these files. You need the catchSalmon function, which reads quant.sf together with the inferential (bootstrap/Gibbs) replicates. I can't tell you where nf-core puts those, but this is what the user guide describes; see also ?catchSalmon.


ATpoint Thank you. I did want to try the quant.sf files from the Salmon run, following the edgeR User's Guide example "4.6 Differential transcript expression of human lung adenocarcinoma cell lines". I tried the steps below, but the Overdispersion column contains NA values, and after computing scaled.counts the whole table is populated with NA values:

> setwd("/projects/RNA/salmon/")

> quant <- dirname(list.files(".", "quant.sf", recursive = TRUE, full.names = TRUE))

> print(quant)

[1] "./DD21_progenitors" "./DD39_progenitors" "./DD48_progenitors" "./DD98_progenitors"

> catch <- catchSalmon(paths = quant)

Reading ./DD21_progenitors, 427366 transcripts, 0 none samples
Reading ./DD39_progenitors, 427366 transcripts, 0 none samples
Reading ./DD48_progenitors, 427366 transcripts, 0 none samples
Reading ./DD98_progenitors, 427366 transcripts, 0 none samples

> scaled.counts <- catch$counts/catch$annotation$Overdispersion


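nf-core probably ran Salmon without bootstrap/Gibbs replicates. The catchSalmon log reports "0 none samples" for every directory, so there are no resamples from which to estimate the overdispersion, and dividing the counts by an NA overdispersion gives NA everywhere. Check the pipeline code, and see the Salmon docs for how to turn the replicates on.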
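One quick way to confirm this before rerunning anything is to look for the resampling file in each Salmon output directory. The sketch below is only an illustration: the helper name has_inferential_replicates is made up for this thread, and the aux_info/bootstrap/bootstraps.gz location is an assumption based on Salmon's usual output layout, so check it against your own quant directories.

```r
# Hypothetical helper: report, for each Salmon output directory, whether the
# inferential-replicate file is present. The path below is an assumption
# about Salmon's output layout, not something taken from the pipeline docs.
has_inferential_replicates <- function(paths) {
  boot_file <- file.path(paths, "aux_info", "bootstrap", "bootstraps.gz")
  data.frame(
    path = paths,
    has_replicates = file.exists(boot_file),
    stringsAsFactors = FALSE
  )
}

# Same directory discovery as in the reply above.
quant <- dirname(list.files(".", "quant.sf", recursive = TRUE, full.names = TRUE))
print(has_inferential_replicates(quant))
```

If has_replicates is FALSE for every sample, Salmon needs to be rerun with resampling enabled (its --numBootstraps or --numGibbsSamples option). How to pass that option through nf-core/rnasplice is not covered here and would need to be checked in the pipeline's documentation.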
