Question: Isoforms detection using bioconductor
gravatar for
14 months ago by
seungjoon.kim0 wrote:

Dear all,

hi we have RNA-seq data with long reads and try to find transcript isoforms(AS) for every genes. I have now aligned BAM file and want to detect isoforms using IRanges, GenomicRanges and biomart. I have no idea how to detect and quantify the number of isoforms for each gene. Could you give me any advice about work flow or tips how to analyze my data?


Thank you in advance!!


ADD COMMENTlink modified 14 months ago by Michael Lawrence10k • written 14 months ago by seungjoon.kim0
gravatar for Michael Lawrence
14 months ago by
United States
Michael Lawrence10k wrote:

It's a complex topic but here are some tips:

One option is to quantify junctions and use those to make inferences about transcript structure and quantity. The summarizeJunctions() function in GenomicAlignments might help with this. There are also specialized packages like SGSeq.

External tools like Salmon and Kallisto use deconvolution approaches to quantifying isoforms. The tximport package loads the output of those tools into R/Bioconductor.

ADD COMMENTlink written 14 months ago by Michael Lawrence10k

Thanks for the advice. So I tried to use SGSeq to analyze data and some questions and errors came out. I tried to follow the method in splice-variant-identification written by Leonard D Goldstein and how do you retrieve GRanges object of FBXO31 gene. Because it is possible to retrieve a certain chromosome but not the specific gene. 

So I manually create GRanges object and proceed the protocol and in the last step using analyzeFeathres, it caused errors

> sgfc_ucsc <- analyzeFeatures(si, features = txf_ucsc)
Error: sample_info is missing column(s) paired_end, read_length, frag_length, lib_size

Could you figure out how to figure out this error?


ADD REPLYlink written 14 months ago by seungjoon.kim0


analyzeFeatures needs some basic information on your data (e.g. whether reads are paired-end, the total number of aligned reads in the BAM file etc.) All required info can be obtained with

> si <- getBamInfo(si) 

Providing a gene locus is optional, you can do your analysis genome-wide, but this may be computationally intensive, so I would test it on one gene first. Please check the package vignette or help page for more information. 


ADD REPLYlink written 14 months ago by Leonard Goldstein70

Thanks. I will try it.

ADD REPLYlink written 14 months ago by seungjoon.kim0

In the beginning, I found this kind of error.

> bamPath <- "/data/murphy/home/XXX/XXX/sorted_ccs.bam"

> bamFile <- BamFile(bamPath)

> getBamInfo(bamFile)
Error: sample_info must be a data frame with columns sample_name, file_bam

In this case, how can I add sample_name?

Also, I used the same file previously and why other sample_info is missing? (e.g. paired_end, read_length, frag_length, lib_size)

Thanks in advance!!

ADD REPLYlink written 14 months ago by seungjoon.kim0

The input for getBamInfo is a data.frame with columns sample_name and file_bam (see example below). Please read the vignette or help pages.

> si <- data.frame(sample_name = c("a", "b"), file_bam = c("file_a.bam", "file_b.bam"), stringsAsFactors = FALSE)
> si <- getBamInfo(si)
ADD REPLYlink written 14 months ago by Leonard Goldstein70

Thanks for the comment. it works well with analyzing isoforms

but still has problem when plotting the results.

When I followed this script:

par(mfrow = c(2,1), mar = c(2,2,1,2), pty = "m")
plotSpliceGraph(rowRanges(sgfc_pred), geneID = 1, toscale = "none", color_novel = "red")
for (j in 1:4){
  plotCoverage(sgfc_pred[, j], geneID = 1, toscale = "none")

It gives this messages: 

Error: subscript contains out-of-bounds indices

And the the plot goes out of bound so I cannot see the whole plot at all.

Could you help me to figure out of this problem?



ADD REPLYlink modified 14 months ago • written 14 months ago by seungjoon.kim0

Based on the error message it looks like you are working with less than 4 samples, but you try to make plots for columns 1-4 of the SGFeatureCounts object. Regarding out-of-bound plots, the vignette includes several examples on how to specify what genomic regions to include in the plots. Please have a look at the examples in the vignette. 


ADD REPLYlink written 14 months ago by Leonard Goldstein70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 163 users visited in the last hour