Dear all,
Hi, we have long-read RNA-seq data and are trying to identify transcript isoforms (alternative splicing, AS) for every gene. I now have an aligned BAM file and want to detect isoforms using IRanges, GenomicRanges and biomaRt, but I have no idea how to detect the isoforms or how to count them for each gene. Could you give me any advice on a workflow, or tips on how to analyze my data?
Thank you in advance!!
Thanks for the advice. I tried to use SGSeq to analyze the data, and some questions and errors came up. I tried to follow the method in the splice-variant-identification workflow written by Leonard D Goldstein. How do you retrieve the GRanges object for the FBXO31 gene? I can retrieve a whole chromosome, but not a specific gene.
So I created the GRanges object manually and continued with the protocol, but the last step, analyzeFeatures, failed with errors.
Could you help me figure out this error?
Hi,
analyzeFeatures needs some basic information on your data (e.g. whether reads are paired-end, the total number of aligned reads in the BAM file, etc.). All required info can be obtained with getBamInfo.
Providing a gene locus is optional, you can do your analysis genome-wide, but this may be computationally intensive, so I would test it on one gene first. Please check the package vignette or help page for more information.
Leonard
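For the earlier question about getting a GRanges object for a single gene rather than a whole chromosome, here is a minimal sketch. It assumes an hg19 UCSC knownGene annotation (whose gene IDs are Entrez IDs) and org.Hs.eg.db for the symbol lookup; the helper name gene_locus is made up for illustration, and the genome build has to match the one the reads were aligned to.

```r
# Hypothetical helper (not part of SGSeq): look up a gene locus by symbol.
# txdb  : a TxDb object whose gene IDs are Entrez IDs (e.g. UCSC knownGene)
# orgdb : an OrgDb object used to map the symbol to an Entrez ID
gene_locus <- function(symbol, txdb, orgdb) {
  # Map the gene symbol to its Entrez ID(s)
  ids <- AnnotationDbi::select(orgdb, keys = symbol,
                               keytype = "SYMBOL", columns = "ENTREZID")
  # Return the gene range(s) as a GRanges object
  GenomicFeatures::genes(txdb, filter = list(gene_id = ids$ENTREZID))
}

# Usage (assumes both annotation packages are installed):
# library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# library(org.Hs.eg.db)
# gr <- gene_locus("FBXO31", TxDb.Hsapiens.UCSC.hg19.knownGene, org.Hs.eg.db)
```

The resulting gr can be passed as the which argument of analyzeFeatures. If the BAM file names chromosomes "16" rather than "chr16", the sequence names of gr need to be changed to match before use.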
Thanks. I will try it.
At the very first step, I got this kind of error.
> bamPath <- "/data/murphy/home/XXX/XXX/sorted_ccs.bam"
> bamFile <- BamFile(bamPath)
In this case, how can I add sample_name?
Also, I used the same file previously, so why is the other sample info missing (e.g. paired_end, read_length, frag_length, lib_size)?
Thanks in advance!!
The input for getBamInfo is a data.frame with columns sample_name and file_bam (see example below). Please read the vignette or help pages.
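A minimal sketch of that input, using the BAM path from the post above. The sample name "ccs" is a made-up label, and the calls are guarded so the snippet does nothing beyond building the data.frame when SGSeq or the file is not available.

```r
# BAM path as given earlier in the thread (elided parts left as-is)
bamPath <- "/data/murphy/home/XXX/XXX/sorted_ccs.bam"

# One row per sample; "ccs" is an arbitrary, hypothetical sample name
si <- data.frame(sample_name = "ccs",
                 file_bam = bamPath,
                 stringsAsFactors = FALSE)

# getBamInfo fills in the remaining columns
# (paired_end, read_length, frag_length, lib_size) from the BAM file
if (requireNamespace("SGSeq", quietly = TRUE) && file.exists(bamPath)) {
  si <- SGSeq::getBamInfo(si)
  print(si)
}
```

The completed si is what gets passed to analyzeFeatures as the sample information, instead of a BamFile object.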
Thanks for the comment. The isoform analysis works well now,
but I still have a problem when plotting the results.
When I ran this script:
par(mfrow = c(2,1), mar = c(2,2,1,2), pty = "m")
plotSpliceGraph(rowRanges(sgfc_pred), geneID = 1, toscale = "none", color_novel = "red")
for (j in 1:4){
plotCoverage(sgfc_pred[, j], geneID = 1, toscale = "none")
}
It gives this message:
And the plot goes out of bounds, so I cannot see the whole plot at all.
Could you help me figure out this problem?
Based on the error message, it looks like you are working with fewer than four samples but are trying to make plots for columns 1-4 of the SGFeatureCounts object. Regarding the out-of-bounds plots, the vignette includes several examples of how to specify which genomic regions to include in the plots. Please have a look at those examples.
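One way to avoid the hard-coded 1:4 is to loop over however many samples the object actually has. This is a rough sketch rather than code from the vignette: the wrapper name plot_gene is made up, and it assumes sgfc is the SGFeatureCounts object returned by analyzeFeatures (sgfc_pred in the script above).

```r
# Hypothetical wrapper: splice graph on top, one coverage panel per sample.
# sgfc    : SGFeatureCounts object (e.g. sgfc_pred)
# gene_id : gene to plot
# which   : optional GRanges restricting the plotted region (NULL = no restriction)
plot_gene <- function(sgfc, gene_id = 1, which = NULL) {
  n <- ncol(sgfc)  # number of samples actually present
  # n + 1 stacked panels; restore the previous settings on exit
  op <- par(mfrow = c(n + 1, 1), mar = c(2, 2, 1, 2), pty = "m")
  on.exit(par(op))
  SGSeq::plotSpliceGraph(SummarizedExperiment::rowRanges(sgfc),
                         geneID = gene_id, which = which,
                         toscale = "none", color_novel = "red")
  # seq_len(n) never runs past the last column, unlike 1:4
  for (j in seq_len(n)) {
    SGSeq::plotCoverage(sgfc[, j], geneID = gene_id, which = which,
                        toscale = "none")
  }
  invisible(NULL)
}

# Usage: plot_gene(sgfc_pred, gene_id = 1)
```

With a single sample this draws two panels, which matches the mfrow = c(2,1) layout in the original script.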