Question

Error when predicting features with SGSeq

1

Entering edit mode

Helen Zhou ▴ 150

@helen-zhou-2654

Last seen 9.6 years ago

United States

I'm trying to use the analyzeFeatures function in SGSeq following the vignette, but I get this error:

> library(SGSeq)

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)

> sgfc <- analyzeFeaturesbam.info, which = gr)

Predict features...

Error in mergeTxFeatures(list_features, min_n_sample = min_n_sample) :

... must be one or more TxFeatures or a single

list of TxFeatures

The bam.info object contains information about my BAM files, after running getBamInfo():

> bam.info

sample_name file_bam paired_end read_length frag_length lib_size

1 sample1 /home/projects/gene_expression/Tmp/A549.cytosol.polyA.bam TRUE 101 262 126299548

2 sample2 /home/projects/gene_expression/Tmp/A549.nucleus.polyA.bam TRUE 101 293 149162537

6 sample3 /home/projects/gene_expression/Tmp/GM12878.cytosol.polyA.bam TRUE 76 276 62103999

7 sample4 /home/projects/gene_expression/Tmp/GM12878.nucleus.polyA.bam TRUE 76 239 60730206

These files have been merged from two separate bam files, processed with RSEM and TopHat. It includes transcripts from UCSC (hg19), but also some from other sources.

When I did into the functions a bit, the problem seems to occur with predictTxFeaturesPerSample. If I run that directly on a single sample, I get:

> sample_info <- bam.info

> i <- 1

> SGSeq:::predictTxFeaturesPerSample(file_bam = sample_info$file_bam[i],

+ paired_end = sample_info$paired_end[i], read_length = sample_info$read_length[i],

+ frag_length = sample_info$frag_length[i], lib_size = sample_info$lib_size[i], which=gr, alpha = 2, psi = 0, beta = 0.2,

+ gamma = 0.2)

Error in (function (classes, fdef, mtable) :

unable to find an inherited method for function ‘mcols’ for signature ‘"character"’

Does SGSeq simply not work for samples that have been mapped using annotation that only partly overlaps with the USCS data?

~~Helen

SGSeq • 2.2k views

ADD COMMENT • link updated 9.6 years ago by Leonard Goldstein ▴ 90 • written 9.6 years ago by Helen Zhou ▴ 150

score 0 · Answer 1 · 2015-07-24

Hi Helen,

Thanks for your question. Can I ask what version of SGSeq you are using? The output of sessionInfo() would be helpful.

BAM files used as input for SGSeq should be RNA-seq data mapped to a reference genome with a splice-aware aligner that includes the custom tag XS in the BAM output (e.g. GSNAP, STAR, TopHat). If you have multiple BAM files that are intended to be analyzed together, they should all include mappings to the same reference genome. It sounds like your BAM files might contain alignments against a transcriptome rather than a genome?

Leonard

score 0 · Answer 2 · 2015-07-24

My apologize Leonard. Session info is shown below, along with the error.

The bam files do contain multiple alignments. I asked our bioinformatics facility to follow the approach outlined in this G&D article, since we are also looking at neuronal stem cells. http://genesdev.cshlp.org/content/27/9/1032.long It is my understanding (but I am not sure) that the reads were mapped both to the genome and the transcriptome (human).

> sgfc <- analyzeFeaturesbam.info, which = NULL)

Predict features...

Error in mergeTxFeatures(list_features, min_n_sample = min_n_sample) :

... must be one or more TxFeatures or a single

list of TxFeatures

In addition: Warning message:

In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :

20 function calls resulted in an error

> sessionInfo()

R version 3.2.1 (2015-06-18)

Platform: x86_64-unknown-linux-gnu (64-bit)

Running under: Ubuntu 14.04.2 LTS

locale:

[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8

[7] LC_PAPER=en_US.UTF-8 LC_NAME=C

[9] LC_ADDRESS=C LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:

[1] stats4 parallel stats graphics grDevices utils datasets

[8] methods base

other attached packages:

[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2

[2] GenomicFeatures_1.20.1

[3] AnnotationDbi_1.30.1

[4] Biobase_2.28.0

[5] SGSeq_1.2.2

[6] GenomicRanges_1.20.5

[7] GenomeInfoDb_1.4.1

[8] IRanges_2.2.5

[9] S4Vectors_0.6.2

[10] BiocGenerics_0.14.0

loaded via a namespace (and not attached):

[1] igraph_1.0.1 XVector_0.8.0 magrittr_1.5

[4] zlibbioc_1.14.0 GenomicAlignments_1.4.1 BiocParallel_1.2.9

[7] tools_3.2.1 DBI_0.3.1 lambda.r_1.1.7

[10] futile.logger_1.4.1 rtracklayer_1.28.6 futile.options_1.0.0

[13] bitops_1.0-6 RCurl_1.95-4.7 biomaRt_2.24.0

[16] RSQLite_1.0.0 Biostrings_2.36.1 Rsamtools_1.20.4

[19] XML_3.98-1.3

score 0 · Answer 3 · 2015-07-24

0

Entering edit mode

Leonard Goldstein ▴ 90

@leonard-goldstein-5925

Last seen 9.6 years ago

United States

Hi Helen,

Thanks for providing the sessionInfo output.

It sounds like these are non-standard BAM files (merged from different sources). Can you try running SGSeq on the unmodified BAM files that were generated by TopHat?

Leonard

ADD COMMENT • link 9.6 years ago Leonard Goldstein ▴ 90

score 0 · Answer 4 · 2015-07-24

0

Entering edit mode

Helen Zhou ▴ 150

@helen-zhou-2654

Last seen 9.6 years ago

United States

Leonard,

unfortunately we have to remove temporary files due to size constraints on our folders. I will try running SGSeq next time I have some of the intermediate result file from our bioinformatics core.

~~Helen

ADD COMMENT • link 9.6 years ago Helen Zhou ▴ 150

0

Entering edit mode

Peharps you could try extracting the reads in the BAM file you have and then realign to a reference genome. Googling for "realign reads bam file" will turn up several hits around that theme that you might be able to try.

ADD REPLY • link 9.6 years ago Steve Lianoglou ★ 13k

score 0 · Answer 5 · 2015-07-24

Hi Helen,

If it is an option to make available the current BAM files, I can have a look at the data to understand what is causing the problem.

Otherwise I suggest running SGSeq on the unmodified TopHat output when you get the chance and if you are still encountering problems, please do report them on the support site.

Many thanks,

Leonard