Question

Issues with SGSeq: plotFeatures method returns NULL

1

Entering edit mode

Sylvain Foisy ▴ 80

@sylvain-foisy-5539

Last seen 4.6 years ago

Canada

Hi,

I got over my errors using SGSeq's analyzeFeatures() but I am stumbling on something else (or is it?). Following Leonard's suggestions, I did this:

seqlevels_in_bam <- seqlevels(BamFile(data.r$file_bam[1]))
sgfdb <- keepSeqlevels(sgfdb, seqlevels_in_bam)

But I get this warning message:

#Warning message:
#In keepSeqlevels(sgfdb, seqlevels_in_bam) :
#  invalid seqlevels'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr14_KI270722v1_random', 'chr14_KI270723v1_random', 'chr14_KI270724v1_random', 'chr14_KI270725v1_random', 'chr15', 'chr16', 'chr17', 'chr17_KI270729v1_random', 'chr17_KI270730v1_random', 'chr18', 'chr19', 'chr1_KI270707v1_random', 'chr1_KI270708v1_random', 'chr1_KI270709v1_random', 'chr1_KI270710v1_random', 'chr2', 'chr20', 'chr21', 'chr22', 'chr22_KI270732v1_random', 'chr22_KI270735v1_random', 'chr22_KI270736v1_random', 'chr22_KI270737v1_random', 'chr22_KI270738v1_random', 'chr22_KI270739v1_random', 'chr2_KI270715v1_random', 'chr2_KI270716v1_random', 'chr3', 'chr4', 'chr5', 'chr5_GL000208v1_random', 'chr6', 'chr7', 'chr8', 'chr9', 'chr9_KI270717v1_random', 'chr9_KI270718v1_random', 'chr9_KI270719v1_random', 'chr9_KI270720v1_random', 'chrEBV', 'chrM', 'chrUn_GL000216v2', 'chrUn_GL000226v1', 'chrUn_KI270302v1', 'chrUn_KI270303v1', 'chrUn_KI270304v1', 'chrUn_KI270305v1', 'chrUn_KI270310v1', 'chrUn_KI270311v1', 'chrUn [... truncated]

Still related to the original issues of non-present chromosomes in my BAM files? After this, I try the following:

sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=4)
sgfc_ncbi
# class: SGFeatureCounts 
# dim: 1555 24 
# metadata(0):
# assays(2): counts FPKM
# rownames: NULL
# rowData names(0):
# colnames(24): MONOCYTES_169377 MONOCYTES_201678 ... puit1_984968
#   puit1_999046
# colData names(6): sample_name file_bam ... frag_length lib_size
lmna <- plotFeatures(sgfc_ncbi,geneID=4000)
lmna
# NULL

So the plotFeatures() method seems to fail... Any idea? FYI, I am running the latest R-3.3.1/SGSeq combo.

Best regards

S

sgseq • 3.2k views

ADD COMMENT • link updated 7.6 years ago by Leonard Goldstein ▴ 180 • written 7.7 years ago by Sylvain Foisy ▴ 80

score 0 · Answer 1 · 2016-08-10

0

Entering edit mode

Leonard Goldstein ▴ 180

@leonard-goldstein-6845

Last seen 14 months ago

Australia

Hi Sylvain,

Yes the warning message is related to your earlier problem with chromosome names. keepSeqlevels() attempts to subset your object sgfdb, such that seqlevels from the BAM file header are kept. Since there are seqlevels in the BAM file that cannot be found in sgfdb, the function emits a warning. You can avoid this by subsetting to the intersection of seqlevels in the BAM file and your original sgfdb object using something like:

seqlevels_in_bam <- intersect(seqlevels_in_bam, seqlevels(sgfdb))

Regarding your question about plotFeatures(), it looks like there is no gene in your data that has geneID 4000. Note that geneID is different from geneName. geneIDs are unique identifiers for the connected components of the splice graph. If you want to plot the gene in your annotation that has Entrez ID "4000" you need to call plotFeatures() with argument geneName="4000".

Best wishes

Leonard

ADD COMMENT • link 7.7 years ago Leonard Goldstein ▴ 180

0

Entering edit mode

Hi,

Sorry for the late answer: I was out and offline for a while... OK, here goes: I did this bit of code:

> txdb<-importTranscripts("/path/toward/Annotation/used/for/alignment/genes.gtf")
> txdb<-convertToTxFeatures(txdb)
seqlevelsStyle(txdb) <- "NCBI"
> sgfdb<-convertToSGFeatures(txdb)
# See C: Issues with SGSeq: analyzeFeatures() returns errors on BAM files
> seqlevels_in_bam <- seqlevels(BamFile(data.r$file_bam[1]))
# See A: Issues with SGSeq: plotFeatures method returns NULL
> seqlevels_in_bam <- intersect(seqlevels_in_bam, seqlevels(sgfdb))
> sgfdb <- keepSeqlevels(sgfdb, seqlevels_in_bam

I get no more warnings... I than do this:

> sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=8)

> plotFeatures(sgfc_ncbi,geneName="4000")

NULL

> plotFeatures(sgfc_ncbi)

Error in getExonCoordinates(g, toscale) :

features must be on the same chromosome and strand

How can I see what I can use from sgfc_ncbi?

Thanks in advance

S

ADD REPLY • link 7.7 years ago Sylvain Foisy ▴ 80

0

Entering edit mode

Hi Sylvain,

You need to provide a gene name that can be found in your annotation. If you import annotation from GTF format, by default transcript names and gene names are obtained from GTF attribute tags 'transcript_id' and 'gene_id' respectively. If you are unsure you can print the SGFeatures object using e.g.

> sgfdb

or

> rowRanges(sgfc_ncbi)

Column 'geneName' contains the gene names for your annotation, and these can be used with the 'geneName' argument in plotFeatures(). I hope this helps

Leonard

ADD REPLY • link 7.7 years ago Leonard Goldstein ▴ 180

0

Entering edit mode

Hi,

Apologies for the lengthy delay at feedback... School is back up and I had a pretty heavy teaching agenda :-) Ok, I went back on what I got and came back kind of puzzled... The seqlevels_in_bam line above seems to be creating some issues: when I perform the intersect with its outcome, I only get 35 locations, mostly of the type chrz_blablabla_random or chrUn_blablabla. So it seems that I did my analysis only on the ranges that were + not + in my BAM files. If I try to use dropSeqlevels instead of keepSeqlevels, I get this error:

sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=4)

Obtain counts...

Error:

Error in getSGFeatureCountsPerSample for MONOCYTES_169377:

Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), :

seqlevels(param) not in BAM header:

seqlevels: '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT', 'chr1_GL383518v1_alt', 'chr1_GL383519v1_alt', 'chr1_GL383520v2_alt', 'chr1_KI270759v1_alt', 'chr1_KI270761v1_alt', 'chr1_KI270762v1_alt', 'chr1_KI270763v1_alt', 'chr1_KI270765v1_alt', 'chr1_KI270766v1_alt', 'chr1_KI270892v1_alt', 'chr2_GL383521v1_alt', 'chr2_GL383522v1_alt', 'chr2_GL582966v2_alt', 'chr2_KI270767v1_alt', 'chr2_KI270768v1_alt', 'chr2_KI270769v1_alt', 'chr2_KI270770v1_alt', 'chr2_KI270772v1_alt', 'chr2_KI270773v1_alt', 'chr2_KI270774v1_alt', 'chr2_KI270776v1_alt', 'chr2_KI270893v1_alt', 'chr2_KI270894v1_alt', 'chr3_GL383526v1_alt', 'chr3_JH636055v2_alt', 'chr3_KI270777v1_alt', 'chr3_KI270779v1_alt', 'chr3_KI270780v1_alt', 'chr3_KI270782v1_a

In addition: Warning message:

In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :

24 function calls resulted in an error

I apologize if these questions seem totally newbie, but I always find that some R/Bioconductor operations are rather counterintuitive for biologists... Which is why I am doing it to teach other members in my lab ;-)

Best regards

S

ADD REPLY • link 7.6 years ago Sylvain Foisy ▴ 80

0

Entering edit mode

Hi Sylvain,

The seqlevels (or chromosome names) in your annotation have to match the reference sequence names in your BAM files. Can you post the output for the following commands:

> seqlevels(sgfdb) 
> seqlevels(BamFile(data.r$file_bam[1]))

Thanks,
Leonard

ADD REPLY • link 7.6 years ago Leonard Goldstein ▴ 180

0

Entering edit mode

Hi,

Well, I finally got it sorted out: I did make a mistake in the creation of my txdb object by setting its style to NCBI. This made all my chrZ entries to be turned into Z only, therefore the mistake... Duh! Switching to UCSC fixed the problem. Just to be sure not to repeat these mistakes, I wrote down the recipe in our lab's wiki.

I started two analysis (two different types of stimulations), one turned out ok, the other one through me a new error after quite some time:

Error in `colnames<-`(`*tmp*`, value = c("puit1_169377", "puit1_201678", :

length of 'dimnames' [2] not equal to array extent

Calls: analyzeFeatures ... getSGFeatureCounts -> makeSGFeatureCounts -> colnames<- -> colnames<-

What did I did wrong this time?

Thanks in advance

S

ADD REPLY • link 7.6 years ago Sylvain Foisy ▴ 80

score 0 · Answer 2 · 2016-09-28

0

Entering edit mode

Leonard Goldstein ▴ 180

@leonard-goldstein-6845

Last seen 14 months ago

Australia

Hi Sylvain,

It looks like there was an error in analyzeFeatures() when assembling results. To figure out what went wrong, can you run the individual analysis steps separately. For example if your sample information is stored in object 'si' and splice graph for your gene annotation in object 'sgf' can you run:

> counts <- getSGFeatureCounts(sample_info = si, features = sgf, counts_only = TRUE)
> sgfc <- makeSGFeatureCounts(rowRanges = sgf, colData = si, counts = counts)

Thanks!
Leonard

ADD COMMENT • link 7.6 years ago Leonard Goldstein ▴ 180

0

Entering edit mode

Hi Leonard,

Can do ;-) Will let you know ASAP how it turns out.

Best regards

S

ADD REPLY • link 7.6 years ago Sylvain Foisy ▴ 80

0

Entering edit mode

Hi Leonard,

Ok, the getSGFeaturesCounts+makeSGFeaturesCounts procedure worked while the analyzeFeatures() failed... Any idea why? Anyway moving forward with analyzeVariants() ;-)

Best regards

ADD REPLY • link 7.6 years ago Sylvain Foisy ▴ 80

score 0 · Answer 3 · 2016-10-02

0

Entering edit mode

Leonard Goldstein ▴ 180

@leonard-goldstein-6845

Last seen 14 months ago

Australia

Hi Sylvain,

Glad to hear it worked. analyzeFeatures() is just a wrapper function (see section 12 of the vignette) – so I'm not sure what caused the analyzeFeatures error. If you run into problems again, please let me know.

Leonard

ADD COMMENT • link 7.6 years ago Leonard Goldstein ▴ 180

score 0 · Answer 4 · 2016-10-05

0

Entering edit mode

Sylvain Foisy ▴ 80

@sylvain-foisy-5539

Last seen 4.6 years ago

Canada

Hi Leonard,

Exploring the plotFeatures method ;-) Why do I get this for some genes:

> plotFeatures(sgfeatures,geneName="ABHD16B")
Error in hclustfun(distfun(t(x))) :
NA/NaN/Inf dans un appel à une fonction externe (argument 11)

Tanks in advance

S

ADD COMMENT • link 7.6 years ago Sylvain Foisy ▴ 80

score 0 · Answer 5 · 2016-10-05

0

Entering edit mode

Sylvain Foisy ▴ 80

@sylvain-foisy-5539

Last seen 4.6 years ago

Canada

Hi Leonard,

Exploring the plotFeatures method ;-) Why do I get this for some genes:

> plotFeatures(sgfeatures,geneName="ABHD16B")
Error in hclustfun(distfun(t(x))) :
NA/NaN/Inf dans un appel à une fonction externe (argument 11)

Thanks in advance

S

ADD COMMENT • link 7.6 years ago Sylvain Foisy ▴ 80

score 0 · Answer 6 · 2016-10-05

0

Entering edit mode

Leonard Goldstein ▴ 180

@leonard-goldstein-6845

Last seen 14 months ago

Australia

Hi Sylvain,

This is an error from the hierarchical clustering function which is used to reorder rows (samples) in the heatmap (one cause for this might be NA values in your data). You can suppress clustering by using argument Rowv = NA.

Thanks,

Leonard

ADD COMMENT • link 7.6 years ago Leonard Goldstein ▴ 180

0

Entering edit mode

Hi,

Cool; that did the trick ;-) BTW and this might sound like a stupid question: how come I end up with NA values? Too low a count for SGSeq to set a value? Is there a way to fix this in analyzeFeatures, like setting a minimal value?

Also another question about plotFeatures: I do see that most of my genes go from 0 to a max value, and that the ramp for the heat map dos from solid black to gold. How come I have some genes for which the ramp goes from 0 to 0 and the heat map is a solid pale gold everywhere?

Best regards

S

ADD REPLY • link 7.6 years ago Sylvain Foisy ▴ 80

score 0 · Answer 7 · 2016-10-05

Hi,

The data used for generating the heatmap (and clustering of samples) come from an assay in your SGFeatureCounts object. This is controlled with argument 'assay' (the default is "FPKM"). Before plotting and clustering, the data are transformed with a function specified in the 'transform' argument – the default is log2(x + 1). Colors used for the heatmap are determined with the 'col' argument. The range is determined from the data or you can specify it with 'zlim'. If there are no reads for the gene that you are plotting, the range of the data is limited to 0.

You shouldn't normally get NA values in the FPKM matrix, but the transformed values may be NA if you changed the function that is used for transformation. If you are working with an SGFeatureCounts object 'sgfc' you can inspect the (untransformed) FPKM values with e.g.

> FPKM(sgfc[any(geneName(sgfc) == "ABHD16B"), ])

Leonard