SGSeq: moving toward diffex from SGSeq analysis
3
3
Entering edit mode
@sylvain-foisy-5539
Last seen 5.2 years ago
Canada

Hi,

I have use SGSeq mostly to look at exon usage for genes of interest in my RNASeq datasets. I would like to move toward exon diffex analysis. I followed the recipes in the HTML guide to get counts, exons and gene ID but I am perplexed about what I got... I have an SGFeatureCounts object with created from rowRanges on the SGVariantCounts object previously created that has all the infos like geneName, etc. However, after performing the getSGVariantCounts method on it and using the eventID method, I only get integers, that I am guessing are the positions of the genes in the previous object. How can I get geneName instead or possibly add geneName data on the objects?

I am guessing that it would always be possible to go back with integer values to explore the SGFeatureCounts object?

Thanks in advance

S

sgseq rnaseq • 2.3k views
ADD COMMENT
2
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

Hi Sylvain, 

Not sure I completely follow your question. You seem to be confusing SGFeatureCounts and SGVariantCounts objects. The idea behind section 11 of the vignette is to test for differential splice variant usage within a splice event, rather than differential exon usage within a gene. This approach will detect differential splice events (which are identified by event IDs) and you can look them up in your original SGVariants/Counts object. 

Leonard

ADD COMMENT
0
Entering edit mode

Hi Leonard,

No problem ;-) Ok, as per section 11, I created objects for counts, exon ID and gene ID with the methods in the vignette:

# My R stuff:
# sgfc.denovo.var is my equivalent to sgvc_pred
> sgv.ranges<-rowRanges(sgfc.denovo.var)
> sgv.counts<-getSGVariantCounts(sgv.ranges,sample_info=data.r)
> x <-counts(sgv.counts)
> vid<-variantID(sgv.counts> 
> eid<-eventID(sgv.counts)

When I look at vid or eid, I only get matrices of integer values, so I am guessing that they are corresponding indexes in the previous  sgv.ranges object. Am I right? If this is the case, what is the easiest way to make out the gene symbol from the geneName column in sgv.ranges?

Thanks in advance

S

ADD REPLY
3
Entering edit mode

Hi Sylvain, 

Yes variantID() and eventID() are accessor functions for metadata columns in the SGVariant/Counts objects. You can see all available columns by typing mcols(sgv.counts). To extract gene names you can simply use geneNames(sgv.counts). Note this will return an empty CharacterList unless you previously annotated your predictions. I hope this helps. 

Leonard

ADD REPLY
0
Entering edit mode

Hi Leonard,

Thanks for the info; I'll be playing some more and if needed, may be bug you again ;-)

Best regards

S

ADD REPLY
0
Entering edit mode

Hi,

I was following the SGSeq tutorial, but there were some things that I didn't understand properly. I would appreciate further explanation, please.

As following the tutorial document of SGSeq for section 11, I created the SGVariantCounts object:

sgv <- rowRanges(sgvc_pred)
sgvc <- getSGVariantCounts(sgv, sample_info = si)
sgvc


x <- counts(sgvc)
vid <- variantID(sgvc)
> vid
[1] 1 2
# vid output is only an integer. Could you explain what these numbers are? as well as for eid?

eid <- eventID(sgvc)
> eid
[1] 1 1

Then, the plan was to use DEXSeq for differential splice variant usage.

What information do I have to extract and use as input for differential splicing analysis using DEXSeq? How do I build my DEXSeqDataSet object for further analysis?

In the documentation of DEXSeq they used DEXSeqDataSetFromSE to build DEXSeqDataSet object as follow:

dxd = DEXSeqDataSetFromSE( se, design= ~ sample + exon + condition:exon )

Now, as I used SGSeq; what do I have to use to create the DEXSeqDataSet object? Should I use the DEXSeqDataSetFromSE function, and replace see with x which is the counts(sgvc)? or there is another way? How I have to make my design? use vid from SGSeq instead of exon in the dxd from DEXSeq?

I suppose I have to use my SGVariantCounts objects in my SGSeq output as an input in DEXseq like this, but I am not sure:

dxd = DEXSeqDataSet(countData = x, featureID = vid, groupID = eid, design=  ~ sample + exon + condition:exon)

but then here how to define sample and exon?! or where to get these info from my SGSeq analysis?

I would appreciate any help!

Many thanks in advance!

ADD REPLY
2
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

sgvc is a SummarizedExperiment object and mcols(sgvc) is a DataFrame. The error happens because some of the columns are lists e.g. TxName. You can either remove these columns before you export to CSV or coerce them into character strings e.g. using S4Vectors::unstrsplit(x, sep = "|")

For the last question, columns geneName and txName contain gene and transcript identifiers. In the example from the vignette, gene identifiers are Entrez IDs. You could use gene symbols in the annotation or alternatively map the Entrez IDs in your results to gene symbols.

ADD COMMENT
0
Entering edit mode

Thank you so much for your kind help. I got this error from predictVariantEffects function for my own data: what does this error mean, and how do I solve it? Many thanks again!

enter image description here

vep <- predictVariantEffects(sgv_pred, txdb, Hsapiens)
ADD REPLY
0
Entering edit mode

Looks like sgv_pred does not have the expected format. Was it created or edited manually? I would always recommend creating objects with the package functions as shown in Fig 1b of the vignette.

ADD REPLY
0
Entering edit mode

Yes, thank you. I have another error now, may you please guide me. here is the code that I used till using vep:

#
bam_path  <- "~/path to the bam files"
bam_files <- list.files(path = bam_path, pattern = "\\.bam$", full.names = TRUE)
si <- data.frame(sample_name = basename(bam_files), file_bam = bam_files)

si_complete <- getBamInfo(si)

#3) RNA transcripts and the TxFeatures class
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

txdb <- keepSeqlevels(txdb, c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", "chrY", "chrM"))
seqlevelsStyle(txdb) <- "UCSC"


txf_ucsc <- convertToTxFeatures(txdb)



seqlevels_in_bam <- seqlevels(BamFile(si_complete$file_bam[1]))
seqlevels_in_bam <- seqlevels_in_bam[seqlevels_in_bam != "chrEBV"]
common_seqlevels  <- intersect(seqlevels_in_bam, seqlevels(txf_ucsc))

txf_ucsc_filtered <- keepSeqlevels(txf_ucsc, common_seqlevels)



#4) The splice graph and the SGFeatures class
sgf_ucsc <- convertToSGFeatures(txf_ucsc_filtered)


#5) Splice graph analysis based on annotated transcripts
sgfc_ucsc <- analyzeFeatures(si_complete, features = txf_ucsc_filtered, cores=8)


#
sgfc_pred <- annotate(sgfc_ucsc, txf_ucsc_filtered)


# Splice variant identification
sgvc_pred <- analyzeVariants(sgfc_pred)
# Splice variant interpretation
vep <- predictVariantEffects(sgvc_pred, txdb, Hsapiens)

VepError

ADD REPLY
0
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

vid and eid are unique identifiers for splice variants and splicing events respectively. They are analogous to exon bin and gene identifiers. Read counts for splicing analysis can be accessed with counts(sgvc). Your code for creating a DEXSeqDataSet looks correct. sgvc is a SummarizedExperiment object, so you can try using DEXSeqDataSetFromSE() or otherwise DEXSeqDataSet() should work. For questions on the design formula please check the DEXSeq tutorial. I believe sample and condition have to be columns in the colData slot.

ADD COMMENT
0
Entering edit mode

Thank you for your response. I have another question about saving the sgvc result as a CSV file. I would appreciate your help, please.

sgvc <- getSGVariantCounts(sgv, sample_info = si)

> sgvc
class: SGVariantCounts 
dim: 2 8 
metadata(0):
assays(6): countsVariant5p countsVariant3p ...
  countsVariant5pOr3p variantFreq
rownames: NULL
rowData names(20): from to ... variantType variantName
colnames(8): N1 N2 ... T3 T4
colData names(6): sample_name file_bam ... frag_length
  lib_size

> class(sgvc)
[1] "SGVariantCounts"
attr(,"package")
[1] "SGSeq"

> head(mcols(sgvc))
DataFrame with 2 rows and 20 columns
             from              to        type   featureID
      <character>     <character> <character> <character>
1 D:16:87393901:- A:16:87380856:-           J          28
2 D:16:87393901:- A:16:87380856:-         JEJ    32,30,27
    segmentID  closed5p  closed3p closed5pEvent closed3pEvent
  <character> <logical> <logical>     <logical>     <logical>
1           4      TRUE      TRUE          TRUE          TRUE
2           2      TRUE      TRUE          TRUE          TRUE
     geneID   eventID variantID   featureID5p   featureID3p
  <integer> <integer> <integer> <IntegerList> <IntegerList>
1         1         1         1            28            28
2         1         1         2            32            27
  featureID5pEvent featureID3pEvent
     <IntegerList>    <IntegerList>
1            28,32            28,27
2            28,32            28,27
                            txName        geneName     variantType
                   <CharacterList> <CharacterList> <CharacterList>
1 uc002fjv.3,uc002fjw.3,uc010vot.2           79791            SE:S
2                                            79791            SE:I
     variantName
     <character>
1 79791_1_1/2_SE
2 79791_1_2/2_SE

how can I save sgvc results in a csv file?

because when I am receiving this error and from previous question on bioconductor still I couldn't find how to save it as a csv file

write.csv(mcols(sgvc), "sgvc.csv", quote=FALSE)
Error in utils::write.table(mcols(sgvc), "sgvc.csv", quote = FALSE, col.names = NA,  : 
  unimplemented type 'list' in 'EncodeElement'

Even if I try to convert it to data.frame:

sgvc.df <- as.data.frame(mcols(sgvc))

write.csv(sgvc.df, "sgvc.csv", quote=FALSE)
Error in utils::write.table(sgvc.df, "sgvc.csv", quote = FALSE, col.names = NA,  : 
  unimplemented type 'list' in 'EncodeElement'

And because we use variantID and eventID of sgvc, how do we check/extract which gene names they belong to, especially when using DEXSeq?

Thank you in advance!

ADD REPLY

Login before adding your answer.

Traffic: 407 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6