Question: Data structure for my dataset
2.4 years ago by
Sim1pall8a0 wrote:

Hi, I'm new to R/Bioconductor.

I need a data structure that represents the dataset that I'm working with.

My dataset is a set of samples: each sample is made up of two files (.gdm or .gtf)

1) a set of regions:

chr   left    right   strand  name  score  signal     pvalue   qvalue  peak
chr1  237680  237830  *       .     0      17.1875    15.9472  -1      -1
chr1  521500  521650  *       .     0      15.5625    18.0962  -1      -1
chr1  714060  714210  *       .     0      139.40625  316.755  -1      -1

2) a set of metadata:

assay    DNase-seq
assembly    hg19
audit_internal_action    experiment not submitted to GEO, out of date analysis
biological_replicate(s)    2
biosample_life_stage    unknown
biosample_organism    Homo sapiens

(These examples are taken from my files.)

I know GRanges and GRangesList are useful for representing a set of regions or a list of region sets, respectively, but I don't know how to represent my metadata and bind it to my regions.

Is there any other package or class that could help me, or do I need to create one ad hoc?

modified 2.4 years ago by Michael Lawrence11k • written 2.4 years ago by Sim1pall8a0
Answer: Data structure for my dataset
2.4 years ago by
United States
Michael Lawrence11k wrote:

I think you could just store the metadata in a list as the metadata() component of the GRanges.
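To make that suggestion concrete, here is a minimal sketch. It assumes the GenomicRanges package is installed, reuses the three example regions and a few of the metadata attributes from the question, and takes the coordinates as they appear in the file. The object name `gr` and the choice of which extra columns to keep are only for illustration.

```r
library(GenomicRanges)

# Build the regions; extra per-region columns (score, signal, pvalue)
# are passed as named arguments and end up as metadata columns.
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(237680, 521500, 714060),
                     end   = c(237830, 521650, 714210)),
  strand   = "*",
  score    = c(0, 0, 0),
  signal   = c(17.1875, 15.5625, 139.40625),
  pvalue   = c(15.9472, 18.0962, 316.755)
)

# Attach the sample-level attributes once, as a plain named list.
metadata(gr) <- list(
  assay              = "DNase-seq",
  assembly           = "hg19",
  biosample_organism = "Homo sapiens"
)

# Look up a single attribute for the whole sample.
metadata(gr)$assay
```

The list is stored once per object, so nothing is repeated across the rows.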

Thanks, but this way it creates as many columns as there are distinct values in my metadata and replicates the same value in every row, which is not what I want.

Also, I already use the metadata in GRanges for the extra columns of my regions that GRanges does not cover (name, score, signal, pvalue, qvalue, peak).

I would like something similar to the metadata in the SummarizedExperiment package.

metadata() on GRanges is the same as metadata() on SummarizedExperiment. Maybe you are getting confused with mcols()?
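A small sketch of the difference, assuming the GenomicRanges and SummarizedExperiment packages are installed; the objects `gr` and `se` and their values are made up for illustration from the example in the question.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Two regions with one per-region column (signal).
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(237680, 521500),
                     end   = c(237830, 521650)),
  signal   = c(17.1875, 15.5625)
)
metadata(gr) <- list(assembly = "hg19")

mcols(gr)     # DataFrame with one row per region
metadata(gr)  # list stored once for the whole object

# The same accessor works on a SummarizedExperiment.
se <- SummarizedExperiment(
  assays   = list(signal = matrix(c(17.1875, 15.5625), ncol = 1)),
  metadata = list(assembly = "hg19")
)
metadata(se)
```

So mcols() is the place for values that vary per region, and metadata() is the place for values that describe the sample as a whole.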

Yes, I was confused :) I didn't completely understand the metadata in SummarizedExperiment.

I think I got it, thanks!