Yoo, Seungyeul
Dear all,
I'm working on a DNA methylation microarray dataset. The microarray
design is "pd.feinberg.hg18.me.hx1".
I used the CHARM package to estimate percentage methylation and
selected the 1000 probes with the largest variance in methylation level
across samples.
The 1000 probes are identified by chromosome coordinates, as follows.
> rnames[1:10]
 [1] "chr1:1707145" "chr1:2148663" "chr1:3133683" "chr1:3180808" "chr1:3294081"
 [6] "chr1:3470900" "chr1:3470969" "chr1:3633816" "chr1:3676205" "chr1:3720637"
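Since the IDs are plain "chromosome:position" strings, they need to be split before any distance can be computed. A minimal base R sketch, assuming every ID has exactly one colon (the data frame name `probes` and its column names are my own choice, and only three of the IDs above are used):

```r
# Three of the probe IDs shown above
rnames <- c("chr1:1707145", "chr1:2148663", "chr1:3133683")

# Split each "chrN:position" string at the colon
parts <- strsplit(rnames, ":", fixed = TRUE)

# Collect chromosome and integer position in one data frame
probes <- data.frame(
  id  = rnames,
  chr = vapply(parts, `[`, character(1), 1),
  pos = as.integer(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
```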
Now I want to look at the expression of the genes near these 1000
probes and see the correlation between gene expression and DNA
methylation.
I loaded human genome transcript information from UCSC and extracted
the features of all transcripts as follows.
hg18KG <- loadFeatures("hg18_UCSC.sqlite")
tbl_tx <- select(hg18KG, keys(hg18KG, "GENEID"),
                 cols = c("GENEID", "TXNAME", "TXCHROM", "TXSTRAND", "TXSTART", "TXEND"),
                 keytype = "GENEID")
> tbl_tx[1:10,]
   GENEID     TXNAME TXCHROM TXSTRAND   TXSTART     TXEND
1       1 uc002qsd.2   chr19        -  63549984  63556677
2       1 uc002qsf.1   chr19        -  63551644  63565932
3      10 uc003wyw.1    chr8        +  18293035  18303003
4      10 uc010lte.1    chr8        +  18301794  18302666
5     100 uc002xmj.1   chr20        -  42681577  42713790
6     100 uc010ggt.1   chr20        -  42681577  42713790
7    1000 uc002kwg.1   chr18        -  23784933  24011189
8   10000 uc001iaa.2    chr1        - 241731689 241733518
9   10000 uc001hzz.1    chr1        - 241718158 242073207
10  10000 uc001iab.1    chr1        - 241733107 242073207
For each of the 1000 probes, I want to find the closest transcript
starting point (TXSTART).
But I don't know how to treat strand. The raw data provide no strand
information for the probes, but the transcripts do have strand
information (either "+" or "-").
How can I calculate the distance from a probe coordinate to the
starting point of a transcript that is on the "+" or the "-" strand?
Can I just ignore the strand and treat position 111111 on "+" the same
way as position 111111 on "-"? My guess is that they should be treated
differently, because the genome sequence is not symmetric.
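To make the question concrete, here is a minimal base R sketch of one way the strand could be handled. It rests on an assumption I would like to have confirmed: in the table above TXSTART is smaller than TXEND even for "-" transcripts, so I assume that a "-" transcript actually begins at TXEND. The helper `nearest_tss`, the `TSS` column, and the three probe positions are hypothetical; the transcript rows are copied from the table above.

```r
# Four transcript rows copied from tbl_tx above
tx <- data.frame(
  TXNAME   = c("uc003wyw.1", "uc001iaa.2", "uc001hzz.1", "uc001iab.1"),
  TXCHROM  = c("chr8", "chr1", "chr1", "chr1"),
  TXSTRAND = c("+", "-", "-", "-"),
  TXSTART  = c(18293035, 241731689, 241718158, 241733107),
  TXEND    = c(18303003, 241733518, 242073207, 242073207),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the transcript begins at TXSTART on "+" and at TXEND on "-"
tx$TSS <- ifelse(tx$TXSTRAND == "+", tx$TXSTART, tx$TXEND)

# Hypothetical helper: closest start site on the same chromosome.
# dist is signed relative to the direction of transcription:
# negative = upstream of the start site, positive = downstream.
nearest_tss <- function(chr, pos, tx) {
  cand <- tx[tx$TXCHROM == chr, ]
  if (nrow(cand) == 0) {
    return(data.frame(TXNAME = NA_character_, dist = NA_real_))
  }
  i <- which.min(abs(cand$TSS - pos))
  offset <- pos - cand$TSS[i]
  signed <- if (cand$TXSTRAND[i] == "+") offset else -offset
  data.frame(TXNAME = cand$TXNAME[i], dist = signed,
             stringsAsFactors = FALSE)
}

# Made-up probe positions, for illustration only
probes <- data.frame(
  chr = c("chr1", "chr8", "chr8"),
  pos = c(241733000, 18293500, 18292000),
  stringsAsFactors = FALSE
)

res <- do.call(rbind, lapply(seq_len(nrow(probes)), function(k) {
  nearest_tss(probes$chr[k], probes$pos[k], tx)
}))
```

With these toy inputs, the chr1 probe is matched to uc001iaa.2 at 518 bp downstream of its assumed start, and the two chr8 probes fall 465 bp downstream and 1035 bp upstream of uc003wyw.1. The probe itself carries no strand here; the strand of the transcript is only used to pick which end counts as the start and to decide the sign of the distance.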
I have only just moved into genomics from a different field and have
little experience working with genome sequences, so I apologize if this
is a naive question.
Any comments, even conceptual ones, would be very helpful.
Thank you.
Seungyeul Yoo
Postdoctoral Fellow
Institute of Genomics and Multiscale Biology
Department of Genetics and Genomic Sciences
Mount Sinai School of Medicine
(office) 212-659-6877