Binning expression data along with capturing gene identifiers
Alan Smith ▴ 150
@alan-smith-5987
Last seen 7.7 years ago
United States

Hello,

I'm using the cut function to bin my expression data (FPKM) so that I can
see how expression is distributed across the bins and plot it. However,
tabulating the result only gives me counts; it doesn't capture the gene
identifiers (row names) that fall into each bin. Is there a function in one
of the Bioconductor packages, or some other code, that does this?

I use this for binning:
fpkmBins <- table(cut(fpkm[,2],
    breaks=c(0,0.01,1,2,3,4,5,10,20,30,40,50,60,70,80,90,100,12000),
    dig.lab=5, include.lowest=TRUE,
    labels=c("0","<1","1-2","2-3","3-4","4-5","5-10","10-20","20-30",
             "30-40","40-50","50-60","60-70","70-80","80-90","90-100",">100")))

Thanks in advance for your help.
Alan


sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicFeatures_1.18.3 AnnotationDbi_1.28.1   Biobase_2.26.0
[4] GenomicRanges_1.18.4   GenomeInfoDb_1.2.4     IRanges_2.0.1
[7] S4Vectors_0.4.0        BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8
 [4] BiocParallel_1.0.0      biomaRt_2.22.0          Biostrings_2.34.1
 [7] bitops_1.0-6            brew_1.0-6              checkmate_1.5.1
[10] codetools_0.2-9         DBI_0.3.1               digest_0.6.8
[13] fail_1.2                foreach_1.4.2           GenomicAlignments_1.2.1
[16] iterators_1.0.7         RCurl_1.95-4.5          Rsamtools_1.18.2
[19] RSQLite_1.0.0           rtracklayer_1.26.2      sendmailR_1.2-1
[22] stringr_0.6.2           tools_3.1.2             XML_3.98-1.1
[25] XVector_0.6.0           zlibbioc_1.12.0

rnaseqdata clustering • 1.7k views
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 10 weeks ago
Italy

Hi Alan,

You could use the factor generated by cut to split the row names of your data.frame accordingly:

Cut.fpkm <- cut(fpkm[,2],
    breaks=c(0,0.01,1,2,3,4,5,10,20,30,40,50,60,70,80,90,100,12000),
    dig.lab=5, include.lowest=TRUE,
    labels=c("0","<1","1-2","2-3","3-4","4-5","5-10","10-20","20-30",
             "30-40","40-50","50-60","60-70","70-80","80-90","90-100",">100"))

Genes.bin <- split( rownames( fpkm ), Cut.fpkm )

fpkmBins <- table( Cut.fpkm )

Genes.bin should then be a list with one element per bin (that is, one fewer than the number of breaks), each element being a character vector with the gene identifiers of the rows falling into that bin. Values outside the range of the breaks become NA in Cut.fpkm, and split drops those rows.
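
For anyone who wants to try this without the original data, here is a minimal sketch on toy data. The data.frame contents, gene names, the coarser breaks and the variable names (bins, genes_by_bin, gene_bins) are all made up for illustration; only cut, split and table come from the answer above.

```r
# Toy stand-in for the real data (hypothetical values):
# gene IDs as row names, FPKM in column 2
fpkm <- data.frame(
  length  = c(1200, 800, 1500, 950, 2000, 600),
  sample1 = c(0, 0.5, 1.5, 12, 150, 7),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF")
)

# Coarser breaks than in the thread, just to keep the example short
breaks <- c(0, 0.01, 1, 10, 100, 12000)
labels <- c("0", "<1", "1-10", "10-100", ">100")

# Factor assigning each row to a bin
bins <- cut(fpkm[, 2], breaks = breaks, labels = labels,
            include.lowest = TRUE)

# Gene identifiers per bin; a bin with no genes stays as character(0)
genes_by_bin <- split(rownames(fpkm), bins)

# Counts per bin, same order as the list elements
bin_counts <- table(bins)

# Long format, one row per gene, if a table is handier than a list
gene_bins <- data.frame(gene = rownames(fpkm), bin = bins)
```

Here genes_by_bin[["1-10"]] holds geneC and geneF, and sapply(genes_by_bin, length) gives the same numbers as bin_counts, so the list and the table can be used side by side for plotting and lookup.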

Hope this helps,

cheers, jo

 

 


Oh! I knew I must be missing something basic.

Thanks a bunch Jo.
