Dendrogram from read counts to show correlation between biological replicates and samples
Jitendra ▴ 10
@nabiyogesh-11718
United Kingdom
I have read counts for 34 samples, including biological replicates. I need to make a dendrogram from the read counts to show the correlation between biological replicates. Please share a Bioconductor package name and code. Do I need to normalize the counts before constructing the dendrogram? Thanks
bioconductor • 858 views
@james-w-macdonald-5106
United States

You can find an example of an RNA-Seq analysis here.

I am trying the following code to make a dendrogram based on raw read counts, but I am getting an error:

> nonzero_row <- mydatanew[rowSums(mydatanew) > 0, ]  # drop rows with zero reads across all columns
> dim(nonzero_row)
[1] 48538    33
> str(nonzero_row)
'data.frame':   48538 obs. of  33 variables:
 $ 216_5W_Ca1: int  100 0 0 8 285 0 253 0 0 339 ...
 $ 216_5W_Ca2: int  71 0 0 48 258 0 204 0 0 484 
> x1 <- as.matrix(nonzero_row)  # convert the data frame to a matrix
> x <- log2(x1 + 1)             # log-transform the read counts
> head(x)
> d <- dist(x, method = "euclidean")
> h <- hclust(d, method = "complete")

Error:

 *** caught segfault ***            
address 0x7f8ca9becf28, cause 'memory not mapped'

Traceback:
 1: hclust(d, method = "complete")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Thanks

 


Can anyone suggest why I am getting the error in the above code?


R shouldn't segfault; can you add the output of sessionInfo() to your post? Here's mine

> sessionInfo()
R version 3.4.0 Patched (2017-05-24 r72729)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /home/mtmorgan/bin/R-3-4-branch/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-3-4-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0

It seems that you have made a very large distance matrix, with 

> 48538 * (48538 - 1) / 2
[1] 1177944453

elements. This is likely causing an overflow in R's internal code. In any case, it doesn't really make sense to cluster across all genes, many of which will have an extremely small influence on the result. Instead, filter on appropriate criteria first, e.g., by selecting the most variable genes (matrixStats::rowVars()) or by something more sophisticated.
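
A minimal sketch of that filtering idea, for illustration only. The simulated `counts` matrix, the sample names, and the cutoff of 500 genes are all assumptions, not values from the post. Note also that dist() computes distances between the rows of its input, so dist(x) on a genes-by-samples matrix compares genes with each other; to cluster samples, the matrix has to be transposed first. The sketch does no library-size normalization; functions such as edgeR::cpm() or DESeq2::vst() are commonly used for that step, but that goes beyond what is shown in this thread.

```r
# Hypothetical stand-in for the poster's count table: 2000 genes x 6 samples.
set.seed(1)
counts <- matrix(rnbinom(2000 * 6, mu = 50, size = 2), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000),
                                 paste0("sample", 1:6)))

# Same preprocessing as in the post: drop all-zero rows, then log2(count + 1).
counts <- counts[rowSums(counts) > 0, , drop = FALSE]
x <- log2(counts + 1)

# Keep the most variable genes (500 is an arbitrary, illustrative cutoff).
# matrixStats::rowVars(x) is a faster equivalent of apply(x, 1, var).
gene_var <- apply(x, 1, var)
n_top <- min(500, nrow(x))
top <- order(gene_var, decreasing = TRUE)[seq_len(n_top)]
x_top <- x[top, , drop = FALSE]

# dist() works on rows, so transpose to get sample-to-sample distances.
d <- dist(t(x_top), method = "euclidean")
h <- hclust(d, method = "complete")
plot(h, main = "Samples clustered on top-variable genes")

# Alternative: a correlation-based distance between samples.
d_cor <- as.dist(1 - cor(x_top, method = "pearson"))
h_cor <- hclust(d_cor, method = "complete")
```

With 33 or 34 samples the distance object has only a few hundred entries, so hclust() has nothing large to allocate. Whether Euclidean or correlation-based distance is the better choice depends on the question being asked; both are shown here only as options.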
