How to call "DBA_CONDITION" for multiple conditions and how to control the GLM program
arcucamila • 0
@arcucamila-14873
Last seen 12 months ago
Argentina

Hi everybody!

I'm just getting started in the world of differential affinity analysis, and I have a series of problems (please forgive my English):

I'm interested in doing a differential affinity analysis on multiple H3K27ac ChIP samples from cancer cell lines. I have many cell lines, and the samples for each line come from different labs, so samples from a particular cell line are not strictly replicates, and I don't have a true replicate for every sample (let's say some replicates were not good). Setting this aside for a moment, my first question is how to identify differential activity in peaks between cell subtypes such as Luminal, Basal A, and Basal B. One possibility, I think, is to build a table like the one shown below, using the cell subtype as the Condition, so that each condition has many "replicates": all the real samples that belong to that subtype. I haven't decided on this yet. Alternatively, I could use the samples from different labs as replicates for each cell line. By the way, I'm interested in the peaks, not so much in the cell lines.

Well, I know this may sound a little weird, but bear with me, because my problem appears later.

So this is the table I made:

SampleID Tissue Factor Condition bamReads ControlID bamControl Peaks PeakCaller
BT549-Schor BT549 H27K27Ac BaB /home/bam/BT549_Young BT549-Young-c /home/bamC/BT549_Young /home/bed/BT549_Young bed
BT549-Young BT549 H27K27Ac BaB /home/bam/BT549_Young BT549-Young-c /home/bamC/BT549_Young /home/bed/BT549_Young bed
HCC1569 HCC1569 H27K27Ac BaA /home/bam/HCC1569 HCC1569-c /home/bamC/HCC1569 /home/bed/HCC1569 bed
HCC1569-Schor HCC1569 H27K27Ac BaA /home/bam/HCC1569 HCC1569-c /home/bamC/HCC1569 /home/bed/HCC1569 bed
MDAMB231-Arc MDAMB231 H27K27Ac BaB /home/bam/MDAMB231_Hardy MDAMB231-Hardy.merged-c /home/bamC/MDAMB231_Hardy /home/bed/MDAMB231_Hardy bed
MDAMB231-Hardy.merged MDAMB231 H27K27Ac BaB /home/bam/MDAMB231_Hardy MDAMB231-Hardy.merged-c /home/bamC/MDAMB231_Hardy /home/bed/MDAMB231_Hardy bed
MDAMB468-Arc MDAMB468 H27K27Ac BaA /home/bam/MDAMB468_Young MDAMB468-Young-c /home/bamC/MDAMB468_Young /home/bed/MDAMB468_Young bed
MDAMB468-Young MDAMB468 H27K27Ac BaA /home/bam/MDAMB468_Young MDAMB468-Young-c /home/bamC/MDAMB468_Young /home/bed/MDAMB468_Young bed
MCF7-Hardy MCF7 H27K27Ac Lu /home/bam/MCF7_Hardy MCF7-Hardy-c /home/bamC/MCF7_Hardy /home/bed/MCF7_Hardy
MCF7-Schor MCF7 H27K27Ac Lu /home/bam/MCF7_Schor MCF7-Schor-c /home/bamC/MCF7_Schor /home/bed/MCF7_Schor

And this is the script I have so far (note that I used a shortened table just to test whether it is read correctly):

> GeneClusterDBA<- dba(sampleSheet="lines.name-type7.csv")
MDAMB468-Young MDAMB468 H27K27Ac BaA  NA bed
HCC1569 HCC1569 H27K27Ac BaA  NA bed
MDAMB231-Hardy.merged MDAMB231 H27K27Ac BaB  NA bed
BT549-Young BT549 H27K27Ac BaB  NA bed

....
> GeneClusterDBA

4 Samples, 22384 sites in matrix (58575 total):
                     ID   Tissue   Factor Condition Caller Intervals
1        MDAMB468-Young MDAMB468 H27K27Ac       BaA    bed     34448
2               HCC1569  HCC1569 H27K27Ac       BaA    bed     32683
3 MDAMB231-Hardy.merged MDAMB231 H27K27Ac       BaB    bed      3091
4           BT549-Young    BT549 H27K27Ac       BaB    bed     30053

....
> GeneClusterDBA<-dba.count(GeneClusterDBA)
> GeneClusterDBA
4 Samples, 22384 sites in matrix:
                     ID   Tissue   Factor Condition Caller Intervals
1        MDAMB468-Young MDAMB468 H27K27Ac       BaA counts     22384
2               HCC1569  HCC1569 H27K27Ac       BaA counts     22384
3 MDAMB231-Hardy.merged MDAMB231 H27K27Ac       BaB counts     22384
4           BT549-Young    BT549 H27K27Ac       BaB counts     22384
  FRiP
1 0.37
2 0.44
3 0.09
4 0.43

.....

First, I don't know whether the NAs that appear when the program reads the .csv are normal. Maybe there is something the program cannot read.

Second, I don't know whether I should have named the conditions column "DBA_CONDITION" instead of "Condition", so that I could call dba.contrast like this without getting a warning such as "No contrasts have been made" (or something like that).

How can I use "DBA_CONDITION"?

GeneClusterDBA<-dba.contrast(GeneClusterDBA, categories=DBA_CONDITION, block=...)

The three dots after "block=" are there because I think I should use a blocking factor, but I don't understand very well how to use it. I want to compare Lu with BaB and BaA, BaB with Lu and BaA, and BaA with Lu and BaB. Do I have to run three analyses, blocking first Lu, then BaB, and finally BaA? Or can I do everything in the same analysis? Can I make all the possible comparisons in one analysis?
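To make the comparisons I have in mind concrete, here is a minimal base R sketch that lists every pair of conditions and counts how many samples each one has. It does not use DiffBind at all: the `sheet` data frame is a hand-typed copy of the SampleID and Condition columns from my table, and `min_members` is a hypothetical threshold I made up for illustration.

```r
# Hand-typed copy of the SampleID and Condition columns from the sample sheet.
sheet <- data.frame(
  SampleID = c("BT549-Schor", "BT549-Young", "HCC1569", "HCC1569-Schor",
               "MDAMB231-Arc", "MDAMB231-Hardy.merged",
               "MDAMB468-Arc", "MDAMB468-Young", "MCF7-Hardy", "MCF7-Schor"),
  Condition = c("BaB", "BaB", "BaA", "BaA", "BaB", "BaB",
                "BaA", "BaA", "Lu", "Lu"),
  stringsAsFactors = FALSE
)

# Number of samples available for each condition.
n_per_condition <- table(sheet$Condition)

# Every unordered pair of conditions: one column per comparison.
pairs <- combn(sort(unique(sheet$Condition)), 2)

# Hypothetical threshold (an assumption for this sketch):
# the smallest group size I would accept in a comparison.
min_members <- 2
usable <- apply(pairs, 2, function(p) all(n_per_condition[p] >= min_members))

# One row per pairwise comparison, flagged by whether both groups are large enough.
contrasts <- data.frame(group1 = pairs[1, ], group2 = pairs[2, ], usable = usable)
print(contrasts)
```

As far as I understand from the DiffBind documentation, dba.contrast has a minMembers argument that plays a similar role to `min_members` here, but I am not sure whether changing it is the right approach for my design.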

Also, when using "block", I don't know whether I have to specify a particular column name or something like that. How does the program know which factor I want to block on? Should I use a mask? Is that the only possibility?

Should I also specify a Replicate column, treating as replicates the samples grouped under the first possibility I described above (one group per condition, with all of its samples as replicates)? Will the program understand this even though the Tissue column has different names, many of which represent the same condition?

Also, where can I read more about edgeR and DESeq2? I want to understand them better and maybe learn how to change their parameters.

Sorry for all my basic questions; I'm really lost with this.

I would very much appreciate your help.

Thank you in advance

 

Camila

cancer deseq2 edger CONDITIONS DBA • 1.0k views