Throw a stone at me: August 2010

Sunday, August 29, 2010

Genome-Wide Evolutionary Analysis of Eukaryotic DNA Methylation

Lately I have been thinking about the evolution of DNA methylation, starting from the fact that methylation patterns are conserved in orthologous regions between species. I decided to dig into this idea, so I started by googling. I had scarcely begun searching when the paper in the title of this post came up.

The paper was published in Science.

Here is the PPT.

This link is also worth reading.
http://blog.lib.umn.edu/denis036/thisweekinevolution/2010/05/evolution_of_dna_methylation_i.html

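(열혈강의) 오용철의 데이터베이스 모델링 [(Passionate Lecture) Oh Yong-cheol's Database Modeling]

As an undergraduate I studied and used SQL while working with UniGene data, but I felt I lacked a systematic grasp of databases as a subject, so I read this book. I still have about two chapters left (bottom-up design and integrated design), but I will review it in advance.

The book feels like the data structures course I took in the computer science department as a sophomore. Once you have read it all, you realize it explains things fairly comfortably, but the introductory explanations are not very engaging or friendly, so a complete beginner will probably find it boring and keep asking "why?".

For someone like me, who knows a little about databases and wants to organize that knowledge, it is a very easy and comfortable read, but I would not recommend it to someone who knows nothing at all.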

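In brief, the content can be summarized as in the figure on the right.
1. The world to be turned into a database is organized through data collection and analysis.
2. This first goes through conceptual design to produce an ER model (diagram).
3. Then logical design (top-down, bottom-up, or integrated) produces an implementation data model (this book covers the relational model).
4. Finally, physical design produces the actual physical model.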
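
As a rough illustration of steps 2 to 4, here is a minimal sketch using Python's built-in sqlite3 module. The two entities (`cluster` and `sequence`), their columns, and the sample rows are hypothetical and are not taken from the book; they only show how a one-to-many relationship in an ER diagram can become two relational tables plus an index.

```python
import sqlite3


def demo():
    """Build a tiny relational schema from a hypothetical ER model and query it."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Logical design: one table per entity, the 1:N relationship as a foreign key.
    # Physical design: an index on the column used for joins.
    conn.executescript(
        """
        CREATE TABLE cluster (
            cluster_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE sequence (
            sequence_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL UNIQUE,
            cluster_id  INTEGER NOT NULL REFERENCES cluster(cluster_id)
        );
        CREATE INDEX idx_sequence_cluster ON sequence(cluster_id);
        """
    )

    # Hypothetical sample data.
    conn.execute("INSERT INTO cluster (cluster_id, name) VALUES (1, 'cluster_A')")
    conn.executemany(
        "INSERT INTO sequence (accession, cluster_id) VALUES (?, ?)",
        [("seq_001", 1), ("seq_002", 1)],
    )

    # Count the sequences that belong to each cluster.
    rows = conn.execute(
        """
        SELECT c.name, COUNT(*)
        FROM cluster AS c
        JOIN sequence AS s ON s.cluster_id = c.cluster_id
        GROUP BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Storing the cluster name once in `cluster`, instead of repeating it on every `sequence` row, is the kind of redundancy that normalization is meant to remove.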

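Each step comes with an explanation and practical examples. The important keywords from the book that I did not know before are normalization, indexes, PL/SQL, triggers, and cursors.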

Finally, as for shortcomings: the figures contain many typos, and across the schemas of each stage (conceptual, logical, and physical) different terms are used for the same concept in a confusing way.

Monday, August 23, 2010

homologous recombination

(This post was originally written in Korean.)

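Today Science published a paper titled "re-replication may be a contributor to gene copy number changes". The mechanism is shown on the right. It seems to say that gene copy number changes through NAHR (non-allelic homologous recombination); I only looked at the figure and did not read the paper. Seeing this figure made me look up what NAHR is. It is simply HR (homologous recombination) that occurs between similar sequences that are not alleles of each other, such as repeated segments at different positions.
So what is HR? Recombination means "re-combination", and I vaguely remember hearing about it in a biology class long ago. I first looked for information in two places.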

Sanger and Wikipedia.
The information on the Sanger site is very sparse, so I looked at the more detailed Wikipedia article. The key point (left figure) is that HR arises through the DSBR pathway, one of the pathways of double-strand break repair, and the result is either a crossover or gene conversion.
When the double Holliday junctions are cut by a nicking endonuclease, horizontal resolution (to borrow the wording of the Sanger site) produces gene conversion and vertical resolution produces a crossover.

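The problem is that when you look up gene conversion in general, the figure looks like the one on the right. In the process above, when gene conversion occurs, it is true that one duplex receives a piece of DNA from the other duplex as a complete double-stranded segment. But the duplex that did not undergo gene conversion also receives a piece of the other duplex's DNA on one strand, whereas the figure on the right shows one side with no mixing of DNA strands at all. Did I misunderstand something, or was the figure simply drawn that way for convenience?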

Can anyone answer this question?

Darwin's evolution theory

Several months ago, I became fascinated with evolutionary theory. I collected many documents and websites over a week and tried to read most of them. But whether because of my own limits in interpreting them or a lack of detailed explanation, my understanding of evolutionary theory was not clear at the time.
I found this book by chance last week and read it over the weekend. Through this book I was able to organize my thinking about evolutionary theory and its history.
A section written thirty years ago by Motoo Kimura, who proposed the neutral theory of molecular evolution, was especially impressive. I read that part as if I were hearing a lecture from Kimura himself.

I believe that ALL analysis of biological data is based on evolutionary theory, and that biologists who lack this concept are scientists without spirit.
As mentioned in Darwin 2.0 (the book? written by a professor at Ewha University), Darwin's theory can be considered the greatest theory in biology, and its greatness is comparable to that of the theory of relativity in physics.

In this book I found the theory of evolution, the meaning of population genetics, and applications of evolution. These are not covered in great detail, of course, but it is enough to establish the basic concepts.

Wednesday, August 18, 2010

SAM (sequence alignment/map format)

With the advent of NGS, projects such as the 1000 Genomes Project have become possible. There are several sequencing machines and alignment tools, which causes confusion and a lot of tedious handling of result data before downstream analysis. At my firm we also ran into this problem when members used different tools to map reads. While we were trying to find or decide on a standard format, I found SAM, so I planned to introduce it to the members of my firm.
What a pity! We found it a year after it was created. These days I am quite disappointed with people at the firm and frustrated with myself. I am really sick of the fact that all I can do is follow someone else's work. I really hope to be a leader at the forefront of scientific development.

Anyway,

Because I am not familiar with binary formats and compressed files, I just skipped that part of the format specification.
Actually the PPT here is just a brief introduction.
Here is the PPT.

Sources for the presentation:
1. The Sequence Alignment/Map format and SAMtools
2. Sequence Alignment/Map format document

<sam flag explanation>
http://picard.sourceforge.net/explain-flags.html
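
The FLAG column of a SAM record is a bitwise combination of properties, which is what the page above decodes. Below is a minimal sketch of the same idea in Python. The bit values follow the SAM format specification (the 0x800 supplementary bit was added in later versions of the spec), while the helper name `explain_flag` and the wording of the labels are my own assumptions.

```python
# Bit values as listed in the SAM format specification; the labels are informal.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the labels of every bit that is set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    for value in (0, 16, 99, 147):
        print(value, explain_flag(value))
```

For example, 99 = 64 + 32 + 2 + 1, so it describes the first read of a properly paired pair whose mate maps to the reverse strand.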

<caution when using pysam>
In pysam, mapping positions are 0-based. So when calling fetch, unless the gene coordinates come from a BED file, you have to pass the start position minus 1.
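
The conversion can be sketched as a small helper. It assumes the gene coordinates are 1-based and inclusive (as in GFF/GTF-style annotation) and returns the 0-based, half-open interval that fetch expects. The function name `to_fetch_interval` and the file name in the commented usage are hypothetical; the pysam calls are shown only as comments so the block runs without pysam installed.

```python
def to_fetch_interval(start_1based, end_1based):
    """Convert a 1-based inclusive interval to a 0-based half-open one.

    Only the start shifts by one; the end stays the same because an
    inclusive 1-based end equals an exclusive 0-based end.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return start_1based - 1, end_1based


if __name__ == "__main__":
    start, end = to_fetch_interval(100, 200)
    print(start, end)

    # Hypothetical usage with pysam (file name and contig are placeholders):
    # import pysam
    # with pysam.AlignmentFile("example.bam", "rb") as bam:
    #     for read in bam.fetch("chr1", start, end):
    #         print(read.query_name, read.reference_start)
```

BED files already use 0-based, half-open coordinates, which is why their start values can be passed through unchanged.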

Tuesday, August 17, 2010

gzip compression algorithm & Burrows-Wheeler Transform

I often see documents that mention compression and indexing, such as those about bowtie and BGZF.
That is why I prepared this post. Ah.. these are just links to references.

Gzip compression algorithm
http://dalmasian.tistory.com/46

a short overall introduction to compression algorithms
http://blog.naver.com/altools?Redirect=Log&logNo=150019572403

writing zip code with zlib
http://blog.naver.com/ksw7998?Redirect=Log&logNo=100011414029

Burrows-Wheeler Transform (bzip2)
http://james.fabpedigree.com/bwt.htm
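
To make the transform concrete, here is a minimal sketch of a naive Burrows-Wheeler Transform and its inverse. It sorts all rotations of the text, which is fine for a short string but far too slow and memory-hungry for real data; tools such as bzip2 and bowtie use much more efficient constructions. The `$` sentinel and the function names are my own choices for illustration.

```python
def bwt(text, sentinel="$"):
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to every row, then sort the rows again.
        table = sorted(ch + row for ch, row in zip(last_column, table))
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in input")


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

The output tends to group identical characters together (the "nn" and "aa" runs above), which is what makes the transformed text easier to compress.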

Saturday, August 7, 2010

overview of discovering structural variation with NGS

I think this presentation will be the last of three consecutive presentations at my firm.

So far, I have reviewed ChIP-seq, RNA-seq, and de novo assembly (I did not post on that last subject, but I have already built my own pipeline). I expect that after finishing this post I will be able to survey the overall uses of NGS, though of course I know that conclusion is arrogant.

My plan is to introduce the papers below in the presentation.

1. Computational methods for discovering structural variation with next-generation sequencing
2. One of the papers cited in paper 1.

I chose the second paper for the presentation: "BreakDancer: an algorithm for high-resolution mapping of genomic structural variation". Haha, isn't it fascinating? Their sense for naming... Anyway, it will come soon.

Wednesday, August 4, 2010

RNA-seq analysis overview

After finishing the ChIP-seq analysis overview, I will prepare mRNA-seq for the next presentation. It consists of one review paper and two articles. I will be back.

The list of papers presented in this post:

1. Computation for ChIP-seq and RNA-seq studies
2. Mapping and quantifying mammalian transcriptomes by RNA-Seq
3. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing

I finally made the PPT.

ChIP-seq analysis overview

I have already reviewed some papers and set up the pipeline for ChIP-seq analysis.
I just wish this website supported uploading PPTs or other files.

Anyway, I will post this later.

I found it. Keke, I am an idiot: just use a hyperlink, that's all. No, no... for a link, the file has to be on the web. One way to solve this is to use Google Sites: upload the file there and link to that site from this blog. I don't know why Blogger didn't make a button for uploading data. Or maybe I just couldn't find the button.

anyway.

Here is the PPT for this post.

The list of papers presented in this PPT:

1. Computation for ChIP-seq and RNA-seq studies
2. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls