Wednesday, August 18, 2010

SAM (Sequence Alignment/Map format)

With the advent of NGS, projects such as the 1000 Genomes Project have become possible. There are several sequencing machines and alignment tools, though, and this causes confusion and a lot of trivial data-handling work on the results before downstream analysis. At my firm we ran into this problem too, because members use different tools to map the reads. While we were trying to find or settle on a standard format, I found SAM, so I planned to introduce it to the members of my firm.
What a pity! We found it a full year after it was created. These days I am quite disappointed, both with people at the firm and with my own stagnation. I am really sick of the fact that all I can do is follow someone else's work. I really hope to be a leader at the front of scientific development.


Anyway,


Because I am not familiar with binary formats and compressed files, I skipped that part of the format specification.
The PPT here is just a brief introduction.
Here is the PPT.


List of presentations:
1. The Sequence Alignment/Map format and SAMtools
2. Sequence Alignment/Map format document






<SAM flag explanation>
http://picard.sourceforge.net/explain-flags.html
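
The FLAG column in SAM is a bitwise sum, so the same decoding the page above does can be sketched in a few lines of Python. This is only a rough sketch: the function name explain_flag and the wording of the labels are my own, and the bit values are the ones listed in the SAM specification (0x800, supplementary alignment, was added to the specification after the first version).

```python
# Bit values from the SAM specification; the label wording is informal.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the labels of every bit that is set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # 99 = 0x1 + 0x2 + 0x20 + 0x40
    for label in explain_flag(99):
        print(label)
```

For example, a flag of 99 decodes to a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.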








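<Notes on using pysam>
In pysam, mapping positions are 0-based. So when you call fetch, unless the gene coordinates come from a BED file (which is already 0-based), you have to subtract 1 from the start position before fetching.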
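
The off-by-one adjustment for fetch can be sketched as below. This is a minimal sketch under some assumptions: the helper names to_fetch_interval and count_reads_in_gene are made up for illustration, the gene coordinates are 1-based and inclusive (as in GFF/GTF or the SAM text format), and the BAM file is sorted and indexed. Current pysam versions call the file class AlignmentFile; older releases used Samfile.

```python
def to_fetch_interval(start_1based, end_1based):
    """Convert a 1-based, inclusive interval to the 0-based, half-open
    interval that pysam's fetch() expects: only the start shifts by 1."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return start_1based - 1, end_1based


def count_reads_in_gene(bam_path, chrom, start_1based, end_1based):
    """Count reads overlapping a gene given in 1-based, inclusive coordinates.

    Assumes bam_path points to a sorted, indexed BAM file.
    """
    import pysam  # imported here so the helper above works without pysam

    start, end = to_fetch_interval(start_1based, end_1based)
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return sum(1 for _ in bam.fetch(chrom, start, end))


if __name__ == "__main__":
    # A gene annotated at positions 100..200 (1-based, inclusive)
    print(to_fetch_interval(100, 200))
```

The end position stays the same because a half-open interval excludes its end, which cancels out the shift. Coordinates taken from a BED file can be passed to fetch unchanged.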