Throw a stone at me: structure software

Tuesday, August 13, 2013

STRUCTURE 프로그램은 unlinked marker (recombinant allele의 frequency가 50% 이면 unlinked, 곧 marker 간의 거리가 먼, 2.0 version 이후로는 weakly linked markder도 다룬다고 함) 의 genotype data를 가지고 model-based clustering method를 이용하여 population structure를 추정하는 프로그램. (http://pritch.bsd.uchicago.edu/structure.html). 이 posting 에서는 우선적으로 manual(http://pritch.bsd.uchicago.edu/structure_software/release_versions/v2.3.4/structure_doc.pdf) 내용을 기반으로 하고 가능하다면 논문(http://pritch.bsd.uchicago.edu/publications/structure.pdf) 도 cover 해보고자 한다

introduction으로 홍창범씨의 블로그의 예제 (http://hongiiv.tistory.com/610) 를 실행해보는 것이 좋다.

how to format the data files

맨 첫줄(underbar 위에 있는 것)은 이해를 돕고자 넣은 것이고 그 다음 line 부터가 structure의 input format이다.
아래는 row 1부터 row 별 설명이다.

how to choose appropriate models
Ancestry models

No admixture model : individuals are discretely from one population or another : 각 개인이 온전히 하나의 population 에서부터만 유래한 것 (population이 하나라는 의미가 아니라 각 개인의 genome이 여러 population 이 섞인 것이 아니라 딱 하나의 population 에서부터 왔다는 의미). 이 경우 individual i가 population k에 속할 posterior probability (P(model|data))를 report하게 된다.
Admixture model : each individual draws some fraction of his/her genome from each of the K populations : 각 개인의 genome이 여러 population이 섞인 경우.
Linkage model : like the admixture model, but linked loci are more likely to come from the same population :
Using prior population information

Allele frequency models

how to interpret the results

how to estimate of K (the number of populations)

Throw a stone at me