Wednesday, June 30, 2010

The Zymomonas mobilis regulator hfq contributes to tolerance against multiple lignocellulosic pretreatment inhibitors

I will summarize this article and bring other sources together, both to understand it and to prepare our own paper.


Shihui Yang, Steven D Brown*

*They work at Oak Ridge National Laboratory (http://www.esd.ornl.gov/) and also study Zymomonas mobilis. Last year they published a brief paper in Nature on a new annotation of Zymomonas mobilis, and they reported the genome sequence of AcR (an acetate-tolerant strain, although in reality it seems to be tolerant of sodium) in PNAS.


In the previous PNAS paper (http://www.pnas.org/content/107/23/10395.full), they compared the genome of AcR with that of ZM4. They found a 1.5 kb deletion in AcR that truncated ZMO0117 and removed DNA upstream of ZMO0119 (nhaA, a sodium/proton antiporter). They proposed that the deletion puts nhaA under the influence of the ZMO0117 promoter, altering nhaA expression, and that this causes sodium acetate tolerance.

-background-

1. Demand for engineered microbes: alternative energy is needed -> agricultural biofuel from lignocellulosic biomass, which is composed largely of cellulose, is one option -> for microbial fermentation, the biomass must first be pretreated so that cellulose can be broken down into smaller molecules such as 5- or 6-carbon sugars (http://biotech.about.com/b/2008/06/11/pretreatment-of-cellulosic-biomass.htm) -> this pretreatment produces compounds that inhibit the microbes -> improved strains that tolerate these inhibitors are being developed by mutation.

2. Z. mobilis (Zymomonas mobilis): ethanol tolerance, a virtually unique property among bacteria; 3 to 5 times higher productivity than S. cerevisiae; and an ethanol yield reaching 97% of the theoretical maximum (http://www.nature.com/nbt/journal/v23/n1/full/nbt0105-40.html).
It uses the Entner-Doudoroff pathway to ferment glucose (a 6-C sugar, although some improved strains also use 5-C sugars). This pathway yields 1 ATP per glucose converted into 2 ethanol, whereas glycolysis yields 2 ATP. A low ATP yield means less cell mass, so more of the carbon ends up as ethanol. This gives Z. mobilis higher potential than S. cerevisiae.
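The yield and ATP figures above can be checked with a quick worked calculation. This is a sketch that assumes the standard stoichiometry of ethanol fermentation and molar masses of about 180.16 g/mol for glucose and 46.07 g/mol for ethanol; these numbers are textbook values, not taken from the paper.

```latex
\begin{aligned}
&\mathrm{C_6H_{12}O_6} \longrightarrow 2\,\mathrm{C_2H_5OH} + 2\,\mathrm{CO_2} \\
&Y_{\max} = \frac{2 \times 46.07}{180.16} \approx 0.511\ \text{g ethanol per g glucose} \\
&0.97 \times Y_{\max} \approx 0.496\ \text{g/g} \\
&\text{net ATP per glucose: } 1\ (\text{Entner-Doudoroff}) \ \text{vs.}\ 2\ (\text{glycolysis})
\end{aligned}
```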







-the aim of this study-

Investigation of the role of the hfq gene (ZMO0347) in tolerance to multiple pretreatment inhibitors. hfq is expressed more strongly in anaerobic stationary phase than under aerobic conditions (the same authors reported this in BMC Genomics, http://www.biomedcentral.com/1471-2164/10/34). This gene is a global regulator that acts as an RNA chaperone and is involved in coordinating regulatory responses to multiple stresses.

The paper also includes some other work, such as the use of a specific plasmid and the role of the LSM proteins in S. cerevisiae, but I will omit these.

-results-

Using BLASTP, they found that hfq in ZM4 is similar to the E. coli global regulator Hfq and to the Sm protein of S. cerevisiae. ---> Interestingly, ZM4's hfq has two Sm-like domains.

They made AcRIM0347 by introducing an hfq insertion mutation into AcR (the Z. mobilis acetate-tolerant strain). They also introduced plasmid p42-0347 (expressing hfq) into ZM4, AcR and AcRIM0347. ---> These strains are denoted ZM4(p42-0347), AcR(p42-0347) and AcRIM0347(p42-0347).







They tested the growth of these strains with various acetate counter-ions (NaCl, NaAc, NH4OAc, KAc) and with pretreatment inhibitors (vanillin, furfural, HMF). ---> AcRIM0347 grew more slowly than AcR, and ZM4(p42-0347) was able to grow in NaAc like AcR. AcRIM0347(p42-0347) recovered growth to some degree with the acetate counter-ions and the inhibitors.



-conclusion-

hfq plays an important role in tolerance to multiple biomass pretreatment inhibitors.






Monday, June 28, 2010

What is a makefile?

On Linux, Make is widely used when compiling source code by hand. So what exactly is the Makefile that the make command uses? The link below explains it clearly.

TCP/IP socket programming

After rushing through C, I became a fan of the author Yoon Sung-woo and decided to study his TCP/IP programming book. Now, on to TCP/IP programming!


Whole-Genome Sequencing Breaks the Cost Barrier

One day over lunch, I talked with my supervisor about the direction of NGS and its effect on daily life. I just passed on what I had heard earlier in a conversation with Keum (http://goldbio.blogspot.com). Maybe because of this, my supervisor suddenly gave me a paper about trends in WGS this morning. Anyhow, I will summarize it in this post. The paper was published on June 11, 2010 in Cell.


Whole-Genome Sequencing Breaks the Cost Barrier

Laura Bonetta*

*Laura Bonetta is a freelance writer. She has written news and feature articles for many top journals (http://www.linkedin.com/in/laurabonetta).

Hmm... I originally planned to summarize this article, but I don't think there is anything new in it.
Anyway...
The author introduced several studies related to elucidating the causes of Mendelian diseases. Richard Gibbs of the Baylor College of Medicine in Texas sequenced the whole genome of a colleague who suffers from Charcot-Marie-Tooth disease (http://en.wikipedia.org/wiki/Charcot-Marie-Tooth_disease). He narrowed down the candidate genetic variants and confirmed them by examining his colleague's siblings.
Another study is the sequencing of the exons of 500 genes that predispose to several childhood diseases, by Leroy Hood and David Galas at the Institute for Systems Biology in Seattle. It will be one of the first gene sequencing-based tests to come to market. Galas is preparing to sequence 30 individuals across 4 generations to answer the question "What does the mutation rate depend on?".
The last one studies various aspects (transcriptome, epigenome and genome data) of identical twins discordant for multiple sclerosis. Although the researchers couldn't find any significant result, I think the article is a helpful reference for how they approached combining multi-dimensional data.
The author also pointed out the obstacles facing current sequencing technology. I will omit this part; it is a pretty familiar story.
On complex diseases, the author presents two points of view. Given the shortcomings of GWAS, some researchers (such as Richard Lifton at Yale University) believe that individually rare variants with relatively large effects play a substantial role in complex diseases or traits. Kari Stefansson, president of the Icelandic company deCODE genetics, instead thinks that large effects come from rare confluences of variants rather than from individual rare variants. Well... I don't think either opinion is always right. That's biology, and that's why, as you know, it takes so long to understand biological systems.

It is certain that most research already uses whole-genome sequencing data or something similar. The price of whole-genome sequencing is going to plummet, and we will be flooded with sequencing data. We need to prepare in order to survive (actually, I don't like the expression 'survive', because it sounds like something done against one's will).


Monday, June 21, 2010

C language

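The following notes are only the things I really need to remember after a very short course of studying C. This actually comes from simple curiosity about taking Velvet apart (Velvet is written in C). The notes are based on "C Programming" by Yoon Sung-woo (Freelec).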

Within a function, a variable must be declared before it is used.

Remember that negative integers are represented in two's complement.


Integer types and floating-point types turn the bits in memory into numbers in different ways.

For integers, int is recommended; for real numbers, double.

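When declaring a symbolic constant with const, you must initialize it in the same statement. Because a constant cannot change, its value has to be assigned at the point of declaration.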

A function must be defined or declared before it is called.

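If a string variable is printed with the %d format specifier, you get an address; with %s, you get the string (strictly, an address should be printed with %p, since %d is not valid for pointers). Normally * is used to refer to the value a pointer points to. So with char * str = "good", it might seem that you should pass *str to see "good". But as long as the format specifier is %s, passing str is correct: %s expects the address of the first character, whereas *str is only the single character 'g'.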

An array name acts as a pointer, but it is a constant pointer that cannot be changed.

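A pointer has the same size regardless of the data type it points to: 4 bytes on a 32-bit system (and 8 bytes on typical 64-bit systems).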

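Let int a = 3, int *pa = &a, and a function func(). func(&a) is described as call-by-reference, whereas func(pa) is call-by-value. In both cases func must receive the argument as a pointer such as int * temp. Strictly speaking, C always passes arguments by value: both calls hand func a copy of a's address. So both let func modify a, and neither lets func change where pa points.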

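For a multidimensional array, e.g. the two-dimensional array int arr[2][4], a pointer to it is declared as int (*pArr)[4]. You must specify not only the data type the pointer refers to but also how many bytes pointer arithmetic skips (here, one row of 4 ints).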

For an array arr, arr[i] == *(arr+i), but whether this is an address or a value depends on the situation. If arr is one-dimensional, it is a value. If arr is multidimensional, arr[i] itself holds an address, so it denotes an address.

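To run a function, it is loaded into RAM, and the function name becomes a pointer to the function's location. Because a pointer can be stored in a variable, you can create a function pointer that points to a function, and its type is declared as follows. For example, given the function void func(char * str), a function pointer to it is defined as void (*fPtr) (char*) = func.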

Both gets and fgets read strings, but gets overflows the buffer when the input is longer than the allocated array, so fgets is the better choice (gets was removed from the language in C11).

To look up a function you need, consult a C reference.

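A structure groups one or more variables together to define a 'user-defined data type'. If a structure member is a character array, a string must be copied into it with the strcpy function. For example, suppose the structure variable person has char name[10] as a member. Setting it with person.name = "김세환" after the declaration causes an error, because it violates the rule mentioned earlier (an array name is a constant pointer).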

Because the * operator has lower precedence than the . operator, accessing a structure through a pointer requires parentheses, as in (*pMan).name. Alternatively, pMan->name works.

The typedef keyword lets you give a basic data type a name of your choosing. You can also use it to give a structure a new name, so that you do not have to write struct when declaring structure variables.

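Memory (RAM) is broadly divided into the data area, the heap and the stack. Global and static variables live in the data area, local variables and parameters live on the stack, and variables allocated by the programmer live on the heap. As these characteristics suggest, variables in the data area are allocated when the program starts and removed only when it ends. Stack variables, in contrast, are allocated when a function is called and removed when that function returns. Heap variables are allocated by the programmer with malloc (short for memory allocation) and released with the free function.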

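The amount of memory allocated in the stack and data areas must be determined at compile time. This is why array sizes must be given as constants (C99 later added variable-length arrays, which relax this rule for local arrays).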

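When malloc cannot reserve heap memory because there is not enough available, it returns NULL. So whenever you allocate heap memory with malloc, always check whether the allocation actually succeeded. Also, variables stored on the heap can be accessed only through pointers.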

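Compilation can be subdivided into preprocessing and compilation proper. Statements beginning with # are called preprocessor directives, and #define is the directive used to request a simple substitution. For example, #define PI 3.1415 tells the preprocessor to replace every PI in the source with 3.1415. PI is called the macro and 3.1415 the replacement list. Macro functions can also be defined with the #define directive.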

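#if, #elif, #else and #endif give conditions to the preprocessor. Before the compiler compiles the code, the preprocessor evaluates these conditions and decides which source lines to include.
#ifndef and #endif are used to prevent a header file from being included more than once. #ifndef (if not defined) should be followed by a #define directive. In other words, #ifndef checks whether a particular macro has been defined with #define, and if it has not, the code between #ifndef and #endif is included. To check several macros at once, as in #if defined(A) || defined(B), use #if defined.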

Sunday, June 20, 2010

Genomics Research using NGS technology

date & time : June 17 2010
location : Yonsei Severance

- Identification of disease-causing mutations by high throughput Targeted Resequencing [Luke Danneberg(Roche NimbleGen)]

Although sequencing technology has recently advanced incredibly, sequencing a higher organism such as a human still costs a considerable sum of money. On top of this, our limited knowledge has so far kept clues in non-coding DNA sequence from helping to discover disease-causing mutations. The speaker introduced two sequence capture technologies for solving these problems: exon capture and supervised targeted capture. Exon capture literally focuses on the exon sequences of the genome. Targeted capture instead concentrates on whatever regions the investigators are interested in, regardless of whether those regions are exonic. The distinction may not be clear in my description, but I understood it as the targeted capture regions being longer than exons. Either way, the importance of this technology is cost efficiency: for the same money as one whole-genome sequencing, dozens of patients can be examined. The speaker recommended Solexa for sequencing captured exons and FLX for targeted regions, because targeted regions sometimes contain repeat sequences. I think this technology is just a temporary solution that may become useless in the near future. To find disease-causing mutations, I think an unsupervised approach is needed, and as many news stories say (http://www.technologyreview.com/biomedicine/25481/?a=f), sequencing is going to become very cheap.

- Studying Gene Structure, Expression and Regulation Using the Illumina HiSeq 2000 System [Gary P. Schroth (Illumina, Inc.)]
As we can surmise from the speaker's affiliation, this talk focused on the performance of the HiSeq 2000 developed by Illumina. The big difference between the HiSeq and the GA is that the HiSeq holds two flow cells, which results in a larger number of reads. Besides this quantitative advance, changes in how the spots are imaged improve efficiency; the HiSeq produces 200 Gb per run and 25 Gb per day. The speaker also introduced a method they developed for selecting mRNA from the pool of RNA. In general, mRNA makes up too small a fraction of total RNA to be examined without selection. Poly-A selection is usually used to select mRNA, but it leads to mRNA degradation. Their alternative is normalization using DSN (duplex-specific nuclease). It relies on the fact that abundant RNAs form duplexes more easily when they re-anneal after denaturation, so the abundant species can be removed preferentially. This method has little effect on mRNA quantity and has the additional benefit of capturing non-coding RNAs such as lincRNA and smRNA.

- Transcriptome Analysis for Identification of Fusion Genes [Sang-hyuk Lee (KOBIC)]
In this talk, the speaker introduced a fusion gene database (http://ercsb.ewha.ac.kr:8080/FusionGene/index.jsp) that uses EST (from UniGene) and NGS (from SRA) data. Gene fusion has been found in some cancer patients, and this led researchers to try to discover other cases. The speaker is again building a database along the same lines (according to him, he had already built this kind of website 5 years ago).

- Metagenome analysis [Jong-sik Chun (Seoul National University)]
This was the most interesting talk. Through it I realized that the microbial world is huge and has enormous potential. In brief, metagenome analysis is genome sequencing of the mixed microbes found in a specific location. According to the speaker, only about 6000 species of microbes have been identified, while more than tens of thousands of insect species are known, because many microbes cannot be cultured under artificial conditions. The speaker expects that the number of microbial species should exceed that of insects, and he believes that investigating the roles of these extraordinary microbes will have a great impact. To illustrate, one article published in Nature compared the gut microbes of obese individuals with those of normal individuals and found that the proportions of microbes differ significantly between the two groups. Craig Venter, a celebrity in the biological field, has also been focusing on microbes. The speaker introduced him as a freak: back when sequencing was not yet cheap, Venter was the first to sequence the microbes living in sea water. Recently he went so far as to create a microbe, as if he were God. Anyhow, many countries invest a lot in the metagenome field. Most novel microbes cannot be cultured, and a group of microbes living together can arguably be considered a single organism in which each microbe plays its own role. For these reasons, investigators usually don't try to assemble the genomes; they just align each read to known organisms. Since aligning reads with BLAST means endless time spent searching every sequence, the speaker emphasized the classification of microbes. He has indeed built a website for microbial classification and boasted of his achievement (http://eztaxon-e.org/).