Monday, September 6, 2010

High-throughput sequencing data submission to NCBI (GEO, SRA)

Most papers upload their data to GEO or SRA, so it helps to understand the formats these databases support. Here are links describing the formats.

SOFT file format:
http://www.ncbi.nlm.nih.gov/geo/info/soft-seq.html

Submitting sequencing data:
http://www.ncbi.nlm.nih.gov/geo/info/seq.html


Why some NGS data are in the SRA database while others are in GEO: whole-genome sequencing, metagenome, and survey sequencing data, as well as sequence files in the original short-read format, belong in the SRA database.


A SOFT (Simple Omnibus Format in Text) file is just a set of instructions describing the submitted data. The actual raw data (fastq) may or may not be included.

Tips for checking whether a SOFT file includes:
1. raw data: if the SOFT file contains raw data, there should be a line starting with "!Sample_raw_file...".
2. processed data: there should be a line starting with "!Sample_supplementary_file...".
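
The two checks above can be done with a few lines of Python. This is only a rough sketch: the function name scan_soft and the sample lines are made up for illustration, and it assumes each attribute line has the usual "!label = value" shape.

```python
def scan_soft(lines):
    """Collect the values of raw-file and supplementary-file lines."""
    prefixes = {
        "raw": "!Sample_raw_file",
        "processed": "!Sample_supplementary_file",
    }
    found = {"raw": [], "processed": []}
    for line in lines:
        line = line.strip()
        for kind, prefix in prefixes.items():
            if line.startswith(prefix):
                # Assumed line shape: "!label = value"
                _, _, value = line.partition("=")
                found[kind].append(value.strip())
    return found


# Hypothetical SOFT lines, for illustration only (not from a real submission)
example = [
    "^SAMPLE = hypothetical_sample_1",
    "!Sample_title = made-up title",
    "!Sample_supplementary_file_1 = counts.txt",
    "!Sample_raw_file_1 = reads.fastq",
]
result = scan_soft(example)
print(result)
```

For a real file, an open file handle can be passed in place of the list, e.g. scan_soft(open(path)). An empty "raw" list would suggest the SOFT file does not reference raw data.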

