International Sheep Genomics Consortium

The International Sheep Genomics Consortium (ISGC) is a partnership of scientists and funding agencies from Australia, Austria, Brazil, China, Finland, France, Germany, Greece, India, Iran, Israel, Italy, Kenya, New Zealand, Norway, Saudi Arabia, Spain, Switzerland, Turkey, the United Kingdom and the United States. Its aim is to develop public genomic resources that help researchers find genes associated with production, quality and disease traits in sheep.

The project commenced informally in 2002 with the creation of a high-quality ovine BAC library, building on an existing collaboration for the International Mapping Flock that had been created nearly a decade earlier.

This work has continued and is best known for the initial sequencing of the sheep genome and the creation of several SNP chip arrays, specifically the publicly available Illumina 50K and Illumina 15K SNP chips. The ISGC was also involved in the creation of the Illumina HD 600K chip, which is available upon request (see contacts). However, its major ongoing function has been the sequencing and annotation of the sheep genome. This includes projects such as FAANG and the SheepGenomesDB, commonly called the 1000 genomes sheep project.

Sheep Genome Assemblies

Please be aware that the various sheep genome assemblies are labelled differently in the different repositories. This has significant implications when identifying SNPs and other features in published papers. The initial assembly Oar_v1.0 was used to build the 50K chip and is still available at UCSC labelled as ISGC Ovis_aries_1.0. However, the three assemblies listed below are those that most published work has utilised.

Oar_v3.1: This version was published in Science in 2014 from the sequence of a Texel animal. The fully annotated genome is available via Ensembl, NCBI and UCSC.

Oar_v4.0: In 2015 the ISGC released Oar_v4.0, in which long-read technology (PacBio RSII) was used to improve the Oar_v3.1 assembly.

  • Assembly method: SOAPdenovo v. 1.03; PBJelly2 v. 14.9.9
  • Genome coverage: 166.0x
  • Sequencing technology: Illumina GAII; 454; PacBio RSII

Oar_rambouillet_v1.0: In 2017 the Baylor College of Medicine Human Genome Sequencing Center released a genome assembly from the Rambouillet breed. The assembly used a combination of Illumina short reads and PacBio RSII long reads.

  • Assembly method: celera v. 8.2; Phase PGA v. 1.0; PBJelly2 v. 14.9.9; Pilon v. 1.8; Chromosomer v. 0.1.4
  • Genome coverage: 126.0x
  • Sequencing technology: HiSeq X Ten; PacBio RS II

Please note: this is not expected to be the final version. An update using Oxford Nanopore long reads to complement the assembly is expected in early 2020.

In addition, annotation of the Rambouillet (OAR_USU_Benz2616) genome is underway via the Ovine FAANG project, which is led by Brenda Murdoch (University of Idaho) and supported by the National Institute of Food and Agriculture, U.S. Department of Agriculture, award number USDA-NIFA-2017-67016-26301.

Figure: Global statistics (NCBI) of the three Ovis aries (sheep) genome assemblies.

SheepGenomesDB

The Sheep Genomes Database is funded by the USDA AFRI to provide the sheep genomics research community with a genomes hub. It is an initiative of the International Sheep Genomics Consortium and extends the consortium's achievement in building and releasing the sheep reference genome assembly v3.1.

The Sheep Genomes Database has the following three key objectives:

  • collect and make available sheep genome data on behalf of the community
  • provide variant detection for user supplied genome sequences
  • enable download of SNP and CNV data from the growing collection of sheep genomes

Results from Run 2:

  • SNPs were called against Oar_v3.1 for 935 animals. Only SNPs detected by both of two pipelines (GATK and samtools) were retained.
  • An incomplete set of VCF files is available in an EVA study browser. Run 2 results are not currently available via the EVA variant browser. We are working with EVA to address these problems.
  • All samples have BioSample IDs. However, the VCF files currently contain only SGD IDs. A translation table is available at Figshare and lists the SGD ID, BioSample ID and SRA accession for each sample.

Figure: Summary of animals in Run 2.
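Because the Run 2 VCF files carry only SGD IDs, the translation table can be used to relabel samples with BioSample IDs. The following is a minimal sketch of that idea, not the consortium's own tooling. The column names `sgd_id` and `biosample_id`, the helper names and the placeholder IDs are all assumptions made for illustration; check the actual Figshare table for its real column headers.

```python
import csv
import io


def load_translation(handle, key="sgd_id", value="biosample_id"):
    """Read a CSV translation table into a dict (column names are assumed)."""
    reader = csv.DictReader(handle)
    return {row[key]: row[value] for row in reader}


def rename_vcf_header(header_line, mapping):
    """Replace sample names in a VCF '#CHROM' header line.

    The first nine columns (CHROM to FORMAT) are fixed by the VCF format;
    everything after them is a sample name. Unknown names are left unchanged.
    """
    fields = header_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    return "\t".join(fixed + [mapping.get(name, name) for name in samples])


# Placeholder table: these IDs are invented for illustration only.
EXAMPLE_TABLE = (
    "sgd_id,biosample_id,sra_accession\n"
    "SGD_0001,BIOSAMPLE_A,SRA_A\n"
    "SGD_0002,BIOSAMPLE_B,SRA_B\n"
)

mapping = load_translation(io.StringIO(EXAMPLE_TABLE))
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "SGD_0001", "SGD_0002", "SGD_9999"]
)
print(rename_vcf_header(header, mapping))
```

Leaving unknown sample names untouched, rather than raising an error, makes it easy to spot IDs that are missing from the table.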

ISGC SNP chip array genome positions

The SNPs on the consortium arrays (Illumina 15K, 50K and HD chips) have been mapped to Oar_rambouillet_v1.0. Probe sequences were taken from the Illumina manifests and mapped onto the Rambouillet genome (GCA_002742125.1) using bwa mem v0.7.17-r1188 with default settings (indels were ignored). For each SNP a probe pair was constructed by taking AlleleA_ProbeSeq and appending either the reference or the alternative allele. Only probe pairs that passed all of the following filters were accepted:

  • both probes are mapped
  • one probe is mapped with 0 mismatches
  • neither probe is multi-mapped
  • no indels are allowed
  • both probes map in the same orientation
  • both probes map to the same position
  • the mapped probes have exactly 1 mismatch between them, the SNP

In addition, the arrays were mapped to Oar_v3.1 and Oar_v4.0 to enable comparison of this mapping approach with NCBI and Ensembl. SNP names, positions and alleles from the consortium arrays are available on Figshare.
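The probe-pair filters listed above can be expressed as a single predicate. The sketch below is illustrative only and is not the consortium's pipeline. The `ProbeHit` record and its field names are hypothetical, and it assumes that each probe's alignment has already been summarised from the bwa output into mapped status, position, strand, mismatch count, multi-mapping status and indel presence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeHit:
    """Hypothetical summary of one probe's alignment (field names assumed)."""
    mapped: bool
    chrom: Optional[str] = None
    pos: Optional[int] = None
    strand: Optional[str] = None   # "+" or "-"
    mismatches: int = 0
    multi_mapped: bool = False
    has_indel: bool = False


def probe_pair_passes(a: ProbeHit, b: ProbeHit) -> bool:
    """Return True only if the pair satisfies every filter in the list."""
    if not (a.mapped and b.mapped):            # both probes are mapped
        return False
    if a.multi_mapped or b.multi_mapped:       # neither probe is multi-mapped
        return False
    if a.has_indel or b.has_indel:             # no indels are allowed
        return False
    if a.strand != b.strand:                   # same orientation
        return False
    if (a.chrom, a.pos) != (b.chrom, b.pos):   # same position
        return False
    if min(a.mismatches, b.mismatches) != 0:   # one probe matches perfectly
        return False
    # Exactly one mismatch between the two probes: the SNP itself.
    return a.mismatches + b.mismatches == 1


ref_probe = ProbeHit(mapped=True, chrom="1", pos=1000, strand="+", mismatches=0)
alt_probe = ProbeHit(mapped=True, chrom="1", pos=1000, strand="+", mismatches=1)
print(probe_pair_passes(ref_probe, alt_probe))
```

Under this reading, the perfectly matching probe is the one carrying the allele present in the reference genome, and the single mismatch in the other probe falls at the SNP base.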