International Sheep Genomics Consortium

What's New

January 2025: 3522 genomes have been included in Run3 of the 1000 genomes project.

The metadata for all 3522 animals used in Run3 are available on Figshare and include the SGD ID, BioSample ID and SRA accession for each animal.

Sequence reads from these animals were aligned against the Rambouillet_v3 reference genome, and variants were detected following the GATK4 best practice guidelines as implemented in the nf-core sarek pipeline.

Completed ARS-UI_Ramb_v2.0 genome assembly and annotation

The ARS-UI_Ramb_v2.0 assembly and annotation were formally published in February 2022: Davenport et al. An improved ovine reference genome assembly to facilitate in-depth functional annotation of the sheep genome. GigaScience, Volume 11, 2022, giab096.

Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can also be found in Annotation Release 104 of ARS-UI_Ramb_v2.0.

Background to ISGC

The International Sheep Genomics Consortium (ISGC) is a partnership of scientists and funding agencies from Australia, Austria, Brazil, China, Finland, France, Germany, Greece, India, Iran, Israel, Italy, Kenya, New Zealand, Norway, Saudi Arabia, Spain, Switzerland, Turkey, United Kingdom and United States to develop public genomic resources that will help researchers find genes associated with production, quality and disease traits in sheep.

The project commenced informally in 2002 with the creation of a high-quality ovine BAC library, building on an existing collaboration around the International Mapping Flock, which had been created nearly a decade earlier.

This work has continued and is best known for the initial sequencing of the sheep genome and the creation of several SNP chip arrays, specifically the publicly available Illumina 50K and Illumina 15K SNP chips. The ISGC was also involved in the creation of the Illumina HD 600K chip, which is available upon request (see contacts). However, its major ongoing function has been the sequencing and annotation of the sheep genome. This includes projects such as FAANG and the SheepGenomesDB, commonly called the 1000 genomes sheep project.

Sheep Genome Assemblies

Please be aware that the various sheep genome assemblies are labelled differently in the different repositories. This has significant implications when identifying SNPs and other features in published papers. The initial assembly Oar_v1.0 was used to build the 50K chip and is still available at UCSC labelled as ISGC Ovis_aries_1.0. However, the four assemblies listed below are those that most published work has utilised.

Oar_v3.1 This version was published in Science in 2014 from the sequence of a Texel animal. The fully annotated genome is available via Ensembl, NCBI and UCSC.

Oar_v4.0 In 2015 the ISGC released Oar_v4.0, in which long-read technology (PacBio RSII) was used to improve the Oar_v3.1 assembly.

Assembly method: SOAPdenovo v. 1.03; PBJelly2 v. 14.9.9

Genome coverage: 166.0x

Sequencing technology: Illumina GAII; 454; PacBio RSII

Oar_rambouillet_v1.0 In 2017 the Baylor College of Medicine Human Genome Sequencing Center released a genome assembly from the Rambouillet breed. The assembly used a combination of Illumina short reads and PacBio RSII long reads.

Assembly method: celera v. 8.2; Phase PGA v. 1.0; PBJelly2 v. 14.9.9; Pilon v. 1.8; Chromosomer v. 0.1.4

Genome coverage: 126.0x

Sequencing technology: HiSeq X Ten; PacBio RS II

Annotation: Salavati M, Caulton A, Clark R, Gazova I, Smith TP, Worley KC, Cockett NE, Archibald AL, Clarke SM, Murdoch BM, Clark EL. Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0). Frontiers in Genetics. 2020 Oct 23;11:580580.

ARS-UI_Ramb_v2.0 This is an improved genome assembly for OAR_USU_Benz2616, submitted by the University of Idaho. Davenport KM, Bickhart DM, Worley K, Murali SC, Salavati M, Clark EL, Cockett NE, Heaton MP, Smith TP, Murdoch BM, Rosen BD. An improved ovine reference genome assembly to facilitate in-depth functional annotation of the sheep genome. GigaScience. 2022 Feb 4;11:giab096.

In addition, annotation of the Rambouillet (OAR_USU_Benz2616) genome is underway via the Ovine FAANG project, which is led by Brenda Murdoch (University of Idaho) and supported by the National Institute of Food and Agriculture, U.S. Department of Agriculture, award number USDA-NIFA-2017-67016-26301.

Variant detection for user-supplied genome sequences

Results from Run2

SNPs were detected against Oar_v3.1 for 935 animals. The reported SNPs are the overlap of two pipelines (GATK and samtools), that is, only SNPs called by both pipelines were kept.
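As an illustration of what taking the overlap of two call sets can look like, here is a minimal Python sketch. It is not the consortium's pipeline: it assumes that two calls count as the same SNP when chromosome, position, reference allele and alternative allele all match, and the VCF lines are invented toy data.

```python
import io

def read_variant_keys(handle):
    """Collect (chrom, pos, ref, alt) keys from a VCF text stream."""
    keys = set()
    for line in handle:
        # Skip meta lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele gets its own key.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys

# Toy call sets standing in for the output of the two pipelines (invented data).
gatk_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "1\t250\t.\tC\tT\n"
    "2\t500\t.\tG\tA\n"
)
samtools_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "2\t500\t.\tG\tC\n"
    "2\t900\t.\tT\tC\n"
)

gatk_calls = read_variant_keys(gatk_vcf)
samtools_calls = read_variant_keys(samtools_vcf)

# Assumed matching rule: a SNP is kept only if both pipelines report the same key.
consensus = sorted(gatk_calls & samtools_calls)
print(consensus)
```

In this toy input only the SNP at 1:100 is kept; the site at 2:500 is dropped because the two call sets disagree on the alternative allele.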

An incomplete set of VCF files is available in an EVA study browser. Run2 results are not currently available via the EVA variant browser.

A complete set of VCF files is available from CSIRO:

Filtered

Unfiltered

All samples have BioSample IDs. However, the VCF files currently contain only SGD IDs. A translation table is available on Figshare and lists the SGD ID, BioSample ID and SRA accession for each sample.
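The translation table can be used to relabel the sample columns of a VCF header. The Python sketch below shows one way to do this under stated assumptions: the table is assumed to be tab-separated with columns named SGD_ID and BioSample_ID (hypothetical names; check the actual Figshare file), and every identifier in the example is an invented placeholder.

```python
import csv
import io

def load_translation(handle, key="SGD_ID", value="BioSample_ID"):
    """Read a tab-separated translation table into a dict (column names are assumed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row[value] for row in reader}

def rename_samples(header_line, mapping):
    """Replace sample names in a VCF #CHROM header line, leaving unknown IDs unchanged."""
    fields = header_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]  # the first 9 columns are fixed by the VCF format
    return "\t".join(fixed + [mapping.get(sample, sample) for sample in samples])

# Placeholder table and header; none of these identifiers are real.
table = io.StringIO(
    "SGD_ID\tBioSample_ID\tSRA_accession\n"
    "SGD_0001\tBIOSAMPLE_A\tSRA_A\n"
    "SGD_0002\tBIOSAMPLE_B\tSRA_B\n"
)
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSGD_0001\tSGD_0002\tSGD_0003"

mapping = load_translation(table)
renamed = rename_samples(header, mapping)
print(renamed)
```

Samples missing from the table (SGD_0003 here) keep their original label, so a partial table does not silently drop or mislabel columns.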

Results from Run3

The VCF files, along with the metadata, are available at the Genomics Aotearoa Genomic Data Repository.

ISGC SNP chip array genome positions

The SNPs on the consortium arrays (Illumina 15K, 50K and HD chips) have been mapped to ARS-UI_Ramb_v2.0. Probe sequences were taken from the Illumina manifests and mapped onto the Rambouillet genome (GCA_016772045.1) using bwa mem v0.7.17-r1188 with default settings (indels were ignored). For each SNP, a probe pair was constructed by taking AlleleA_ProbeSeq and appending either the reference or the alternative allele. Only probe pairs that passed all of the following filters were accepted:

both probes are mapped

one probe is mapped with 0 mismatches

both probes are not multi-mapped

no indels were allowed

both probes had to map in the same orientation

both probes had to map to the same position

mapped probes had between them exactly 1 mismatch, the SNP.
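The probe-pair construction and the filters listed above can be sketched in Python as follows. This is an illustration, not the consortium's script: the ProbeHit record and its fields are hypothetical stand-ins for values that would be parsed from the bwa mem output, "same position" is taken to include the same chromosome, and "exactly 1 mismatch between them" is read as the two mismatch counts summing to 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProbeHit:
    """Hypothetical summary of one probe alignment (fields are assumptions)."""
    mapped: bool
    chrom: Optional[str] = None
    pos: Optional[int] = None
    strand: Optional[str] = None
    mismatches: int = 0
    multi_mapped: bool = False
    has_indel: bool = False

def build_probe_pair(allele_a_probe_seq, ref_allele, alt_allele):
    """Append the reference or the alternative allele to AlleleA_ProbeSeq."""
    return allele_a_probe_seq + ref_allele, allele_a_probe_seq + alt_allele

def passes_filters(a, b):
    """Apply the listed filters to the two alignments of one probe pair."""
    if not (a.mapped and b.mapped):            # both probes are mapped
        return False
    if a.multi_mapped or b.multi_mapped:       # neither probe is multi-mapped
        return False
    if a.has_indel or b.has_indel:             # no indels were allowed
        return False
    if a.strand != b.strand:                   # same orientation
        return False
    if (a.chrom, a.pos) != (b.chrom, b.pos):   # same position (and chromosome, by assumption)
        return False
    if min(a.mismatches, b.mismatches) != 0:   # one probe maps with 0 mismatches
        return False
    return a.mismatches + b.mismatches == 1    # the only mismatch is the SNP itself

# Toy example with an invented probe sequence and invented alignments.
pair = build_probe_pair("ACGTACGT", "A", "G")
accepted = passes_filters(
    ProbeHit(True, "1", 1000, "+", mismatches=0),
    ProbeHit(True, "1", 1000, "+", mismatches=1),
)
rejected = passes_filters(
    ProbeHit(True, "1", 1000, "+", mismatches=0),
    ProbeHit(True, "1", 1000, "-", mismatches=1),
)
print(pair, accepted, rejected)
```

Under this reading, the probe carrying the allele that matches the assembly aligns perfectly and the other probe differs only at the appended base, which is why the combined mismatch count must be exactly 1.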

The arrays were also mapped to Oar_v3.1, Oar_v4.0 and Oar_rambouillet_v1.0 so that this mapping approach can be compared with NCBI and Ensembl. The SNP name, position and allele for each SNP on the consortium arrays are available on Figshare.