Documentation
Summary
- EMDB data model
- EMDB header data model
- EMDB segmentation data model
- Policies
- Search engine
- Chart builder
- FAQ
EMDB map data model
The EM Data Bank (EMDB) accepts and distributes 3D map volumes derived from several types of EM reconstruction methods, including single particle averaging, helical averaging, 2D crystallography, and tomography. Since its inception in 2002, the EMDB map distribution format has followed the CCP4 definition (CCP4 map format), which is widely recognized by software packages used by the structural biology community. The CCP4 map format is closely related to the MRC map format used in the 3DEM community (MRC map format); CCP4 is slightly more restrictive, in that voxel positions are limited to a grid that includes the Cartesian coordinate origin (0,0,0). Further details can be found here.
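As an illustration of the map layout, the following minimal sketch reads a few fields from the 1024-byte header that precedes the voxel data in a CCP4/MRC map. It assumes a little-endian file (byte order is not checked here), and the function name and the keys of the returned dictionary are hypothetical choices made for this example. Note that the grid start values are integers counted in grid units, which is consistent with the restriction on voxel positions described above.

```python
import struct


def read_ccp4_header(header: bytes) -> dict:
    """Parse selected fields from the first 1024 bytes of a CCP4/MRC map.

    Assumption: the file is little-endian.
    """
    if len(header) < 1024:
        raise ValueError("a CCP4/MRC header is 1024 bytes long")
    # Words 1-10 (int32): NX, NY, NZ, MODE, NXSTART, NYSTART, NZSTART, MX, MY, MZ.
    nx, ny, nz, mode, nxs, nys, nzs, mx, my, mz = struct.unpack_from("<10i", header, 0)
    # Words 11-16 (float32): cell lengths in angstroms, then cell angles in degrees.
    cell = struct.unpack_from("<6f", header, 40)
    # Words 17-19 (int32): axis (1=X, 2=Y, 3=Z) stored as columns, rows, sections.
    mapc, mapr, maps = struct.unpack_from("<3i", header, 64)
    # Voxel size along each cell axis is the cell length divided by the sampling.
    voxel = tuple(
        length / n if n > 0 else float("nan")
        for length, n in zip(cell[:3], (mx, my, mz))
    )
    return {
        "dimensions": (nx, ny, nz),
        "mode": mode,  # 2 means 32-bit floating point voxels
        "start": (nxs, nys, nzs),  # integer grid offsets of the first voxel
        "sampling": (mx, my, mz),
        "cell": cell,
        "axis_order": (mapc, mapr, maps),
        "voxel_size": voxel,
        "has_map_stamp": header[208:212] == b"MAP ",  # word 53
    }


# Synthetic header for demonstration only; it does not describe a real entry.
demo = bytearray(1024)
struct.pack_into("<10i", demo, 0, 4, 5, 6, 2, 0, 0, 0, 4, 5, 6)
struct.pack_into("<6f", demo, 40, 8.0, 10.0, 12.0, 90.0, 90.0, 90.0)
struct.pack_into("<3i", demo, 64, 1, 2, 3)
demo[208:212] = b"MAP "
info = read_ccp4_header(bytes(demo))
print(info["dimensions"], info["voxel_size"])
```

For a map on disk, the same function could be applied to the first 1024 bytes read from the file opened in binary mode.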
EMDB header data model
Every EMDB entry has a header file containing metadata (e.g., sample, detector, microscope, image processing) describing the experiment. The header file is an XML file, and its structure and content are described by an XSD data model. In a highly dynamic field such as cryo-EM, there is a constant need to adapt and modify the schema to keep it up to date with the most recent developments. We consult extensively with the EM community regarding such issues and version the schema according to the policy described here.
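Because the header is plain XML, it can be inspected with any standard XML parser. The sketch below reads the schema version from the root element so that a script can decide which data model a header follows. The two sample headers are hypothetical and heavily abbreviated, and the root element names and the `version` attribute are assumptions made for illustration; the schema documentation is the authoritative source for the actual element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated headers; real header files contain far more content.
HEADER_V19 = '<emdEntry accessCode="0000" version="1.9.6"></emdEntry>'
HEADER_V30 = '<emd emdb_id="EMD-0000" version="3.0.1.0"></emd>'


def schema_major_minor(xml_text: str) -> tuple:
    """Return (major, minor) of the schema version declared on the root element."""
    root = ET.fromstring(xml_text)
    version = root.get("version")  # assumed attribute name
    if version is None:
        raise ValueError("root element has no 'version' attribute")
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


for text in (HEADER_V19, HEADER_V30):
    print(schema_major_minor(text))
```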
Data model version 1.9
This has been a long-term stable version of the data model. It was replaced in 2018 with an updated model, but XML header files in version 1.9 continue to be distributed in parallel for at least one year to give EMDB users ample time to switch. Note that version 1.9 header files are generated on a best-effort basis by back-translation from more recent versions that are richer in content, so they will not contain all the information found in those versions.
Download schema
Browse schema documentation
Download Python code to facilitate reading and writing XML version 1.9 header files
Data model version 3.0 (current model)
This data model replaced version 1.9; however, header files corresponding to both data models will be distributed in parallel, with a view to stopping distribution of the version 1.9 files in 2019 once users have had a chance to adopt version 3.0.
This version adds a number of features including:
- An improved description of direct electron detectors, specimen preparation and tomography experiments.
- A hierarchical description of the overall sample composition in combination with a low-level description of the macromolecular composition to allow the description of both molecular and cellular samples.
- Specific data items describing the half-maps and segmentations included with the entry.
Download schema
Browse schema documentation
Download Python code to facilitate reading and writing XML version 3.0 header files
EMDB segmentation data model
Segmentation is the decomposition of 3D volumes into regions that can be associated with defined objects. Following several consultations with the EM community (Patwardhan et al., 2012; Patwardhan et al., 2014; Patwardhan et al., 2017), the EMDB is in the process of developing tools to support deposition of volume segmentations with structured biological annotation, defined here as the association of data with identifiers (e.g., accession codes from UniProt) and ontologies taken from well-established bioinformatics resources. To our knowledge, none of the segmentation formats widely used in electron microscopy and related fields currently support structured biological annotation. Third-party use of segmentations is further impeded by the prevalence of segmentation file formats and their lack of interoperability. EMDB therefore proposed an open segmentation file format called EMDB-SFF to capture basic segmentation data from application-specific segmentation file formats and provide the means for structured biological annotation. In this way, EMDB-SFF will not only enable depositions of segmentations but also act as a file interchange format between different applications and facilitate analysis of 3D reconstructions. Furthermore, EMDB-SFF supports the description of multiple transforms for a segment, thus allowing a segment to be used to describe the placement of a sub-tomogram average onto a tomographic reconstruction.
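To make the role of a transform concrete, the sketch below applies a 3x4 affine matrix (rotation plus translation) to a point, which is how a sub-tomogram average could be placed into the coordinate frame of a tomogram. The serialisation assumed here (twelve space-separated numbers in row-major order) and the function names are assumptions for illustration only; the schema documentation defines how transforms are actually stored.

```python
def parse_transform(data: str, rows: int = 3, cols: int = 4):
    """Turn a whitespace-separated string of numbers into a rows x cols matrix."""
    values = [float(v) for v in data.split()]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def apply_transform(matrix, point):
    """Apply a 3x4 affine matrix to an (x, y, z) point.

    The first three columns rotate or scale; the fourth column translates.
    """
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


# Pure translation by (10, 20, 30); values chosen for demonstration only.
shift = parse_transform("1 0 0 10  0 1 0 20  0 0 1 30")
print(apply_transform(shift, (1.0, 2.0, 3.0)))
```

Attaching several such matrices to one segment would describe several placements of the same average.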
Model
EMDB-SFF files have the following features:
- Segmentation metadata:
  - name
  - version (of schema)
  - details (free-form text)
  - global external references, e.g. specimen scientific identifier
  - bounding box
  - primary descriptor contained, i.e. one of ‘three_d_volume’, ‘mesh_list’, or ‘shape_primitive_list’ (see schema documentation)
  - list of software used to create the segmentation (name, version, processing details)
  - list of transforms referenced by segments, e.g. transform to place the sub-tomogram average in the tomogram
- Hierarchical ordering of segments through the use of segment IDs and parent IDs;
- Four geometrical representations of segments (volumes, contours, meshes, shapes);
- Storage of subtomogram averages and how they map into the parent tomogram through the use of transforms;
- List of associated external references per segment;
- List of associated complexes and macromolecules in a related EMDB entry.
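The hierarchical ordering listed above can be reconstructed from the segment ID and parent ID pairs alone. The following minimal sketch builds a parent-to-children index and walks it depth-first. It assumes that a parent ID of 0 marks a top-level segment; that convention and the function names are assumptions for this example.

```python
from collections import defaultdict


def build_children_index(segments):
    """Map each parent ID to the list of its direct child segment IDs.

    `segments` is an iterable of (segment_id, parent_id) pairs.
    """
    children = defaultdict(list)
    for segment_id, parent_id in segments:
        children[parent_id].append(segment_id)
    return dict(children)


def descendants(children, root_id):
    """Return all segment IDs below `root_id` in depth-first order."""
    found = []
    stack = list(reversed(children.get(root_id, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        # Push children in reverse so that they are visited in listed order.
        stack.extend(reversed(children.get(current, [])))
    return found


# Hypothetical segmentation: segment 1 contains 2 and 3; segment 2 contains 4.
pairs = [(1, 0), (2, 1), (3, 1), (4, 2)]
index = build_children_index(pairs)
print(descendants(index, 0))
```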
Each segment in a segmentation can consist of two types of descriptors:
- textual descriptors;
- geometric descriptors.
Textual descriptors consist of either free-form text or standardised terms. Standardised terms should be drawn from a published ontology or list of identifiers.
Geometric descriptors can take one or more of the following representations:
- ‘three_d_volume’ for 3D volumes;
- ‘mesh_list’ for lists of meshes each of which consists of a set of vertices and polygons;
- lists of shape primitives (ellipsoid, cuboid, cone, cylinder).
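As a rough picture of the mesh representation, the sketch below models a mesh as a list of vertices plus a list of polygons, where each polygon is a tuple of indices into the vertex list. The class and its fields are hypothetical and are not the EMDB-SFF schema itself; they only illustrate the vertex and polygon relationship described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mesh:
    """Illustrative mesh: vertices are points, polygons index into the vertices."""

    vertices: List[Tuple[float, float, float]] = field(default_factory=list)
    polygons: List[Tuple[int, ...]] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a polygon is degenerate or references a missing vertex."""
        count = len(self.vertices)
        for polygon in self.polygons:
            if len(polygon) < 3:
                raise ValueError(f"polygon {polygon} has fewer than 3 vertices")
            if any(i < 0 or i >= count for i in polygon):
                raise ValueError(f"polygon {polygon} references a missing vertex")


# A tetrahedron: four vertices and four triangular faces.
tetrahedron = Mesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    polygons=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
)
tetrahedron.validate()
mesh_list = [tetrahedron]  # a segment could carry several such meshes
print(len(mesh_list[0].polygons))
```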
Documentation
Download
The current schema (version 0.8.0.dev1) is available here.
Documentation
Complete documentation of the schema is available here.
Auxiliary Tools
sfftk-rw
sfftk-rw is a Python toolkit for reading and writing EMDB-SFF files only. It is part of a family of tools designed to work with EMDB-SFF files.
sfftk-rw has the following utilities:
- convert - interconvert between XML, HDF5 and JSON file formats of the EMDB-SFF data model;
- view - view a file summary
The full documentation is available at readthedocs.
Download
The latest version (0.7.1) runs only on Python 3 and may be installed using pip install sfftk-rw. Alternatively, the source code may be obtained from GitHub.
sfftk
sfftk provides a shell command and a Python API to process EMDB-SFF files.
The following utilities are available using sfftk:
- convert - Conversion of application-specific segmentation file formats to EMDB-SFF. Currently, sfftk supports the following formats:
  - AmiraMesh (.am)
  - Amira HyperSurface (.surf)
  - Segger (.seg)
  - EMDB Map masks (.map)
  - Stereolithography (.stl)
  - IMOD (.mod)
- notes - Annotation of EMDB-SFF files.
- view - Brief summaries of segmentation files.
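Before attempting a conversion, a script can check whether an input file carries one of the extensions listed above. The sketch below is a plain lookup table built from that list. It is not part of sfftk, the function name is hypothetical, and matching on the extension alone is a simplification.

```python
from pathlib import Path

# Extensions taken from the list of formats above.
SUPPORTED_FORMATS = {
    ".am": "AmiraMesh",
    ".surf": "Amira HyperSurface",
    ".seg": "Segger",
    ".map": "EMDB Map masks",
    ".stl": "Stereolithography",
    ".mod": "IMOD",
}


def detect_format(filename: str):
    """Return the format name for a supported extension, or None otherwise."""
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower())


for name in ("segmentation.am", "model.STL", "notes.txt"):
    print(name, detect_format(name))
```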
Read the full documentation here.
Download
The latest development version (version 0.5.5.dev1) of sfftk may be downloaded/installed from PyPI or the source may be obtained from GitHub.
Publications
- Patwardhan, Ardan, Robert Brandt, Sarah J. Butcher, Lucy Collinson, David Gault, Kay Grünewald, Corey Hecksel et al. Building bridges between cellular and molecular structural biology. eLife 6 (2017).
- Patwardhan, Ardan, Alun Ashton, Robert Brandt, Sarah Butcher, Raffaella Carzaniga, Wah Chiu, Lucy Collinson et al. A 3D cellular context for the macromolecular world. Nature structural & molecular biology 21, no. 10 (2014): 841-845.
- Patwardhan, Ardan, José-Maria Carazo, Bridget Carragher, Richard Henderson, J. Bernard Heymann, Emma Hill, Grant J. Jensen et al. Data management challenges in three-dimensional EM. Nature structural & molecular biology 19, no. 12 (2012): 1203-1207.