TIP mmCIF vs PDB files in Discovery Studio

JS 2026-02-05

Since Discovery Studio 2026, users can read and write .cif files in addition to PDB files.

If you are not familiar with CIF files:

CIF (Crystallographic Information Framework), and specifically its macromolecular variant mmCIF, is the current official standard format of the Protein Data Bank (PDB), replacing the legacy PDB file format (.pdb). The PDB format was dominant for decades, but its rigid 80-column layout (for example, only five columns for the atom serial number, so at most 99,999 atoms) cannot represent the large, complex structures, such as ribosomes or large viruses, that are common in modern structural biology.

mmCIF allows multi-character chain IDs and residue names of arbitrary length via fields such as _atom_site.label_asym_id (chain) and _atom_site.label_comp_id (residue), whereas the PDB format restricts chain IDs to one character and residue names to three.
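To make the difference concrete, here is a minimal Python sketch that reads the chain ID and residue name from one PDB ATOM record and from one mmCIF atom_site row. It is an illustration only, not how Discovery Studio reads files: the function names and the sample records are made up, and the mmCIF reader assumes simple whitespace-separated values with no quoting.

```python
# Sketch only: fixed-column PDB parsing versus tag-based mmCIF parsing.
# Function names and sample data are hypothetical.


def pdb_chain_and_residue(line: str) -> tuple[str, str]:
    """Return (chain ID, residue name) from a PDB ATOM/HETATM record.

    The PDB format is column based: the residue name occupies columns
    18-20 and the chain ID column 22 (1-based), so neither can grow.
    """
    res_name = line[17:20].strip()
    chain_id = line[21:22].strip()
    return chain_id, res_name


def cif_chain_and_residue(tags: list[str], row: str) -> tuple[str, str]:
    """Return (chain ID, residue name) from one atom_site row.

    mmCIF values are found by tag name rather than column position, so
    their width is not limited. Assumes no quoted values with spaces.
    """
    values = dict(zip(tags, row.split()))
    return values["_atom_site.label_asym_id"], values["_atom_site.label_comp_id"]


pdb_line = "ATOM      1  N   MET A   1      27.340  24.430   2.614"
print(pdb_chain_and_residue(pdb_line))

cif_tags = [
    "_atom_site.group_PDB",
    "_atom_site.id",
    "_atom_site.type_symbol",
    "_atom_site.label_atom_id",
    "_atom_site.label_comp_id",
    "_atom_site.label_asym_id",
    "_atom_site.label_seq_id",
]
cif_row = "ATOM 1 N N MET AA 1"  # a two-character chain ID is fine here
print(cif_chain_and_residue(cif_tags, cif_row))
```

The PDB reader depends entirely on character positions, which is why a two-character chain ID such as AA has no place to go in a .pdb file.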

There are, however, some differences in how molecules are handled depending on whether they were read from a PDB or a CIF file.

If you want to create the biological assembly:

PDB

Some .pdb files define how to generate the biologically active unit by applying transformation matrices to specific chains. The software automatically stores the matrices for you, but you will need to consult the .pdb file itself to see which matrix applies to which chain.

Open the .pdb file.
Display the .pdb file in the text editor of your choice and search for REMARK 300. If this record does not appear in the file, no oligomeric biologically active unit is defined.
REMARK 300 is followed by a series of REMARK 350 records. These contain the transformation matrices, each set preceded by the list of chains to which it applies.
Select the first set of chains specified.
Display the Apply Transformation Matrix dialog.
From the dropdown list, select the correct matrix (for example BIOMT_1 for the first operation).
Make sure that both the Selected Atoms Only and the Create Copy controls are checked.
Click OK to generate the new copy.
Repeat as necessary for other chains and/or matrices specified.

In this case, each subunit will appear as a separate molecule.
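Reading the REMARK 350 records by eye can be tedious for larger assemblies. As a rough aid, the following Python sketch lists which matrix serial numbers apply to which chains. It is not part of Discovery Studio: the function name parse_remark_350 and the sample records are made up for illustration, and it assumes the BIOMT fields are separated by whitespace.

```python
# Sketch only: summarise REMARK 350 records from PDB file text.
# parse_remark_350 and the sample text are hypothetical.


def _chains(body: str) -> list[str]:
    """Split 'APPLY THE FOLLOWING TO CHAINS: A, B' into ['A', 'B']."""
    return [c.strip() for c in body.split(":", 1)[1].split(",") if c.strip()]


def parse_remark_350(pdb_text: str) -> list[dict]:
    """Return one dict per chain list, with its biomolecule and matrices.

    Each matrix is stored under its serial number as three rows of
    [r1, r2, r3, t] (rotation terms followed by the translation).
    """
    blocks = []
    biomolecule = None
    current = None
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 350"):
            continue
        body = line[10:].strip()
        if body.startswith("BIOMOLECULE:"):
            biomolecule = int(body.split(":", 1)[1])
            current = None
        elif body.startswith("APPLY THE FOLLOWING TO CHAINS:"):
            current = {"biomolecule": biomolecule, "chains": _chains(body), "matrices": {}}
            blocks.append(current)
        elif body.startswith("AND CHAINS:") and current is not None:
            current["chains"] += _chains(body)  # continuation of a long chain list
        elif body.startswith("BIOMT") and current is not None:
            fields = body.split()
            serial = int(fields[1])
            row = [float(x) for x in fields[2:6]]
            current["matrices"].setdefault(serial, []).append(row)
    return blocks


sample = """\
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       50.00000
REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000       60.00000
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000
"""

for block in parse_remark_350(sample):
    print(block["biomolecule"], block["chains"], sorted(block["matrices"]))
# prints: 1 ['A', 'B'] [1, 2]
```

For the sample above, chains A and B would each need matrix 1 (the identity) and matrix 2 applied, which is the information the manual steps ask you to look up before selecting chains and choosing a matrix in the dialog.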

CIF

Biological assembly files can be obtained directly from RCSB and are named <PDB ID>-assembly<N>.cif, where N is the assembly number. To import an assembly, use File > Open URL and enter something like 1abc-assembly1.

In this case, each subunit will appear as a separate chain, with chain names such as A-2, A-3, etc.
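If you need the assembly names for several entries, the naming scheme is easy to generate. The Python sketch below is only an illustration with made-up function names; the download URL pattern is an assumption based on the public RCSB file server and should be checked against the RCSB documentation before relying on it.

```python
# Sketch only: build assembly names following the <PDB ID>-assembly<N> scheme.
# assembly_name and assembly_url are hypothetical helpers.


def assembly_name(pdb_id: str, assembly: int = 1) -> str:
    """Return a name such as '1abc-assembly1' for use in File > Open URL."""
    if assembly < 1:
        raise ValueError("assembly numbers start at 1")
    return f"{pdb_id.strip().lower()}-assembly{assembly}"


def assembly_url(pdb_id: str, assembly: int = 1) -> str:
    """Return a direct download URL (assumed RCSB file server pattern)."""
    return f"https://files.rcsb.org/download/{assembly_name(pdb_id, assembly)}.cif"


print(assembly_name("1ABC", 1))  # 1abc-assembly1
print(assembly_url("1abc", 2))
```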