The core package

The Glycan

Glycans are represented by the Glycan class, a subclass of buildamol.Molecule with additional methods and attributes for glycan-specific operations. The class can generate glycan molecules from IUPAC strings or graph structures, extend or crop an existing glycan to match a given IUPAC string, and draw 2D (SNFG) and 3D representations of the glycan molecule.

Making Glycans from IUPAC strings

The IUPAC nomenclature is widely used to describe glycans in textual form because their SMILES strings tend to be very long and cumbersome to work with. The Glycan class has a from_iupac classmethod that reads an IUPAC string and produces a 3D model of the glycan. Alternatively, one may use the top-level read_iupac function to the same effect, or pass an IUPAC string to the glycan function.

import glycosylator as gl

my_glycan = gl.read_iupac("my new glycan", "Man(b1-4)Glc")

The Glycan class also has a to_iupac method that can convert a glycan molecule to an IUPAC string.

iupac_string = my_glycan.to_iupac()
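To make the content of such a string concrete, here is a minimal sketch of what a condensed IUPAC string encodes. The helper split_linear_iupac is hypothetical and not part of Glycosylator; it only handles linear strings without square-bracket branches and is not how the library parses its input.

```python
import re

# Hypothetical helper for illustration only (not a Glycosylator function).
# Matches one "Residue(xN-M)" unit, e.g. "Man(b1-4)".
_UNIT = re.compile(r"([A-Za-z0-9]+)\(([ab])(\d)-(\d)\)")


def split_linear_iupac(iupac):
    """Split a linear condensed IUPAC string into (child, parent, linkage) triples."""
    units = _UNIT.findall(iupac)
    # Whatever remains after removing all linked units is the root residue.
    root = _UNIT.sub("", iupac)
    names = [name for name, *_ in units] + [root]
    return [
        (name, names[i + 1], f"{anomer}{start}-{end}")
        for i, (name, anomer, start, end) in enumerate(units)
    ]


print(split_linear_iupac("Man(b1-4)Glc"))
```

Each triple names a residue, the residue it is attached to, and the linkage between them, which is the same information the 3D builder needs in order to connect the sugars.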

Making Glycans from other inputs

Since the Glycan class inherits from buildamol.Molecule, it supports many more inputs, such as RDKit molecules, SMILES, or PDB files. You can either use a dedicated classmethod such as from_pdb to get the desired glycan molecule, or rely on the top-level glycan function to work out what kind of input you are providing. This function accepts a variety of inputs, which are processed automatically behind the scenes, and is the most convenient way to obtain a glycan structure.

# make a non-standard glucose from SMILES
smiles = "NCC1OC(O)C(O)C(O)C1O"
nitrogen_glucose = gl.glycan(smiles)

Modifying individual sugars

If you do not feel like working with SMILES to make small chemical changes to individual sugars or even whole glycans, you can use the flexibility of BuildAMol, which Glycosylator is built upon. Let’s say we want to make a phospho-glucose; we can do something like this:

import glycosylator as gl
import buildamol as bam

# get a glucose
glc = gl.glycan("GLC")

# now use buildamol directly to modify
# the molecule
bam.phosphorylate(glc, at_atom="C6")
glc.remove_atoms("O6", "HO6")

class glycosylator.core.Glycan.Glycan(structure, root_atom: str | int | Atom = None, model: int = 0, chain: str = None)

Bases: Molecule

A glycan molecule

Parameters:

id (str) – The id of the molecule
structure (buildamol.Structure or biopython.PDB.Structure or buildamol.Molecule) – The structure of the molecule

attach(other: Molecule, link: str | Linkage = None, at_residue: int | Residue = None, other_residue: int | Residue = None, use_patch: bool = True, inplace: bool = True, other_inplace: bool = False, _topology=None) → Glycan

Attach another structure to this one using a Patch or a Recipe.

Parameters:

other (Molecule) – The other molecule to attach to this one
link (str or Linkage) – Either a Patch to apply when attaching or a Recipe to use when stitching. If None is defined, the default patch or recipe that was set earlier on the molecule is used.
at_residue (int or Residue) – The residue to attach the other molecule to. If None, the defined attach_residue is used.
other_residue (int or Residue) – The residue in the other molecule to attach this molecule to. If None, the defined attach_residue of the other molecule is used.
use_patch (bool) – If the specified linkage is a patch (i.e. has internal coordinates), it is applied as a patch by default. It can also be used as a recipe; set this to False to do so.
inplace (bool) – If True the molecule is directly modified, otherwise a copy of the molecule is returned.
other_inplace (bool) – If True, all atoms from the other molecule are integrated into this one, which leaves the other molecule empty. If False, a copy of the other molecule is used, which leaves the original molecule intact.
_topology (Topology) – The topology to use when attaching. If None, the topology of the molecule is used. Only used if the patch is a string.

clashes_with_scaffold(clash_threshold: float = 1.0, ignore_hydrogens: bool = True, coarse_precheck: bool = True) → bool

Check if the glycan clashes with the scaffold

Parameters:

clash_threshold (float) – The distance below which two atoms are considered to clash
ignore_hydrogens (bool) – Whether to ignore hydrogens
coarse_precheck (bool) – Whether to use a coarse pre-check to speed up the process. This may lead to false negatives, especially if the scaffold has very large residues (e.g. lipids with long carbon chains).

Returns:

Whether the glycan clashes with the scaffold

Return type:

bool
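The trade-off behind coarse_precheck can be illustrated with a small, self-contained sketch. It is an assumption about how such a check could work, not Glycosylator's implementation: the function clashes, the coarse_cutoff parameter, and the centroid-based screening are all hypothetical, and hydrogens are not modelled. Each scaffold residue is first screened by the distance between its centroid and the glycan atoms, and only residues that pass are compared atom by atom.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def clashes(glycan_atoms, scaffold_residues, clash_threshold=1.0,
            coarse_precheck=True, coarse_cutoff=10.0):
    """Hypothetical clash test between glycan atoms and scaffold residues.

    glycan_atoms: list of (x, y, z) tuples
    scaffold_residues: list of residues, each a list of (x, y, z) tuples
    coarse_cutoff: assumed screening radius around each residue centroid
    """
    for residue in scaffold_residues:
        if coarse_precheck:
            center = centroid(residue)
            # Skip the residue if no glycan atom is near its centroid.
            if all(math.dist(a, center) > coarse_cutoff for a in glycan_atoms):
                continue
        for a in glycan_atoms:
            for b in residue:
                if math.dist(a, b) < clash_threshold:
                    return True
    return False


# A long, lipid-like residue whose first atom sits 0.5 away from the glycan atom
long_residue = [(0.5 + 2.0 * i, 0.0, 0.0) for i in range(21)]
glycan_atoms = [(0.0, 0.0, 0.0)]

print(clashes(glycan_atoms, [long_residue]))
print(clashes(glycan_atoms, [long_residue], coarse_precheck=False))
```

In this toy setup the centroid of the long residue lies far from the glycan atom, so the coarse screening skips the residue and misses the clash that the full atom-by-atom comparison finds. This is the kind of false negative the parameter description warns about for very large residues.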

crop_to(full_iupac: str, inplace: bool = True, _topology=None)

Crop a glycan to a glycan matching the provided IUPAC/SNFG string. Note that this can only crop the glycan starting from its leaves. It cannot retrofit a root given some leaf glycans.

Parameters:

full_iupac (str) – The full IUPAC/SNFG string which the glycan should be cropped to.
inplace (bool) – Whether to crop the glycan in place or return a new glycan.
_topology – A particular topology to use. If None, the default topology is used.

draw2d(ax=None, axis='y', **kwargs) → Axes

Draw the SNFG 2D schematic of the glycan

Parameters:

ax (matplotlib.Axes) – The axes to draw on
axis (str) – The orientation of the glycan: 'y' for vertical, 'x' for horizontal.

Returns:

The axes

Return type:

matplotlib.Axes

extend_to(full_iupac: str, inplace: bool = True, _topology=None)

Extend a partial glycan to a glycan matching the provided IUPAC/SNFG string. Note that this can only extend the glycan from the root onward. It cannot retrofit a root given some leaf glycans.

Parameters:

full_iupac (str) – The full IUPAC/SNFG string which the glycan should be extended to.
inplace (bool) – Whether to extend the glycan in place or return a new glycan.
_topology – A particular topology to use. If None, the default topology is used.
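The root-onward restriction can be pictured on (parent, child, linkage) edge lists of the kind that read_graph accepts. The sketch below is hypothetical and independent of the library: root_of and edges_to_add are not Glycosylator functions, and it assumes that residues carry identical labels in the partial and the full graph, which the real method does not require.

```python
def root_of(graph):
    """Return the single node that is a parent but never a child."""
    parents = {parent for parent, _, _ in graph}
    children = {child for _, child, _ in graph}
    roots = parents - children
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def edges_to_add(partial, full):
    """List the edges of `full` that are missing from `partial`."""
    present = set(partial)
    if not present <= set(full):
        raise ValueError("the partial glycan is not contained in the full glycan")
    # Extension is only possible if both graphs start at the same root.
    if partial and root_of(partial) != root_of(full):
        raise ValueError("the partial glycan does not start at the root")
    return [edge for edge in full if edge not in present]


full = [
    ("NAG@1", "NAG@2", "14bb"),
    ("NAG@2", "BMA@1", "14bb"),
    ("BMA@1", "MAN@1", "13ab"),
    ("BMA@1", "MAN@2", "16ab"),
]

# A partial glycan that starts at the root can be extended ...
print(edges_to_add(full[:2], full))

# ... whereas a leaf-only fragment cannot be given a root retroactively.
try:
    edges_to_add(full[2:3], full)
except ValueError as error:
    print(error)
```

Cropping is the mirror image in this picture: the edges returned here would be the ones to remove, again working from the leaves while the root stays in place.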

find_glycosmos_ids() → list

find_glytoucan_ids() → list

classmethod from_compound(compound: str, by: str = None, root_atom=None)

Create a Molecule from a reference compound from the PDBECompounds database

Parameters:

compound (str) – The compound to search for
by (str) – The field to search by. This can be:
- “id” for the PDB id
- “name” for the name of the compound (must match any known synonym of the IUPAC name)
- “formula” for the chemical formula
- “smiles” for the SMILES string (also accepts InChI)
root_atom (str or int) – The id or the serial number of the root atom (optional)

classmethod from_glycosmos(id: str, _topology=None) → Glycan

Generate a glycan molecule from a GlyCosmos/GlyTouCan ID

classmethod from_glytoucan(id: str, _topology=None) → Glycan

Generate a glycan molecule from a GlyCosmos/GlyTouCan ID

classmethod from_iupac(id: str, iupac: str, _topology=None) → Glycan

Generate a glycan molecule from an IUPAC/SNFG string

classmethod from_pdb(filename: str)

Read a Molecule from a PDB file

Parameters:

filename (str) – Path to the PDB file
root_atom (str or int) – The id or the serial number of the root atom (optional)
id (str) – The id of the Molecule. By default an id is inferred from the filename.
model (int) – The index of the model to use (default: 0)
has_atom_ids (bool) – If the PDB file provides no atom ids, set this to False in order to autolabel the atoms.

classmethod from_snfg(id: str, iupac: str, _topology=None) → Glycan

Generate a glycan molecule from an IUPAC/SNFG string

get_glycosmos_id() → str

Get the GlyTouCan ID of the glycan molecule

Returns: The GlyTouCan ID (if available)
Return type: str

get_glytoucan_id() → str

Get the GlyTouCan ID of the glycan molecule

Returns: The GlyTouCan ID (if available)
Return type: str

hist() → DataFrame

Get a histogram of the glycan residues

Returns: The histogram
Return type: pd.DataFrame
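As a rough picture of what such a histogram contains, the following sketch counts residue names with collections.Counter. It is not the library's implementation: hist returns a pandas DataFrame whose exact columns are not described here, and both the residue_histogram helper and the residue_names list are made up for illustration.

```python
from collections import Counter

# Hypothetical residue names of a small N-glycan core (illustration only).
residue_names = ["NAG", "NAG", "BMA", "MAN", "MAN"]


def residue_histogram(names):
    """Count how often each residue name occurs, most frequent first."""
    return Counter(names).most_common()


for name, count in residue_histogram(residue_names):
    print(f"{name}: {count}")
```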

infer_glycan_tree()

Infer the glycan tree connectivity in case of a glycan molecule that was loaded externally

remove_residues(*residues: int | Residue) → list

Remove residues from the structure

Parameters: residues (int or base_classes.Residue) – The residues to remove, either the object itself or its seqid
Returns: The removed residues
Return type: list

search_glycosmos_ids() → list

Find GlyTouCan IDs for glycans that are partial matches of the glycan.

Returns: A list of GlyTouCan IDs (if available)
Return type: list

search_glytoucan_ids() → list

Find GlyTouCan IDs for glycans that are partial matches of the glycan.

Returns: A list of GlyTouCan IDs (if available)
Return type: list

snfg(ax=None, axis='y', **kwargs) → Axes

Draw the SNFG 2D schematic of the glycan

Parameters:

ax (matplotlib.Axes) – The axes to draw on
axis (str) – The orientation of the glycan: 'y' for vertical, 'x' for horizontal.

Returns:

The axes

Return type:

matplotlib.Axes

to_iupac(add_terminal_conformation: bool = True) → str

Generate an IUPAC/SNFG string from the glycan molecule

Parameters: add_terminal_conformation (bool) – Whether to append the terminal conformation of the first residue, as “(a1-” or “(b1-”, to the end of the string.
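The difference between a string with and without the terminal conformation can be shown at the string level. The helper below is a hypothetical sketch, not a Glycosylator function; it only inspects the text and assumes the terminal marker has the form “(a1-” or “(b1-” (any single digit is accepted for the position).

```python
import re

# Hypothetical helper for illustration only (not a Glycosylator function).
_TERMINAL = re.compile(r"\(([ab])\d-$")


def split_terminal_conformation(iupac):
    """Return (core_string, anomer); anomer is None if no terminal marker is present."""
    match = _TERMINAL.search(iupac)
    if match is None:
        return iupac, None
    return iupac[: match.start()], match.group(1)


print(split_terminal_conformation("GlcNAc(b1-4)GlcNAc(b1-"))
print(split_terminal_conformation("Man(b1-4)Glc"))
```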

to_snfg(add_terminal_conformation: bool = True) → str

Generate an IUPAC/SNFG string from the glycan molecule

Parameters: add_terminal_conformation (bool) – Whether to append the terminal conformation of the first residue, as “(a1-” or “(b1-”, to the end of the string.

glycosylator.core.Glycan.glycan(g: str | list, id: str = None, _topology=None)

The top-level function to generate an entire glycan molecule from an IUPAC/SNFG string or a list of residues from a graph structure, or to get a single-residue glycan molecule (e.g. one glucose).

Parameters:

g (str or list) – The glycan string to parse. The string may be a single sugar residue’s name, in which case this function is used like buildamol.molecule, or can be an entire glycan structure in IUPAC/SNFG condensed format - currently, neither extended nor short formats are supported (refer to the read_iupac function for more information). Alternatively, a list of residues from a graph structure can be passed (refer to the read_graph function for more information).
id (str) – The id of the molecule to create. If not provided, the id will be the same as the input string.
_topology – A particular topology to use. If None, the default topology is used.

Returns:

The created Glycan molecule.

Return type:

Glycan

glycosylator.core.Glycan.read_graph(id: str, g: list, _topology=None) → Glycan

Build a molecule from a glycan graph.

Parameters:

id (str) – The id of the molecule to create
g (list) – A list of tuples in the form (parent, child, linkage), where parent and child are strings designating the residues and an “@{id}” suffix to distinguish individual residues. The linkage must be a valid id of any defined linkage in the provided or default topology, e.g. “14bb” or “16ab” for the default CHARMM topology (see example below).
_topology – A particular topology to use. If None, the default topology is used.

Returns:

molecule – The created Glycan molecule.

Return type:

Glycan

Examples

To generate a small glycan of the structure:

```
~ --- NAG                  MAN
        \                  /
        (14bb)          (16ab)
          \              /
          NAG -(14bb)- BMA
                         \
                        (13ab)
                           \
                           MAN
```

We can formulate a graph structure for the glycan above as follows:

>>> graph = [
("NAG@1", "NAG@2", "14bb"),
("NAG@2", "BMA@1", "14bb"),
("BMA@1", "MAN@1", "13ab"),
("BMA@1", "MAN@2", "16ab"),
] # notice the @{id} after each residue name

The @{id} suffix is used to distinguish between residues with the same name, for example the two mannoses (MAN@1 and MAN@2). In the example above, the ids count the residues with the same name, hence NAG@2 connecting to BMA@1 reads as “the second NAG connecting to the first BMA”. However, this is not a strict requirement. Any numeric or string value that marks each residue as a unique node is acceptable; that is, each particular residue must be identified by a unique “{name}@{id}”. Hence, the following graph, where the index simply reflects the order of the residues in the molecule, is also valid:

>>> graph = [
("NAG@1", "NAG@2", "14bb"),
("NAG@2", "BMA@3", "14bb"),
("BMA@3", "MAN@4", "13ab"),
("BMA@3", "MAN@5", "16ab"),
]
or even
>>> graph = [
("NAG@a", "NAG@b", "14bb"),
("NAG@b", "BMA@c", "14bb"),
("BMA@c", "MAN@d", "13ab"),
("BMA@c", "MAN@e", "16ab"),
] # here the ids are simply letters

We can then create a molecule using:

>>> mol = read_graph("my_glycan", graph)
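Before passing such a list to read_graph, it can help to check that it really describes a single tree. The helper below is a hypothetical, library-independent sketch (check_glycan_graph is not a Glycosylator function, and read_graph may perform different or additional checks). It verifies that every label carries an @{id} suffix, that each residue has at most one parent, and that all residues hang off one root.

```python
def check_glycan_graph(graph):
    """Return (root, children), where children maps each parent to its child labels."""
    children = {}
    parent_of = {}
    for parent, child, linkage in graph:
        for label in (parent, child):
            if "@" not in label:
                raise ValueError(f"{label!r} lacks an '@{{id}}' suffix")
        if child in parent_of:
            raise ValueError(f"{child!r} has more than one parent")
        parent_of[child] = parent
        children.setdefault(parent, []).append(child)

    roots = [node for node in children if node not in parent_of]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")

    # Walk down from the root to make sure every residue is connected to it.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children.get(node, []))
    if seen != set(children) | set(parent_of):
        raise ValueError("some residues are not connected to the root")
    return roots[0], children


graph = [
    ("NAG@1", "NAG@2", "14bb"),
    ("NAG@2", "BMA@1", "14bb"),
    ("BMA@1", "MAN@1", "13ab"),
    ("BMA@1", "MAN@2", "16ab"),
]
root, children = check_glycan_graph(graph)
print(root, children["BMA@1"])
```

Because the check only relies on labels being unique, it works equally well with the numeric and the letter-based id schemes shown above.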

glycosylator.core.Glycan.read_iupac(id: str, s: str, _topology=None) → Glycan

Make a molecule from an IUPAC glycan string in condensed format.

Parameters:

id (str) – The id of the molecule to create
s (str) – The glycan string to parse. The string must be in IUPAC condensed format - currently, neither extended nor short formats are supported.
_topology – A particular topology to use. If None, the default topology is used.

Returns:

molecule – The created Glycan molecule.

Return type:

Glycan

Examples

To generate a small glycan of the structure:

```
~ --- NAG                  MAN
        \                  /
        (14bb)          (16ab)
          \              /
          NAG -(14bb)- BMA
                         \
                        (13ab)
                           \
                           MAN
```

the IUPAC/SNFG string would be:

>>> iupac = "Man(a1-6)[Man(a1-3)]b-Man(a1-4)GlcNAc(b1-4)GlcNAc(b1-" # notice the final "b1-" to indicate where the glycan attaches to a scaffold

Which can be parsed into a molecule with:

>>> mol = read_iupac("my_glycan", iupac)

glycosylator.core.Glycan.read_snfg(id: str, s: str, _topology=None) → Glycan

Make a molecule from an IUPAC glycan string in condensed format.

Parameters:

id (str) – The id of the molecule to create
s (str) – The glycan string to parse. The string must be in IUPAC condensed format - currently, neither extended nor short formats are supported.
_topology – A particular topology to use. If None, the default topology is used.

Returns:

molecule – The created Glycan molecule.

Return type:

Glycan

Examples

To generate a small glycan of the structure:

```
~ --- NAG                  MAN
        \                  /
        (14bb)          (16ab)
          \              /
          NAG -(14bb)- BMA
                         \
                        (13ab)
                           \
                           MAN
```

the IUPAC/SNFG string would be:

>>> iupac = "Man(a1-6)[Man(a1-3)]b-Man(a1-4)GlcNAc(b1-4)GlcNAc(b1-" # notice the final "b1-" to indicate where the glycan attaches to a scaffold

Which can be parsed into a molecule with:

>>> mol = read_iupac("my_glycan", iupac)

glycosylator.core.Glycan.write_iupac(mol: Glycan) → str

Write a molecule as an IUPAC string in condensed format.

Parameters: mol (Glycan) – The molecule to write.
Returns: iupac – The IUPAC/SNFG string in condensed format.
Return type: str

glycosylator.core.Glycan.write_snfg(mol: Glycan) → str

Write a molecule as an IUPAC string in condensed format.

Parameters: mol (Glycan) – The molecule to write.
Returns: iupac – The IUPAC/SNFG string in condensed format.
Return type: str

The Scaffold

The generic Scaffold

The Scaffold class is used to represent a scaffold structure, such as a protein or membrane, onto which a modification in the form of one or more Glycan objects (glycosylator.core.Glycan.Glycan) can be added.

The protein Scaffold

The membrane Scaffold
