Exporting Crosslinks to xlms-tools


from pyXLMS import __version__
 
print(f"Installed pyXLMS version: {__version__}")



    Installed pyXLMS version: 1.5.3


from pyXLMS import parser
from pyXLMS import exporter

All exporting functionality is available via the exporter submodule. We also import the parser submodule to read crosslink-spectrum-matches and crosslinks.


parser_result = parser.read(
    "../../data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv",
    engine="Custom",
    crosslinker="DSSO",
)
xls = parser_result["crosslinks"]



    Reading crosslinks...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<?, ?it/s]

We read crosslinks using the generic parser from a single .csv file. For easier access we also assign our crosslinks to the variable xls.


result_dictionary = exporter.to_xlmstools(xls, pdb_file="6YHU", filename_prefix=None)

The function exporter.to_xlmstools() exports a list of crosslinks to xlms-tools format for protein structure analysis. The Python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. The exporter performs a basic local sequence alignment to map crosslinked peptides onto a protein structure in PDB format. The gap open and gap extension penalties can be set, as can the sequence identity threshold that an alignment must satisfy for a match to be reported. The exporter also checks whether the supposedly crosslinked residue in the protein structure can actually be modified by the crosslinker: because of shifts introduced by the alignment, the aligned amino acid may differ from the one in the peptide, so a crosslink could be reported at a position that cannot react with the crosslinker. Optionally, these positions can still be reported.
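To make the identity threshold and the reactivity check more concrete, here is a minimal sketch of the idea. It is not the pyXLMS implementation: it uses a simplified ungapped sliding-window comparison instead of a gapped local alignment, and the function names, the default threshold of 0.8, and the assumption that only lysine (K) is reactive are all hypothetical choices made for illustration.

```python
from typing import Optional, Tuple


def ungapped_best_match(peptide: str, sequence: str) -> Tuple[int, float]:
    """Slide the peptide along the sequence without gaps.

    Returns the 0-based offset of the best window and its sequence
    identity, or (-1, 0.0) if no residue matches in any window.
    """
    best_offset, best_identity = -1, 0.0
    for offset in range(len(sequence) - len(peptide) + 1):
        window = sequence[offset:offset + len(peptide)]
        matches = sum(a == b for a, b in zip(peptide, window))
        identity = matches / len(peptide)
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity


def map_crosslink_position(
    peptide: str,
    position: int,  # 1-based crosslink position within the peptide
    sequence: str,  # sequence parsed from the structure
    min_identity: float = 0.8,  # hypothetical threshold
    reactive: str = "K",  # assumption: only lysine reacts
    allow_non_reactive: bool = False,
) -> Optional[int]:
    """Return the 1-based crosslink position in the sequence, or None."""
    offset, identity = ungapped_best_match(peptide, sequence)
    if offset < 0 or identity < min_identity:
        return None  # no window satisfies the identity threshold
    index = offset + position - 1  # 0-based index in the sequence
    if sequence[index] not in reactive and not allow_non_reactive:
        return None  # aligned residue cannot react with the crosslinker
    return index + 1


# Exact match: the crosslinked lysine maps to position 9.
print(map_crosslink_position("TLEEKSR", 5, "MAGKTLEEKSRQ"))
# The structure carries R instead of K: identity is 6/7, but the
# aligned residue is not reactive, so nothing is reported by default.
print(map_crosslink_position("TLEEKSR", 5, "MAGKTLEERSRQ"))
# Optionally report the non-reactive position anyway.
print(map_crosslink_position("TLEEKSR", 5, "MAGKTLEERSRQ", allow_non_reactive=True))
```

The second call shows the situation described above: the peptide still aligns well, yet the residue at the mapped position differs from the crosslinked one, so the match is only reported when non-reactive positions are explicitly allowed.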

Two parameters are required. The first is the list of crosslinks of the protein (or protein complex) of interest, passed via crosslinks or as the first positional argument. The second is the structure of the protein (or protein complex), passed via pdb_file, which can be either an identifier from the PDB or a local .pdb file (both a path and a file are accepted as input). You can read more about the to_xlmstools() function and all its parameters here: docs.

Important

Please note that the input crosslinks should be filtered to contain only target-target crosslinks of the proteins in the .pdb file! Most likely you also only want to keep validated crosslinks!

Tip

Specifying filename_prefix=None will only return the calculated results but not write them to disk!


print(result_dictionary["xlms-tools"][:60])
# only the first 60 characters of this output are displayed below



    82|B|123|B|
    82|B|123|D|
    82|D|123|B|
    82|D|123|D|
    123|B|97|B|

The function returns a dictionary:

with key "xlms-tools" containing the formatted text for xlms-tools,
with key "xlms-tools DataFrame" containing the information from xlms-tools but as a pandas DataFrame,
with key "Number of mapped crosslinks" containing the total number of mapped crosslinks,
with key "Mapping" containing a string that logs how crosslinks were mapped to the protein structure,
with key "Parsed PDB sequence" containing the protein sequence that was parsed from the PDB file,
with key "Parsed PDB chains" containing the parsed chains from the PDB file,
with key "Parsed PDB residue numbers" containing the parsed residue numbers from the PDB file,
and with key "Exported files" containing a list of filenames of all files that were written to disk.
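As a small illustration of working with the text stored under the key "xlms-tools", the sketch below parses it into structured records. It assumes each line has the form residue|chain|residue|chain|, as the output shown above suggests; the ResiduePair type and the parse_xlmstools_text helper are hypothetical names that are not part of pyXLMS, and the example string simply repeats the output printed above.

```python
from typing import List, NamedTuple


class ResiduePair(NamedTuple):
    """One crosslinked residue pair (hypothetical helper type)."""

    residue_1: int
    chain_1: str
    residue_2: int
    chain_2: str


def parse_xlmstools_text(text: str) -> List[ResiduePair]:
    """Parse lines of the assumed form 'residue|chain|residue|chain|'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = line.split("|")
        if len(fields) < 4:
            continue  # skip incomplete lines
        pairs.append(
            ResiduePair(int(fields[0]), fields[1], int(fields[2]), fields[3])
        )
    return pairs


# The five lines printed above; in practice, pass the full string
# result_dictionary["xlms-tools"] rather than a truncated slice.
example = "82|B|123|B|\n82|B|123|D|\n82|D|123|B|\n82|D|123|D|\n123|B|97|B|\n"
pairs = parse_xlmstools_text(example)
for pair in pairs:
    print(pair)
```

If you only need tabular access, the pandas DataFrame under the key "xlms-tools DataFrame" is likely the more convenient starting point.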

Important

For information on how to control the sequence alignment and matching process, please refer to the documentation.