Exporting Crosslinks to xlms-tools
from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")

Installed pyXLMS version: 1.5.3

from pyXLMS import parser
from pyXLMS import exporter

All exporting functionality is available via the exporter submodule. We also import the parser submodule to read crosslink-spectrum-matches and crosslinks.
parser_result = parser.read(
"../../data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv",
engine="Custom",
crosslinker="DSSO",
)
xls = parser_result["crosslinks"]

Reading crosslinks...: 100%|██████████| 25/25 [00:00<?, ?it/s]

We read crosslinks from a single .csv file using the generic parser. For easier access we also assign our crosslinks to the variable xls.
result_dictionary = exporter.to_xlmstools(xls, pdb_file="6YHU", filename_prefix=None)

The function exporter.to_xlmstools() exports a list of crosslinks to xlms-tools format for protein structure analysis. The Python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. This exporter performs a basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the exporter checks whether the supposedly crosslinked residue can actually be modified by the crosslinker in the protein structure. Because the alignment may contain mismatches or shifts, the amino acid at the aligned position can differ from the one in the peptide, and a crosslink would then be reported at a position that cannot react with the crosslinker. Optionally, these positions can still be reported.
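To make the identity threshold and the reactive-residue check more concrete, here is a minimal sketch. It is not pyXLMS's implementation: it replaces the gapped local alignment with a simple ungapped sliding-window comparison, and the helper map_crosslink_site, its parameters, the toy sequences, and the choice of lysine (K) as the only reactive residue are all assumptions made for illustration.

```python
def map_crosslink_site(peptide, xl_position, protein, min_identity=0.8,
                       reactive_residues="K", allow_site_mismatch=False):
    """Return 1-based protein positions to which the crosslinked residue maps.

    Hypothetical helper: an ungapped sliding-window comparison that stands in
    for a real gapped local alignment (no gap open/extension penalties here).
    """
    hits = []
    n = len(peptide)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        # Fraction of identical residues between peptide and protein window.
        matches = sum(a == b for a, b in zip(peptide, window))
        if matches / n < min_identity:
            continue
        # Residue in the structure at the supposedly crosslinked position.
        site_residue = window[xl_position - 1]
        if site_residue not in reactive_residues and not allow_site_mismatch:
            continue  # position cannot react with the crosslinker
        hits.append(start + xl_position)
    return hits


# Toy sequences (assumed, not taken from 6YHU).
protein_exact = "MAKLVKDEGR"
protein_variant = "MAKLVRDEGR"  # K at position 6 replaced by R
peptide = "LVKDE"               # crosslinked at peptide position 3 (K)

print(map_crosslink_site(peptide, 3, protein_exact))    # [6]
print(map_crosslink_site(peptide, 3, protein_variant))  # []
print(map_crosslink_site(peptide, 3, protein_variant,
                         allow_site_mismatch=True))     # [6]
```

In the second call the peptide still matches the protein with 80% identity, but the aligned residue is an arginine, so the position is dropped unless mismatching sites are explicitly allowed.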
The first required parameter is a list of crosslinks of the protein(-complex) of interest, passed via crosslinks or as the first positional argument. The second is the protein(-complex) structure, passed via pdb_file, which can either be an identifier from the PDB or a local .pdb file (both a path and a file are accepted as input). You can read more about the to_xlmstools() function and all its parameters in the documentation.
Please note that the input crosslinks should be filtered to contain only target-target crosslinks of the proteins in the .pdb file! Most likely you also only want to keep validated crosslinks!
Specifying filename_prefix=None will only return the calculated results but not write them to disk!
print(result_dictionary["xlms-tools"][:60])
# only the first 60 characters of this output are displayed below

82|B|123|B|
82|B|123|D|
82|D|123|B|
82|D|123|D|
123|B|97|B|

The function returns a dictionary
- with key "xlms-tools" containing the formatted text for xlms-tools,
- with key "xlms-tools DataFrame" containing the information from xlms-tools but as a pandas DataFrame,
- with key "Number of mapped crosslinks" containing the total number of mapped crosslinks,
- with key "Mapping" containing a string that logs how crosslinks were mapped to the protein structure,
- with key "Parsed PDB sequence" containing the protein sequence that was parsed from the PDB file,
- with key "Parsed PDB chains" containing the parsed chains from the PDB file,
- with key "Parsed PDB residue numbers" containing the parsed residue numbers from the PDB file,
- and with key "Exported files" containing a list of filenames of all files that were written to disk.
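The text stored under the "xlms-tools" key can be post-processed with plain Python. The sketch below parses the five lines shown in the output above into tuples and counts intra- versus inter-chain links. The helper parse_xlmstools_text is hypothetical, and the field order (residue number, chain, residue number, chain) is an assumption inferred from that output rather than taken from the xlms-tools specification.

```python
def parse_xlmstools_text(text):
    """Parse 'res|chain|res|chain|' lines into (int, str, int, str) tuples.

    Hypothetical helper; the field order is inferred from the example output.
    """
    links = []
    for line in text.splitlines():
        # Drop the empty field produced by the trailing '|'.
        fields = [field for field in line.strip().split("|") if field]
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        res_a, chain_a, res_b, chain_b = fields
        links.append((int(res_a), chain_a, int(res_b), chain_b))
    return links


# The first 60 characters of the exported text, as displayed above.
sample = "82|B|123|B|\n82|B|123|D|\n82|D|123|B|\n82|D|123|D|\n123|B|97|B|\n"

links = parse_xlmstools_text(sample)
intra_chain = [link for link in links if link[1] == link[3]]
inter_chain = [link for link in links if link[1] != link[3]]

print(len(links), len(intra_chain), len(inter_chain))  # 5 3 2
```

With a real export, sample would be replaced by result_dictionary["xlms-tools"].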
For information on how to control the sequence alignment and matching process, please refer to the documentation.