
Exporting Crosslink-Spectrum-Matches to ProXL

from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")
Installed pyXLMS version: 1.7.0
from pyXLMS import parser
from pyXLMS import exporter

All exporting functionality is available via the exporter submodule. We also import the parser submodule to read crosslink-spectrum-matches.

parser_result = parser.read(
    "../../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.txt",
    engine="MS Annika",
    crosslinker="DSS",
)
csms = parser_result["crosslink-spectrum-matches"]
Reading MS Annika CSMs...: 100%|██████████| 826/826 [00:00<00:00, 10466.88it/s]

We read crosslink-spectrum-matches (CSMs) from a single MS Annika .txt file using the generic parser. For easier access, we also assign the crosslink-spectrum-matches to the variable csms.

from pyXLMS.transform import targets_only
csms = targets_only(csms)
Important

Crosslink-spectrum-matches should be filtered to contain only target-target matches! The ProXL exporter does not export decoy proteins because they are not available for most crosslink search engines, so decoy matches should be filtered out before exporting. You might also want to apply additional filtering steps such as validation; this choice is up to you!
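
As a quick sanity check after filtering, one could confirm that no decoy matches remain. The sketch below uses stand-in dictionaries rather than real parser output, and it assumes that each CSM carries boolean alpha_decoy and beta_decoy fields; those key names are an assumption here and should be checked against the pyXLMS data format documentation.

```python
# Stand-in CSMs for illustration only; real CSMs come from parser.read().
# The keys "alpha_decoy" and "beta_decoy" are assumed, not confirmed by this page.
example_csms = [
    {"alpha_decoy": False, "beta_decoy": False},  # target-target
    {"alpha_decoy": True, "beta_decoy": False},   # decoy-target
    {"alpha_decoy": True, "beta_decoy": True},    # decoy-decoy
]


def is_target_target(csm):
    """Return True only if neither peptide of the CSM is flagged as a decoy."""
    return csm.get("alpha_decoy") is False and csm.get("beta_decoy") is False


def count_non_targets(csms):
    """Count CSMs that are not unambiguously target-target."""
    return sum(1 for csm in csms if not is_target_target(csm))


remaining = count_non_targets(example_csms)
print(f"CSMs that are not target-target: {remaining}")
```

Applied to the list returned by targets_only(), the count would be expected to be zero if the assumed field names hold.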

_xml = exporter.to_proxl(
    csms,
    fasta_filename="../../data/_fasta/Cas9_plus10.fasta",
    search_engine="MS Annika",
    search_engine_version="3.0.1",
    score="higher_better",
    crosslinker="DSS",
    filename="CSMs_exported_to_ProXL.xml",
    schema_validation="online",
)
Successfully created ProXL XML and validated it against online XML schema!

The function exporter.to_proxl() exports a list of crosslink-spectrum-matches to ProXL format for downstream analysis. ProXL is accessible at yeastrc.org/proxl_public. The to_proxl() function requires several input parameters:

  • csms : list of dict of str, any
    • A list of crosslink-spectrum-matches.
  • fasta_filename : str
    • The name/path of the fasta file for reading protein sequences.
  • search_engine : str
    • Name of the used crosslink search engine.
  • search_engine_version : str
    • Version identifier of the used crosslink search engine.
  • score : str, one of "higher_better" or "lower_better"
    • Whether a higher or a lower CSM score is considered better.
  • crosslinker : str
    • Name of the used crosslinking reagent, for example "DSSO". If this is not a common crosslinking reagent, the parameter crosslinker_mass also needs to be specified!

The export additionally requires that charge and score fields are set for all crosslink-spectrum-matches. You can read more about the to_proxl() function and all its parameters here: docs.
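
Because the export depends on these two fields, it may help to check for them before calling to_proxl(). The following is a minimal sketch on stand-in dictionaries; it assumes the fields are stored under the keys charge and score and that an unset field is either absent or None. The helper find_incomplete_csms() is hypothetical and not part of pyXLMS.

```python
def find_incomplete_csms(csms, required=("charge", "score")):
    """Return the indices of CSMs where any required field is missing or None."""
    return [
        index
        for index, csm in enumerate(csms)
        if any(csm.get(field) is None for field in required)
    ]


# Stand-in CSMs for illustration only; key names are assumed.
example_csms = [
    {"charge": 3, "score": 152.4},     # complete
    {"charge": None, "score": 98.1},   # charge not set
    {"charge": 4},                     # score missing entirely
]

incomplete = find_incomplete_csms(example_csms)
print(f"CSMs missing charge or score: {incomplete}")
```

An empty result would suggest that every CSM satisfies the requirement, under the assumptions stated above.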

Tip

Specifying filename=None will only return the XML as a string and not write it to disk!
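
A returned XML string can be inspected with Python's standard library before anything is written to disk. The sketch below assumes the return value is a plain str and uses a placeholder document whose element names are made up and do not follow the ProXL schema; summarize_xml() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_xml(xml_string):
    """Parse an XML string and return its root tag and the tags of its direct children."""
    root = ET.fromstring(xml_string)
    return root.tag, [child.tag for child in root]


# Placeholder document; element names are invented and not taken from the ProXL schema.
placeholder_xml = "<root><first/><second/></root>"

root_tag, child_tags = summarize_xml(placeholder_xml)
print(root_tag, child_tags)
```

For a namespaced document, ElementTree reports tags in the form {namespace}name, so the printed names may be longer than the element names in the file.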
