Serializing pyXLMS Objects
If you want to save your data to continue your analysis at a later point, or with a different tool or programming language, you need serialization. pyXLMS supports two ways to serialize and de-serialize crosslink-spectrum-matches and crosslinks: as tables and as JSON. Whole parser_result objects can only be serialized as JSON. Both approaches are illustrated by the examples below.
from pyXLMS import __version__
print(f"Installed pyXLMS version: {__version__}")
Installed pyXLMS version: 1.5.3
from pyXLMS import parser
from pyXLMS import transform
We import the parser submodule to read crosslink-spectrum-matches and crosslinks, and the transform submodule for serialization and for printing summary statistics.
parser_result = parser.read(
    "../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
    engine="MS Annika",
    crosslinker="DSS",
)
Reading MS Annika CSMs...: 100%|██████████████████████████████| 826/826 [00:00<00:00, 3615.67it/s]
Reading MS Annika crosslinks...: 100%|██████████████████████████████| 300/300 [00:00<00:00, 8051.05it/s]
We read crosslink-spectrum-matches and crosslinks from a single .pdResult file using the generic parser.
csms = parser_result["crosslink-spectrum-matches"]
xls = parser_result["crosslinks"]
For easier access, we assign our crosslink-spectrum-matches to the variable csms and our crosslinks to the variable xls.
_ = transform.summary(csms)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
With the transform.summary() function we can also print some summary statistics about the crosslink-spectrum-matches we read. You can read more about this function in the docs.
_ = transform.summary(xls)
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
We can print the same summary statistics for the crosslinks we read.
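The counts above partition the data: 803 + 23 = 826 intra/inter CSMs and 786 + 39 + 1 = 826 target/decoy combinations, and likewise 279 + 21 = 300 and 265 + 0 + 35 = 300 for the crosslinks. The following is a minimal sketch of how such counts could be computed from a list of CSM dictionaries. It is not the implementation of transform.summary(); the function summarize_csms and the keys crosslink_type, alpha_decoy, beta_decoy, and score are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_csms(csms: List[Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical sketch of a few summary statistics (not pyXLMS code).

    Assumes a non-empty list of dicts with the hypothetical keys
    "crosslink_type" ("intra" or "inter"), "alpha_decoy" and
    "beta_decoy" (booleans), and "score" (float).
    """
    types = Counter(csm["crosslink_type"] for csm in csms)
    # Number of decoy peptides per CSM: 0 -> target-target, 1 -> target-decoy, 2 -> decoy-decoy
    decoys = Counter(int(csm["alpha_decoy"]) + int(csm["beta_decoy"]) for csm in csms)
    scores = [csm["score"] for csm in csms]
    return {
        "Number of CSMs": len(csms),
        "Number of intra CSMs": types["intra"],
        "Number of inter CSMs": types["inter"],
        "Number of target-target CSMs": decoys[0],
        "Number of target-decoy CSMs": decoys[1],
        "Number of decoy-decoy CSMs": decoys[2],
        "Minimum CSM score": min(scores),
        "Maximum CSM score": max(scores),
    }


# Usage with three hypothetical CSMs
example = [
    {"crosslink_type": "intra", "alpha_decoy": False, "beta_decoy": False, "score": 12.5},
    {"crosslink_type": "inter", "alpha_decoy": False, "beta_decoy": True, "score": 3.0},
    {"crosslink_type": "intra", "alpha_decoy": True, "beta_decoy": True, "score": 1.5},
]
stats = summarize_csms(example)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Because every CSM falls into exactly one intra/inter class and exactly one decoy class, the two groups of counts each add up to the total, which is a cheap consistency check on any summary.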
Serialization to Tables
pyXLMS crosslink-spectrum-matches and crosslinks can be serialized and saved as tables in any format supported by pandas.
De-serialization, i.e. reading the created tables back in, is only supported for text-based formats (e.g. CSV, TSV), Microsoft Excel, and Parquet.
Table-based serialization of parser_result objects is not supported!
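Because pandas can write more formats than can be read back, it may help to check a file name before saving. The sketch below uses a hypothetical helper, can_read_back, that is not part of pyXLMS; the set of extensions is an assumption derived from the formats listed above, so consult the parser.read_custom() documentation for the exact list.

```python
from pathlib import Path

# Assumption: extensions corresponding to the readable formats named above
READABLE_TABLE_SUFFIXES = {".csv", ".tsv", ".xlsx", ".parquet"}


def can_read_back(filename: str) -> bool:
    """Hypothetical check: is this table format (assumed to be) readable again?"""
    # Compare the file extension case-insensitively
    return Path(filename).suffix.lower() in READABLE_TABLE_SUFFIXES


print(can_read_back("csms.csv"))      # True
print(can_read_back("xls.parquet"))   # True
print(can_read_back("csms.feather"))  # False
```

Feather, for example, can be written with pandas but is not among the formats listed above for reading tables back in.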
Serialization to CSV
transform.to_dataframe(csms).to_csv("csms.csv")
Serialization to tables is handled by the transform.to_dataframe() function [docs], which returns a pandas DataFrame that can then be saved to disk with any of the pandas writer methods, here DataFrame.to_csv().
loaded_csms = parser.read_custom("csms.csv")
Reading CSMs...: 100%|██████████████████████████████| 826/826 [00:00<00:00, 3392.79it/s]
Using the parser.read_custom() function [docs, page], we can then read the data back from the file…
_ = transform.summary(loaded_csms)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
…which gives us back our original crosslink-spectrum-matches.
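Why use a dedicated reader instead of a plain CSV reader? A flat CSV file stores every cell as text, and list-valued fields have to be packed into a single cell. The stdlib-only sketch below illustrates this with hypothetical column names and a ";" separator; both are assumptions and not necessarily what transform.to_dataframe() produces. Reading such a file naively returns only strings, so types have to be restored explicitly, which is presumably part of what parser.read_custom() does for you.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical records with a list-valued field ("proteins") and a float score
records: List[Dict[str, Any]] = [
    {"peptide": "KPEPTIDE", "proteins": ["P1", "P2"], "score": 12.5},
    {"peptide": "PEPKTIDE", "proteins": ["P3"], "score": 3.25},
]

# Write: join list-valued fields into one cell (the ";" separator is an assumption)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["peptide", "proteins", "score"])
writer.writeheader()
for rec in records:
    writer.writerow({**rec, "proteins": ";".join(rec["proteins"])})

# Naive read: every value comes back as a string
buffer.seek(0)
raw_rows = list(csv.DictReader(buffer))
print(raw_rows[0]["score"], type(raw_rows[0]["score"]).__name__)  # 12.5 str

# Restore the original types explicitly
# (caveat: an empty list is written as "" and "".split(";") gives [""], not [])
restored = [
    {"peptide": row["peptide"], "proteins": row["proteins"].split(";"), "score": float(row["score"])}
    for row in raw_rows
]
print(restored == records)  # True
```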
Serialization to Parquet
transform.to_dataframe(xls).to_parquet("xls.parquet")
As before, transform.to_dataframe() [docs] converts the crosslinks to a pandas DataFrame, which we then save to disk, this time with DataFrame.to_parquet().
loaded_xls = parser.read_custom("xls.parquet")
Reading crosslinks...: 100%|██████████████████████████████| 300/300 [00:00<00:00, 6181.46it/s]
Again, parser.read_custom() [docs, page] reads the data back from the file…
_ = transform.summary(loaded_xls)
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
…which gives us back our original crosslinks.
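Unlike CSV, writing Parquet with pandas needs an optional Parquet engine, pyarrow or fastparquet; without one, DataFrame.to_parquet() raises an ImportError. A quick check of which engines are installed might look like the sketch below. available_parquet_engines is a hypothetical helper, not a pyXLMS or pandas function.

```python
import importlib.util
from typing import List


def available_parquet_engines() -> List[str]:
    """Hypothetical helper: list the installed Parquet engines pandas could use."""
    # With engine="auto" (the default), pandas tries pyarrow first, then fastparquet
    candidates = ["pyarrow", "fastparquet"]
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


engines = available_parquet_engines()
if engines:
    print(f"Parquet engines available: {', '.join(engines)}")
else:
    print("No Parquet engine found; install pyarrow or fastparquet to use to_parquet()")
```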
Serialization to JSON
All pyXLMS objects, including whole parser_result objects, can be serialized to and de-serialized from JavaScript Object Notation (JSON) using Python's standard json library.
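This presumably works because pyXLMS objects are built from types that the json module maps directly onto JSON: dicts, lists, strings, numbers, booleans, and None. A few conversions are worth knowing when you reload data. They are shown here on a hypothetical record rather than a real pyXLMS object: floats are written with their shortest round-tripping representation, so scores come back bit-identical, whereas tuples come back as lists and non-string dict keys come back as strings.

```python
import json

# Hypothetical record, not an actual pyXLMS object
record = {
    "score": 452.9861536355926,  # float
    "positions": (12, 57),       # tuple
    "labels": {1: "intra"},      # dict with a non-string key
    "decoy": False,
    "comment": None,
}

# Serialize to a JSON string and parse it back
restored = json.loads(json.dumps(record))

print(restored["score"] == record["score"])    # True: floats round-trip exactly
print(restored["positions"])                   # [12, 57]: the tuple became a list
print(restored["labels"])                      # {'1': 'intra'}: the int key became a string
print(restored["decoy"], restored["comment"])  # False None
print(restored == record)                      # False, because of the two conversions above
```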
import json
from typing import Any
def to_json(data: Any, filename: str) -> None:
    # ensure_ascii=False keeps non-ASCII characters unescaped; indent=4 makes the file human-readable
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
to_json(csms, "csms.json")
to_json(xls, "xls.json")
to_json(parser_result, "pr.json")
We can serialize our crosslink-spectrum-matches, crosslinks, and parser_result using json.dump()…
def from_json(filename: str) -> Any:
    # Parse the UTF-8 encoded JSON file back into Python objects (dicts, lists, strings, numbers, ...)
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)
loaded_csms = from_json("csms.json")
loaded_xls = from_json("xls.json")
loaded_parser_result = from_json("pr.json")
…and we can de-serialize our JSON files again using json.load()…
_ = transform.summary(loaded_csms)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
…which gives us back our original crosslink-spectrum-matches,…
_ = transform.summary(loaded_xls)
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
…crosslinks,…
_ = transform.summary(loaded_parser_result)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.1132827525593785
Maximum CSM score: 452.9861536355926
Number of crosslinks: 300.0
Number of unique crosslinks by peptide: 300.0
Number of unique crosslinks by protein: 298.0
Number of intra crosslinks: 279.0
Number of inter crosslinks: 21.0
Number of target-target crosslinks: 265.0
Number of target-decoy crosslinks: 0.0
Number of decoy-decoy crosslinks: 35.0
Minimum crosslink score: 1.1132827525593785
Maximum crosslink score: 452.9861536355926
…and parser_result.
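Comparing summary statistics is a quick sanity check. For a stricter test, one option is to compare the loaded object with a JSON-normalized copy of the original. That way, the conversions JSON always performs (tuples become lists, non-string dict keys become strings) do not register as differences. json_roundtrip_equal is a hypothetical helper, not part of pyXLMS, and the record below is made up for illustration.

```python
import json
import os
import tempfile
from typing import Any


def json_roundtrip_equal(original: Any, loaded: Any) -> bool:
    """Hypothetical helper: compare a loaded object with the JSON-normalized original."""
    # Normalize the original the same way JSON would (tuples -> lists, keys -> str)
    normalized = json.loads(json.dumps(original))
    return normalized == loaded


# Hypothetical record with a tuple, written to and read back from a temporary file
original = {"positions": (3, 7), "score": 1.1132827525593785}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(original, f, ensure_ascii=False, indent=4)
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == original)                      # False: the tuple came back as a list
print(json_roundtrip_equal(original, loaded))  # True
```

Applied to the objects on this page, a call such as json_roundtrip_equal(parser_result, loaded_parser_result) would be expected to return True, with one caveat: NaN never compares equal to itself, so an object containing NaN values reports a mismatch even after a faithful round trip.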