tsb

string_ops: standalone string operations for Series and arrays

string_ops provides module-level string functions that complement the Series.str accessor. All functions accept a Series, a string[], or a scalar string.

strNormalize — Unicode normalisation

Normalise every element to NFC, NFD, NFKC, or NFKD. Useful when mixing text from different sources (e.g. macOS NFD vs Windows NFC).
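
The effect can be illustrated without the library. The sketch below uses the built-in String.prototype.normalize on a plain string[]; the helper name normalizeAll is hypothetical and is not the strNormalize signature, which is not shown here.

```typescript
// Hypothetical helper (not the tsb API): normalise every element of a string[].
type NormalForm = "NFC" | "NFD" | "NFKC" | "NFKD";

function normalizeAll(values: string[], form: NormalForm = "NFC"): string[] {
  return values.map((v) => v.normalize(form));
}

const composedCafe = "caf\u00e9";    // "é" as a single code point (NFC style)
const decomposedCafe = "cafe\u0301"; // "e" followed by a combining acute accent (NFD style)

// The two strings render identically but compare as different.
console.log(composedCafe === decomposedCafe); // false

// After normalising both to the same form they compare equal.
const nfcPair = normalizeAll([composedCafe, decomposedCafe], "NFC");
console.log(nfcPair[0] === nfcPair[1]); // true
```

Normalising both sides to one form before comparing, grouping, or joining avoids keys that look the same but do not match.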


  

strGetDummies — one-hot encode by delimiter

Split each string by a delimiter and produce a binary indicator DataFrame — one column per unique token. Equivalent to pandas.Series.str.get_dummies().
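
As a rough sketch of the idea, assuming a "|" default delimiter and sorted columns as in pandas, the encoding can be written over a plain string[]. The names getDummies and Dummies are hypothetical and do not describe how strGetDummies is implemented or what it returns.

```typescript
// Hypothetical sketch (not the tsb implementation): one-hot encode by delimiter.
interface Dummies {
  columns: string[]; // unique tokens, sorted
  rows: number[][];  // rows[i][j] is 1 when element i contains columns[j]
}

function getDummies(values: string[], sep: string = "|"): Dummies {
  // Split every element and drop empty tokens (e.g. from "a||b" or "").
  const tokenLists = values.map((v) => v.split(sep).filter((t) => t.length > 0));

  // Collect the unique tokens across all elements.
  const unique = new Set<string>();
  for (const tokens of tokenLists) {
    for (const t of tokens) unique.add(t);
  }
  const columns = Array.from(unique).sort();

  // One indicator row per input element.
  const rows = tokenLists.map((tokens) =>
    columns.map((c) => (tokens.includes(c) ? 1 : 0)),
  );
  return { columns, rows };
}

const dummies = getDummies(["a|b", "b|c", "a"]);
console.log(dummies.columns); // ["a", "b", "c"]
console.log(dummies.rows);    // [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
```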


  

strExtractAll — extract all regex matches

Find every non-overlapping regex match in each element. The result holds one string per element: a JSON-encoded array of match arrays, which you decode with JSON.parse.
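
The sketch below shows one way such a result could be produced and decoded. It assumes each match array holds the full match followed by its capture groups; that layout, and the helper name extractAll, are assumptions rather than the documented behaviour of strExtractAll.

```typescript
// Hypothetical sketch (not the tsb implementation): all non-overlapping matches,
// JSON-encoded per element.
function extractAll(values: string[], pattern: RegExp): string[] {
  // Force the global flag so exec() advances through the string.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  return values.map((v) => {
    const re = new RegExp(pattern.source, flags);
    const matches: string[][] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(v)) !== null) {
      matches.push(Array.from(m));     // assumed layout: full match, then capture groups
      if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
    }
    return JSON.stringify(matches);
  });
}

const encodedMatches = extractAll(["a1b22", "none"], /\d+/);
console.log(encodedMatches); // ['[["1"],["22"]]', '[]']

// Decode each element back into an array of match arrays.
const decodedMatches: string[][][] = encodedMatches.map((s) => JSON.parse(s));
console.log(decodedMatches[0]); // [["1"], ["22"]]
```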


  

strRemovePrefix / strRemoveSuffix

Remove a given leading or trailing string from each element. Elements that do not start (or end) with it are returned unchanged.
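
The "only when present" rule is the key difference from trimming characters. A minimal sketch over a plain string[], using the hypothetical helpers removePrefix and removeSuffix rather than the library functions themselves:

```typescript
// Hypothetical helpers (not the tsb API): remove an exact prefix or suffix if present.
function removePrefix(values: string[], prefix: string): string[] {
  return values.map((v) =>
    prefix.length > 0 && v.startsWith(prefix) ? v.slice(prefix.length) : v,
  );
}

function removeSuffix(values: string[], suffix: string): string[] {
  return values.map((v) =>
    suffix.length > 0 && v.endsWith(suffix)
      ? v.slice(0, v.length - suffix.length)
      : v,
  );
}

console.log(removePrefix(["test_a", "b", "test_"], "test_")); // ["a", "b", ""]
console.log(removeSuffix(["a.csv", "b.txt"], ".csv"));        // ["a", "b.txt"]
```

Only one occurrence is removed: applying removePrefix with "ab" to "ababc" yields "abc", not "c".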


  

strTranslate — character-level substitution

Replace or delete individual characters using a lookup table. The table is written as text with one mapping per line: from=to replaces a character, and from= (nothing after the equals sign) deletes it.
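
To make the table format concrete, here is a sketch that parses it and applies it to a plain string[]. It assumes the left-hand side of each mapping is a single character; parseTranslateTable and translateAll are hypothetical names, not the strTranslate signature.

```typescript
// Hypothetical sketch (not the tsb implementation): parse "from=to" lines into a map.
function parseTranslateTable(table: string): Map<string, string> {
  const map = new Map<string, string>();
  for (const line of table.split(/\r?\n/)) {
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip blank lines and lines without a "from" part
    map.set(line.slice(0, eq), line.slice(eq + 1)); // empty right-hand side means delete
  }
  return map;
}

function translateAll(values: string[], table: string): string[] {
  const map = parseTranslateTable(table);
  return values.map((v) =>
    Array.from(v) // iterate by code point, not by UTF-16 unit
      .map((ch) => map.get(ch) ?? ch)
      .join(""),
  );
}

// Replace "a" and "e" with digits and delete hyphens.
const leetTable = "a=4\ne=3\n-=";
console.log(translateAll(["leet-speak"], leetTable)); // ["l33tsp34k"]
```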


  

strCharWidth & strByteLength — display & byte widths

strCharWidth counts the columns a string occupies in a terminal, where wide characters such as CJK ideographs count as 2.
strByteLength counts the bytes in the UTF-8 encoding, which is useful for APIs that enforce byte limits.
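
The difference between the two measures, and between both and String.length, can be sketched as follows. The wide-character ranges are a rough approximation, not the full Unicode East Asian Width table, and displayWidth and utf8ByteLength are hypothetical helpers rather than the library functions.

```typescript
// Approximate ranges of double-width characters (an assumption for illustration;
// a real implementation would use the Unicode East Asian Width data).
const WIDE_RANGES: Array<[number, number]> = [
  [0x1100, 0x115f],   // Hangul Jamo
  [0x2e80, 0xa4cf],   // CJK radicals, kana, CJK ideographs, Yi
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xfe30, 0xfe4f],   // CJK compatibility forms
  [0xff00, 0xff60],   // fullwidth forms
  [0xffe0, 0xffe6],   // fullwidth signs
  [0x20000, 0x3fffd], // supplementary ideographic planes
];

function displayWidth(s: string): number {
  let width = 0;
  for (const ch of Array.from(s)) {
    const cp = ch.codePointAt(0) ?? 0;
    const wide = WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
    width += wide ? 2 : 1;
  }
  return width;
}

function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (const ch of Array.from(s)) {
    const cp = ch.codePointAt(0) ?? 0;
    // UTF-8 uses 1 to 4 bytes depending on the code point's magnitude.
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

const sampleText = "日本語";
console.log(sampleText.length);          // 3 UTF-16 code units
console.log(displayWidth(sampleText));   // 6 terminal columns
console.log(utf8ByteLength(sampleText)); // 9 bytes
```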