🔍 tsb — str.findall & toJsonDenormalize

Two new feature groups in tsb: string matchers — strFindall / strFindallCount / strFindFirst / strFindallExpand (mirroring pandas.Series.str.findall and str.extract) — and JSON serializers — toJsonDenormalize / toJsonRecords / toJsonSplit / toJsonIndex (toJsonDenormalize being the inverse of jsonNormalize).


1. strFindall — all regex matches per element

Mirrors pandas.Series.str.findall(pat). Returns a Series where each value is a JSON-encoded array of all non-overlapping matches.

// pandas equivalent:
// s.str.findall(r'\d+')

import { Series } from 'tsb';
import { strFindall, strFindallCount, strFindFirst } from 'tsb';

const prices = new Series({ data: ['$10.99 and $5.00', 'free!', '$3.50'] });

const allPrices = strFindall(prices, /\$[\d.]+/);
// Series [
//   '["$10.99","$5.00"]',   ← JSON string
//   '[]',
//   '["$3.50"]'
// ]

// Parse the JSON to get actual arrays:
JSON.parse(allPrices.values[0]); // ["$10.99", "$5.00"]
JSON.parse(allPrices.values[1]); // []
✅ Each element contains a JSON.stringify(string[]) result.
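The per-element behaviour can be sketched in a few lines of plain TypeScript. This is a hypothetical illustration of the matching logic strFindall mirrors, not the tsb source; findallOne is an invented helper name:

```typescript
// Collect all non-overlapping matches for one string value and
// JSON-encode them. Falls back to the full match when the pattern
// has no capture group.
function findallOne(value: string, pat: RegExp): string {
  // matchAll requires the global flag, so add it if missing.
  const flags = pat.flags.includes('g') ? pat.flags : pat.flags + 'g';
  const re = new RegExp(pat.source, flags);
  const matches = [...value.matchAll(re)].map((m) => m[1] ?? m[0]);
  return JSON.stringify(matches);
}

findallOne('$10.99 and $5.00', /\$[\d.]+/); // '["$10.99","$5.00"]'
findallOne('free!', /\$[\d.]+/);            // '[]'
```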

With capture groups

// When the pattern has a capture group, returns the captured value
const s = new Series({ data: ['name: Alice', 'name: Bob', 'unknown'] });
const names = strFindall(s, /name: (\w+)/);
// Series ['["Alice"]', '["Bob"]', '[]']

// First capture group is extracted (pandas behaviour)

Null / NaN handling

const s = new Series({ data: ['hello', null, NaN, 'world'] });
const result = strFindall(s, /\w+/);
// Series ['["hello"]', null, null, '["world"]']
// Null/NaN elements return null (not []) — matches pandas

2. strFindallCount — count matches per element

import { strFindallCount } from 'tsb';

const words = new Series({ data: ['one two three', 'four', 'five six'] });
const counts = strFindallCount(words, /\b\w+\b/);
// Series [3, 1, 2]

// Count vowels per word
const vowels = new Series({ data: ['beautiful', 'rhythm', 'aeiou'] });
strFindallCount(vowels, /[aeiou]/i);
// Series [5, 0, 5]
💡 More efficient than strFindall when you only need the count, not the matches themselves.
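The efficiency claim comes down to never materializing the match array. A minimal sketch of that counting loop (countMatches is a hypothetical helper, not the tsb implementation):

```typescript
// Count non-overlapping matches without building an array of the
// matched strings or JSON-encoding anything.
function countMatches(value: string, pat: RegExp): number {
  const flags = pat.flags.includes('g') ? pat.flags : pat.flags + 'g';
  const re = new RegExp(pat.source, flags);
  let n = 0;
  for (const _ of value.matchAll(re)) n++;
  return n;
}

countMatches('one two three', /\b\w+\b/); // 3
countMatches('rhythm', /[aeiou]/i);       // 0
```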

3. strFindFirst — first match per element

import { strFindFirst } from 'tsb';

const logs = new Series({ data: [
  '2024-01-15: ERROR occurred',
  '2024-02-20: INFO ok',
  'no date here',
] });

const dates = strFindFirst(logs, /\d{4}-\d{2}-\d{2}/);
// Series ['2024-01-15', '2024-02-20', null]

// Extract just the year (first capture group)
const years = strFindFirst(logs, /(\d{4})-\d{2}-\d{2}/);
// Series ['2024', '2024', null]
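The first-match rule — prefer the first capture group, return null when nothing matches — can be sketched like this (findFirst is a hypothetical helper, not the tsb source):

```typescript
// A single exec suffices: no global flag needed because only the
// first match is wanted. m[1] is the first capture group when the
// pattern has one; otherwise fall back to the full match m[0].
function findFirst(value: string, pat: RegExp): string | null {
  const m = pat.exec(value);
  return m === null ? null : m[1] ?? m[0];
}

findFirst('2024-01-15: ERROR occurred', /\d{4}-\d{2}-\d{2}/); // '2024-01-15'
findFirst('no date here', /\d{4}-\d{2}-\d{2}/);               // null
```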

4. strFindallExpand — expand capture groups into a DataFrame

Mirrors pandas.Series.str.extract(pat, expand=True).

import { strFindallExpand } from 'tsb';

const people = new Series({ data: ['John 30', 'Jane 25', 'unknown'] });

// Named capture groups → column names
const df = strFindallExpand(people, /(?<name>\w+)\s+(?<age>\d+)/);
//    name  age
// 0  John  30
// 1  Jane  25
// 2  null  null

// Unnamed groups → numbered columns "0", "1", ...
const df2 = strFindallExpand(people, /(\w+)\s+(\d+)/);
//    0     1
// 0  John  30
// 1  Jane  25
// 2  null  null
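How one input string maps to one DataFrame row can be sketched in plain TypeScript. This illustrates the group-to-column rule only, not tsb's DataFrame machinery; extractRow is a hypothetical helper:

```typescript
// Named capture groups become column names; unnamed groups become
// numbered columns "0", "1", ...; no match becomes a row of nulls.
function extractRow(
  value: string,
  pat: RegExp,
): Record<string, string | null> | null {
  const m = pat.exec(value);
  if (m === null) return null;          // → row of nulls in the DataFrame
  if (m.groups) return { ...m.groups }; // named groups → column names
  const entries = m
    .slice(1)
    .map((g, i): [string, string | null] => [String(i), g ?? null]);
  return Object.fromEntries(entries);
}

extractRow('John 30', /(?<name>\w+)\s+(?<age>\d+)/); // { name: 'John', age: '30' }
extractRow('John 30', /(\w+)\s+(\d+)/);              // { '0': 'John', '1': '30' }
```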

5. toJsonDenormalize — flat DataFrame → nested JSON

The inverse of jsonNormalize: takes a DataFrame with dot-separated column names and reconstructs nested JSON objects.

import { DataFrame } from 'tsb';
import { toJsonDenormalize } from 'tsb';

// Start with a flattened DataFrame (as jsonNormalize would produce)
const flat = DataFrame.fromColumns({
  name:              ['Alice', 'Bob'],
  'address.city':    ['New York', 'Los Angeles'],
  'address.zip':     ['10001', '90001'],
  'address.country': ['US', 'US'],
});

// Reconstruct nested JSON
const records = toJsonDenormalize(flat);
// [
//   { name: 'Alice', address: { city: 'New York',    zip: '10001', country: 'US' } },
//   { name: 'Bob',   address: { city: 'Los Angeles', zip: '90001', country: 'US' } },
// ]

// Round-trip: jsonNormalize → toJsonDenormalize
import { jsonNormalize } from 'tsb';
const original = [
  { user: { name: 'Alice', age: 30 }, score: 100 },
  { user: { name: 'Bob',   age: 25 }, score: 200 },
];
const df = jsonNormalize(original);
const recovered = toJsonDenormalize(df);
// recovered ≈ original (with the same structure)
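The core of the reconstruction is ordinary path-splitting per row. A minimal sketch of that step, assuming plain objects in place of tsb's DataFrame rows (unflattenRow is a hypothetical helper):

```typescript
// Split each column name on the separator and build the nested
// object one path segment at a time.
function unflattenRow(
  row: Record<string, unknown>,
  sep = '.',
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [col, value] of Object.entries(row)) {
    const parts = col.split(sep);
    let node = out;
    for (const part of parts.slice(0, -1)) {
      // Create the intermediate object if this is the first time
      // we walk through this path segment.
      node = (node[part] ??= {}) as Record<string, unknown>;
    }
    node[parts[parts.length - 1]] = value;
  }
  return out;
}

unflattenRow({ name: 'Alice', 'address.city': 'New York', 'address.zip': '10001' });
// { name: 'Alice', address: { city: 'New York', zip: '10001' } }
```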

Custom separator

// If jsonNormalize was called with sep='__'
const df2 = DataFrame.fromColumns({
  'user__name': ['Alice'],
  'user__city': ['NYC'],
});
toJsonDenormalize(df2, { sep: '__' });
// [{ user: { name: 'Alice', city: 'NYC' } }]

Drop null values

const df3 = DataFrame.fromColumns({ a: [1, null], b: [null, 2] });
toJsonDenormalize(df3, { dropNull: true });
// [{ a: 1 }, { b: 2 }]  ← null fields are omitted

6. JSON serialization utilities

toJsonRecords — orient="records"

import { toJsonRecords } from 'tsb';
const df = DataFrame.fromColumns({ a: [1, 2], b: ['x', 'y'] });
toJsonRecords(df);
// [{ a: 1, b: 'x' }, { a: 2, b: 'y' }]

toJsonSplit — orient="split"

import { toJsonSplit } from 'tsb';
toJsonSplit(df);
// { columns: ['a', 'b'], index: [0, 1], data: [[1, 'x'], [2, 'y']] }

toJsonSplit(df, { includeIndex: false });
// { columns: ['a', 'b'], data: [[1, 'x'], [2, 'y']] }
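The split orientation is a straightforward transpose of the column arrays. A sketch on plain data, standing in for tsb's DataFrame internals (toSplit is a hypothetical helper):

```typescript
// Columns come from the object keys, the index defaults to 0..n-1,
// and each data row collects the i-th value of every column.
function toSplit(cols: Record<string, unknown[]>) {
  const columns = Object.keys(cols);
  const n = columns.length ? cols[columns[0]].length : 0;
  const index = [...Array(n).keys()];
  const data = index.map((i) => columns.map((c) => cols[c][i]));
  return { columns, index, data };
}

toSplit({ a: [1, 2], b: ['x', 'y'] });
// { columns: ['a', 'b'], index: [0, 1], data: [[1, 'x'], [2, 'y']] }
```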

toJsonIndex — orient="index"

import { toJsonIndex } from 'tsb';
toJsonIndex(df);
// { '0': { a: 1, b: 'x' }, '1': { a: 2, b: 'y' } }

// With custom string index
const df2 = DataFrame.fromColumns(
  { v: [10, 20] },
  { index: ['alice', 'bob'] }
);
toJsonIndex(df2);
// { alice: { v: 10 }, bob: { v: 20 } }
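The index orientation just keys each record by its index label. A sketch on plain data (toIndexOrient is a hypothetical helper, not the tsb API):

```typescript
// Pair each row object with its index label; labels are coerced to
// strings because JSON object keys always are.
function toIndexOrient(
  rows: Record<string, unknown>[],
  index: (string | number)[],
) {
  return Object.fromEntries(index.map((label, i) => [String(label), rows[i]]));
}

toIndexOrient([{ v: 10 }, { v: 20 }], ['alice', 'bob']);
// { alice: { v: 10 }, bob: { v: 20 } }
```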

API reference

| Function | Signature | pandas equivalent |
| --- | --- | --- |
| strFindall | (input, pat, flags?) → Series&lt;Scalar&gt; | s.str.findall(pat) |
| strFindallCount | (input, pat, flags?) → Series&lt;Scalar&gt; | s.str.findall(pat).map(len) |
| strFindFirst | (input, pat, flags?) → Series&lt;Scalar&gt; | s.str.extract(pat)[0] |
| strFindallExpand | (input, pat, flags?) → DataFrame | s.str.extract(pat, expand=True) |
| toJsonDenormalize | (df, options?) → JsonRecord[] | inverse of json_normalize |
| toJsonRecords | (df) → JsonRecord[] | df.to_json(orient='records') |
| toJsonSplit | (df, options?) → JsonSplitResult | df.to_json(orient='split') |
| toJsonIndex | (df) → JsonRecord | df.to_json(orient='index') |