# str.findall & toJsonDenormalize
Two new feature groups in tsb:

- `strFindall` / `strFindallCount` / `strFindFirst` / `strFindallExpand` (mirrors `pandas.Series.str.findall`)
- `toJsonDenormalize` / `toJsonRecords` / `toJsonSplit` / `toJsonIndex` (the inverse of `jsonNormalize`)
## strFindall — all regex matches per element

Mirrors `pandas.Series.str.findall(pat)`. Returns a Series where each value is a JSON-encoded array of all non-overlapping matches.

```ts
// pandas equivalent:
//   s.str.findall(r'\d+')
import { Series, strFindall } from 'tsb';

const prices = new Series({ data: ['$10.99 and $5.00', 'free!', '$3.50'] });
const allPrices = strFindall(prices, /\$[\d.]+/);
// Series [
//   '["$10.99","$5.00"]',  ← JSON string
//   '[]',
//   '["$3.50"]'
// ]

// Parse the JSON to get actual arrays:
JSON.parse(allPrices.values[0]); // ["$10.99", "$5.00"]
JSON.parse(allPrices.values[1]); // []
```
Each element is a `JSON.stringify(string[])` result.

```ts
// When the pattern has a capture group, the captured value is returned
const s = new Series({ data: ['name: Alice', 'name: Bob', 'unknown'] });
const names = strFindall(s, /name: (\w+)/);
// Series ['["Alice"]', '["Bob"]', '[]']
// First capture group is extracted (pandas behaviour)
```
```ts
const s = new Series({ data: ['hello', null, NaN, 'world'] });
const result = strFindall(s, /\w+/);
// Series ['["hello"]', null, null, '["world"]']
// Null/NaN elements return null (not '[]') — matches pandas
```
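The per-element semantics above can be sketched in plain TypeScript over an array (tsb's `Series` wrapper is assumed and omitted here); this is an illustrative sketch, not tsb's actual implementation:

```typescript
// Sketch of findall semantics: all matches per element, JSON-encoded,
// first capture group preferred, null elements stay null.
function findallValues(values: (string | null)[], pat: RegExp): (string | null)[] {
  // matchAll requires the global flag, so add it if missing
  const re = new RegExp(pat.source, pat.flags.includes('g') ? pat.flags : pat.flags + 'g');
  return values.map((v) => {
    if (v == null) return null; // missing elements propagate as null
    const matches = [...v.matchAll(re)].map((m) => (m.length > 1 ? m[1] : m[0]));
    return JSON.stringify(matches); // '[]' when nothing matched
  });
}

findallValues(['$10.99 and $5.00', 'free!', '$3.50'], /\$[\d.]+/);
// → ['["$10.99","$5.00"]', '[]', '["$3.50"]']
```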
## strFindallCount — count matches per element

```ts
import { strFindallCount } from 'tsb';

const words = new Series({ data: ['one two three', 'four', 'five six'] });
const counts = strFindallCount(words, /\b\w+\b/);
// Series [3, 1, 2]

// Count vowels per word
const vowels = new Series({ data: ['beautiful', 'rhythm', 'aeiou'] });
strFindallCount(vowels, /[aeiou]/i);
// Series [5, 0, 5]
```
Use `strFindallCount` instead of `strFindall` when you only need the count, not the matches themselves.

## strFindFirst — first match per element

```ts
import { strFindFirst } from 'tsb';

const logs = new Series({ data: [
  '2024-01-15: ERROR occurred',
  '2024-02-20: INFO ok',
  'no date here',
] });
const dates = strFindFirst(logs, /\d{4}-\d{2}-\d{2}/);
// Series ['2024-01-15', '2024-02-20', null]

// Extract just the year (first capture group)
const years = strFindFirst(logs, /(\d{4})-\d{2}-\d{2}/);
// Series ['2024', '2024', null]
```
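The first-match rule (whole match by default, first capture group when one exists, null on no match) can be sketched over plain arrays; the `Series` wrapper is again assumed away:

```typescript
// Sketch of strFindFirst semantics: one result per element.
function findFirstValues(values: (string | null)[], pat: RegExp): (string | null)[] {
  return values.map((v) => {
    if (v == null) return null;
    const m = v.match(pat);
    if (!m) return null; // no match → null, mirroring pandas' NaN
    return m.length > 1 ? m[1] : m[0]; // capture group wins over whole match
  });
}

findFirstValues(['2024-01-15: ERROR occurred', 'no date here'], /(\d{4})-\d{2}-\d{2}/);
// → ['2024', null]
```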
## strFindallExpand — expand capture groups into a DataFrame

Mirrors `pandas.Series.str.extract(pat, expand=True)`.

```ts
import { strFindallExpand } from 'tsb';

const people = new Series({ data: ['John 30', 'Jane 25', 'unknown'] });

// Named capture groups → column names
const df = strFindallExpand(people, /(?<name>\w+)\s+(?<age>\d+)/);
//    name   age
// 0  John   30
// 1  Jane   25
// 2  null   null

// Unnamed groups → numbered columns "0", "1", ...
const df2 = strFindallExpand(people, /(\w+)\s+(\d+)/);
//    0      1
// 0  John   30
// 1  Jane   25
// 2  null   null
```
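How capture groups might be turned into columns can be sketched with plain objects of arrays standing in for the DataFrame (the column-naming trick below probes the pattern against the empty string to discover its groups; this is a sketch, not tsb's implementation):

```typescript
// Sketch: expand capture groups into columns (named groups → names,
// unnamed groups → "0", "1", ...), null row on no match.
function extractExpand(values: (string | null)[], pat: RegExp): Record<string, (string | null)[]> {
  // Appending '|' makes the pattern match '', exposing group count and names
  const probe = new RegExp(pat.source + '|').exec('')!;
  const names = probe.groups
    ? Object.keys(probe.groups)
    : Array.from({ length: probe.length - 1 }, (_, i) => String(i));
  const columns: Record<string, (string | null)[]> = {};
  for (const n of names) columns[n] = [];
  // Non-global copy so exec always searches from the start
  const re = new RegExp(pat.source, pat.flags.replace('g', ''));
  for (const v of values) {
    const m = v == null ? null : re.exec(v);
    names.forEach((name, i) => {
      const hit = m ? (m.groups ? m.groups[name] : m[i + 1]) : undefined;
      columns[name].push(hit ?? null);
    });
  }
  return columns;
}

extractExpand(['John 30', 'unknown'], /(?<name>\w+)\s+(?<age>\d+)/);
// → { name: ['John', null], age: ['30', null] }
```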
## toJsonDenormalize — flat DataFrame → nested JSON

The inverse of `jsonNormalize`: takes a DataFrame with dot-separated column names and reconstructs nested JSON objects.

```ts
import { DataFrame, toJsonDenormalize } from 'tsb';

// Start with a flattened DataFrame (as jsonNormalize would produce)
const flat = DataFrame.fromColumns({
  name: ['Alice', 'Bob'],
  'address.city': ['New York', 'Los Angeles'],
  'address.zip': ['10001', '90001'],
  'address.country': ['US', 'US'],
});

// Reconstruct nested JSON
const records = toJsonDenormalize(flat);
// [
//   { name: 'Alice', address: { city: 'New York', zip: '10001', country: 'US' } },
//   { name: 'Bob', address: { city: 'Los Angeles', zip: '90001', country: 'US' } },
// ]
```
```ts
// Round-trip: jsonNormalize → toJsonDenormalize
import { jsonNormalize } from 'tsb';

const original = [
  { user: { name: 'Alice', age: 30 }, score: 100 },
  { user: { name: 'Bob', age: 25 }, score: 200 },
];
const df = jsonNormalize(original);
const recovered = toJsonDenormalize(df);
// recovered ≈ original (same structure)
```
```ts
// If jsonNormalize was called with sep='__'
const df2 = DataFrame.fromColumns({
  user__name: ['Alice'],
  user__city: ['NYC'],
});
toJsonDenormalize(df2, { sep: '__' });
// [{ user: { name: 'Alice', city: 'NYC' } }]

// dropNull omits null fields from the output
const df3 = DataFrame.fromColumns({ a: [1, null], b: [null, 2] });
toJsonDenormalize(df3, { dropNull: true });
// [{ a: 1 }, { b: 2 }]  ← null fields are omitted
```
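The un-flattening step at the heart of this can be sketched in plain TypeScript: split each column name on the separator and reassemble nested objects, one record per row. The `sep` and `dropNull` parameters mirror the options shown above; everything else here is an assumed stand-in for tsb's internals:

```typescript
// Sketch: rebuild nested objects from dot-separated (or sep-separated) keys.
type Row = Record<string, unknown>;

function denormalizeRows(rows: Row[], sep = '.', dropNull = false): Row[] {
  return rows.map((row) => {
    const out: Row = {};
    for (const [key, value] of Object.entries(row)) {
      if (dropNull && value == null) continue; // omit null/undefined fields
      const parts = key.split(sep);
      let cursor: Row = out;
      // Walk/create intermediate objects for every segment but the last
      for (const part of parts.slice(0, -1)) {
        cursor = (cursor[part] ??= {}) as Row;
      }
      cursor[parts[parts.length - 1]] = value;
    }
    return out;
  });
}

denormalizeRows([{ name: 'Alice', 'address.city': 'New York', 'address.zip': '10001' }]);
// → [{ name: 'Alice', address: { city: 'New York', zip: '10001' } }]
```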
## toJsonRecords — orient="records"

```ts
import { toJsonRecords } from 'tsb';

const df = DataFrame.fromColumns({ a: [1, 2], b: ['x', 'y'] });
toJsonRecords(df);
// [{ a: 1, b: 'x' }, { a: 2, b: 'y' }]
```
## toJsonSplit — orient="split"

```ts
import { toJsonSplit } from 'tsb';

toJsonSplit(df); // df from the toJsonRecords example
// { columns: ['a', 'b'], index: [0, 1], data: [[1, 'x'], [2, 'y']] }

toJsonSplit(df, { includeIndex: false });
// { columns: ['a', 'b'], data: [[1, 'x'], [2, 'y']] }
```
## toJsonIndex — orient="index"

```ts
import { toJsonIndex } from 'tsb';

toJsonIndex(df); // df from the toJsonRecords example
// { '0': { a: 1, b: 'x' }, '1': { a: 2, b: 'y' } }

// With a custom string index
const df2 = DataFrame.fromColumns(
  { v: [10, 20] },
  { index: ['alice', 'bob'] }
);
toJsonIndex(df2);
// { alice: { v: 10 }, bob: { v: 20 } }
```
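The three orientations differ only in how rows and index labels are packed. A plain-TypeScript sketch over column arrays (a guess at a column-oriented internal layout, not tsb's actual code) makes the difference concrete:

```typescript
// Sketch: the same columns + index rendered in three orientations.
type Cols = Record<string, unknown[]>;

// records: one object per row
function recordsOrient(cols: Cols, index: (string | number)[]) {
  const names = Object.keys(cols);
  return index.map((_, i) =>
    Object.fromEntries(names.map((n) => [n, cols[n][i]] as [string, unknown]))
  );
}

// split: columns, index labels, and row-major data kept separate
function splitOrient(cols: Cols, index: (string | number)[]) {
  const names = Object.keys(cols);
  return { columns: names, index, data: index.map((_, i) => names.map((n) => cols[n][i])) };
}

// index: row objects keyed by their index label
function indexOrient(cols: Cols, index: (string | number)[]) {
  return Object.fromEntries(
    recordsOrient(cols, index).map((row, i) => [index[i], row] as [string | number, unknown])
  );
}
```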
| Function | Signature | pandas equivalent |
|---|---|---|
| `strFindall` | `(input, pat, flags?) → Series<Scalar>` | `s.str.findall(pat)` |
| `strFindallCount` | `(input, pat, flags?) → Series<Scalar>` | `s.str.findall(pat).map(len)` |
| `strFindFirst` | `(input, pat, flags?) → Series<Scalar>` | `s.str.extract(pat)[0]` |
| `strFindallExpand` | `(input, pat, flags?) → DataFrame` | `s.str.extract(pat, expand=True)` |
| `toJsonDenormalize` | `(df, options?) → JsonRecord[]` | inverse of `json_normalize` |
| `toJsonRecords` | `(df) → JsonRecord[]` | `df.to_json(orient='records')` |
| `toJsonSplit` | `(df, options?) → JsonSplitResult` | `df.to_json(orient='split')` |
| `toJsonIndex` | `(df) → JsonRecord` | `df.to_json(orient='index')` |