Batching

Process large datasets by splitting them into configurable batches with deduplication, filtering, and automatic result aggregation.

How Batching Works

When a workflow step receives an array of items (leads, investors, URLs), batching splits it into smaller chunks, processes each batch independently, then aggregates the results. Smaller batches help avoid step timeouts, keep requests within API rate limits, and allow batches to run in parallel.

The pipeline: Item Resolution → Batch Creation → Parallel Execution → Aggregation.
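
The batch creation and parallel execution stages can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names make_batches and run_batched, the process_batch callback, and the use of a thread pool with four workers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def make_batches(items: list[Any], size: int) -> list[list[Any]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batched(
    items: list[Any],
    size: int,
    process_batch: Callable[[list[Any]], list[Any]],  # hypothetical per-batch worker
    max_workers: int = 4,                             # assumed concurrency limit
) -> list[Any]:
    """Process each batch independently, then flatten the per-batch results."""
    batches = make_batches(items, size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input batches.
        per_batch = list(pool.map(process_batch, batches))
    return [result for batch in per_batch for result in batch]
```

Because each batch is handed to the worker as a separate call, a slow or rate-limited batch does not block the others beyond the pool's concurrency limit.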

Configuration

Enable batching on any step by adding a batchConfig object:

{
  "batchConfig": {
    "enabled": true,
    "size": 25,
    "sourceVariable": "leads",
    "idField": "email",
    "qualifiedOnly": false,
    "maxItems": 500
  }
}

Field	Type	Description
enabled	boolean	Whether batching is active for this step
size	number	Items per batch (default: 25)
sourceVariable	string	Variable name containing the array to batch
idField	string	Field used for deduplication (e.g., "email")
qualifiedOnly	boolean	Filter to only items that passed prior scoring
maxItems	number	Maximum total items to process
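
As a rough illustration of how these fields fit together, the sketch below parses a step definition into a typed config object. Only the default size of 25 comes from the table above. The BatchConfig class, the parse_batch_config helper, the other defaults, and the validation rules are assumptions for the example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchConfig:
    source_variable: str
    enabled: bool = False            # assumption: batching is off unless enabled
    size: int = 25                   # documented default
    id_field: Optional[str] = None   # assumption: no deduplication when unset
    qualified_only: bool = False     # assumption
    max_items: Optional[int] = None  # assumption: no cap when unset


def parse_batch_config(step_json: str) -> BatchConfig:
    """Read the batchConfig object from a step definition and validate it."""
    raw = json.loads(step_json)["batchConfig"]
    cfg = BatchConfig(
        source_variable=raw["sourceVariable"],
        enabled=raw.get("enabled", False),
        size=raw.get("size", 25),
        id_field=raw.get("idField"),
        qualified_only=raw.get("qualifiedOnly", False),
        max_items=raw.get("maxItems"),
    )
    if cfg.size <= 0:
        raise ValueError("size must be a positive number")
    if cfg.max_items is not None and cfg.max_items < 0:
        raise ValueError("maxItems must not be negative")
    return cfg
```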

Deduplication

When an idField is set, batching deduplicates items before processing. If duplicates exist, the item with the highest score is kept. This prevents re-processing the same lead or entity across batches.
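
The keep-the-highest-score rule can be sketched as follows. The example assumes each item is a dictionary with a numeric "score" field, that a missing score counts as 0, and that ties keep the first item seen. None of these details are specified above, and dedupe_by_id is a hypothetical helper name.

```python
from typing import Any


def dedupe_by_id(
    items: list[dict[str, Any]],
    id_field: str,
    score_field: str = "score",  # assumed name of the score field
) -> list[dict[str, Any]]:
    """Keep one item per id_field value, preferring the highest score."""
    best: dict[Any, dict[str, Any]] = {}
    for item in items:
        key = item[id_field]
        current = best.get(key)
        # Strict comparison: on a tie, the first item seen is kept.
        if current is None or item.get(score_field, 0) > current.get(score_field, 0):
            best[key] = item
    # Dicts preserve insertion order, so output follows first appearance of each id.
    return list(best.values())
```

For example, two entries for the same email with scores 3 and 7 collapse into the single entry with score 7.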

Result Aggregation

After all batches complete, results are merged and metrics are computed automatically:

• Totals: total items processed, passed, failed
• Averages: mean score, median score
• Tier counts: distribution across score tiers (high, medium, low)
• Score distribution: histogram of scores across all items
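
A minimal sketch of how these metrics could be computed from merged batch results is shown below. The result shape (a "score" number and a "passed" flag per item), the tier thresholds (8 and above is high, 5 and above is medium), the whole-number histogram buckets, and the use of None for averages over an empty result set are all assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median
from typing import Any


def tier_for(score: float) -> str:
    """Map a score to a tier using assumed thresholds."""
    if score >= 8:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def aggregate(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute totals, averages, tier counts, and a score histogram."""
    scores = [r["score"] for r in results]
    passed = sum(1 for r in results if r["passed"])
    tiers = Counter(tier_for(s) for s in scores)
    # Histogram buckets are whole-number scores (fractional part truncated).
    histogram = Counter(int(s) for s in scores)
    return {
        "total": len(results),
        "passed": passed,
        "failed": len(results) - passed,
        "mean_score": mean(scores) if scores else None,
        "median_score": median(scores) if scores else None,
        "tiers": {name: tiers.get(name, 0) for name in ("high", "medium", "low")},
        "distribution": dict(sorted(histogram.items())),
    }
```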

Example: Scoring 2,000 Investors

An investor matching workflow processes a database of 2,000 investors with batching:

Step: "Score Investors"
  batchConfig:
    size: 50
    sourceVariable: "investors"
    idField: "investor_id"
    maxItems: 2000

Execution:
  Deduplication removed 23 duplicates (1,977 unique investors)
  40 batches: 39 of 50 investors, 1 of 27
  Processing time: 4m 12s

Aggregated results:
  Total scored:    1,977
  High tier (8+):    312  (15.8%)
  Medium tier (5-7): 891  (45.1%)
  Low tier (<5):     774  (39.2%)
  Mean score:       5.7
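
The batch count in this example follows from the item count after deduplication. The short calculation below, using only the figures from the example, shows why 1,977 unique investors at a batch size of 50 still yield 40 batches, with a smaller final batch.

```python
import math

total_items = 2000
duplicates = 23
batch_size = 50

unique_items = total_items - duplicates                     # items left after deduplication
batch_count = math.ceil(unique_items / batch_size)          # a partial batch still counts as a batch
last_batch = unique_items - (batch_count - 1) * batch_size  # size of the final, partial batch

print(unique_items, batch_count, last_batch)
```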

Edge Cases

• Empty arrays: Batching completes immediately with zero-count metrics
• Failed batches: Individual batch failures don't halt the entire run; failed items are reported in aggregated results
• Over maxItems: Items beyond the limit are silently dropped before batch creation
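
The three edge cases above can be illustrated with a small sequential runner. This is a sketch under stated assumptions rather than the actual engine: run_with_isolation, the process_batch callback, and the shape of the returned dictionary are hypothetical, and batches run one after another here to keep the example short.

```python
from typing import Any, Callable, Optional


def run_with_isolation(
    items: list[Any],
    size: int,
    process_batch: Callable[[list[Any]], list[Any]],  # hypothetical per-batch worker
    max_items: Optional[int] = None,
) -> dict[str, Any]:
    """Run batches one by one, isolating failures to the batch that raised."""
    if max_items is not None:
        items = items[:max_items]  # items beyond the cap are dropped before batching
    succeeded: list[Any] = []
    failed: list[dict[str, Any]] = []
    for start in range(0, len(items), size):
        batch = items[start:start + size]
        try:
            succeeded.extend(process_batch(batch))
        except Exception as exc:
            # A failing batch is recorded and the run continues with the next one.
            failed.extend({"item": item, "error": str(exc)} for item in batch)
    return {"total": len(items), "results": succeeded, "failed": failed}
```

With an empty input the loop body never runs, so the function returns zero counts immediately.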

Next Steps

AI Steps — Configure AI reasoning within batched steps
Scoring & Feedback — How batch scores feed back into future runs
Loops & Conditionals — Control flow for complex workflows