BBH
get_input_data_model()
get_output_data_model()
iterable_dataset(repeat=1, batch_size=1, limit=None, split='test')
Streaming dataset for RL-style training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| repeat | int | Number of consecutive copies of each row. | 1 |
| batch_size | int | Examples per yielded batch. | 1 |
| limit | int | Optional cap on raw rows (useful for smoke tests). | None |
| split | str | HF split to stream. | 'test' |
Returns:

| Type | Description |
|---|---|
| HuggingFaceDataset | A streaming, iterable dataset. |
Source code in synalinks/src/datasets/built_in/bbh.py
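The interplay of repeat, batch_size, and limit can be illustrated with a minimal sketch in plain Python. This mirrors the documented semantics (each raw row repeated `repeat` times consecutively, then grouped into batches of `batch_size`, with `limit` capping raw rows before repetition); the helper name, the exact boundary behavior, and the sample rows are assumptions for illustration, not the library's implementation.

```python
from itertools import islice

def iter_batches(rows, repeat=1, batch_size=1, limit=None):
    """Yield lists of examples, mimicking the documented
    repeat/batch_size/limit semantics (illustrative only)."""
    if limit is not None:
        rows = islice(rows, limit)  # cap applies to raw rows, before repetition
    batch = []
    for row in rows:
        for _ in range(repeat):  # consecutive copies of each row
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # trailing partial batch
        yield batch

# Hypothetical BBH-style rows for demonstration
rows = [{"input": "not ( True ) and ( True )", "target": "False"},
        {"input": "True and not not ( not False )", "target": "True"}]
batches = list(iter_batches(rows, repeat=2, batch_size=2))
# With repeat=2 and batch_size=2, each batch holds two copies of one row
```

Repeating rows consecutively like this is what makes the iterator useful for RL-style training, where several rollouts are sampled per prompt.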
load_data(validation_split=0.2)
Load BIG-Bench Hard (boolean_expressions task).
BBH ships only a test split (~250 rows per task), so we split
it deterministically into train / test.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| validation_split | float | Fraction held out for evaluation. | 0.2 |
Returns:

| Type | Description |
|---|---|
| tuple | Train and test splits. |
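Since BBH ships only a test split, the deterministic train/test split described above can be sketched as follows. The slicing rule shown (holding out the last `validation_split` fraction of the fixed-order rows) is an assumption for illustration; synalinks' exact boundary rule may differ, but any rule that depends only on row order is deterministic in the same sense.

```python
def deterministic_split(rows, validation_split=0.2):
    """Split a fixed-order list of rows into (train, test).

    Illustrative only: holds out the final `validation_split`
    fraction, so repeated calls always yield the same split."""
    n_test = int(len(rows) * validation_split)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]

rows = list(range(250))  # BBH ships roughly 250 rows per task
train, test = deterministic_split(rows, validation_split=0.2)
# 250 rows with validation_split=0.2 -> 200 train rows, 50 test rows
```

Because the split depends only on row order, evaluation results stay comparable across runs without persisting split indices.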