LAMBADA
get_input_data_model()
get_output_data_model()
iterable_dataset(repeat=1, batch_size=1, limit=None, split='test')
Streaming dataset for RL-style training.
Returns:
| Type | Description |
|---|---|
| HuggingFaceDataset | A streaming, iterable dataset. |
Source code in synalinks/src/datasets/built_in/lambada.py
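The signature above suggests a stream that can be repeated, batched, and capped. The following is a minimal, library-independent sketch of how such parameters might interact. The helper `stream_batches` is hypothetical, and the semantics are assumptions: `repeat` is the number of passes, `batch_size` is the number of examples per yielded batch, and `limit` caps the examples per pass. The actual behavior of `iterable_dataset` may differ.

```python
from itertools import islice
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def stream_batches(
    examples: Sequence[T],
    repeat: int = 1,
    batch_size: int = 1,
    limit: Optional[int] = None,
) -> Iterator[List[T]]:
    """Hypothetical helper: lazily yield batches over `repeat` passes.

    Assumed semantics (not taken from the library):
    `limit` caps the number of examples read in each pass.
    """
    for _ in range(repeat):
        # islice with stop=None reads the whole sequence.
        one_pass = islice(examples, limit)
        while True:
            batch = list(islice(one_pass, batch_size))
            if not batch:
                break
            yield batch


# Usage: two passes over three passages, two passages per batch.
for batch in stream_batches(["a", "b", "c"], repeat=2, batch_size=2):
    print(batch)
```

Because the helper is a generator, nothing is materialized until the consumer asks for the next batch, which is the property a streaming dataset is meant to provide.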
load_data(validation_split=0.2)
Load LAMBADA (OpenAI variant).
The Hugging Face release ships only a test split (~5k passages), so it is split
deterministically into train / test sets, with `validation_split` setting the held-out fraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| validation_split | float | Fraction held out for evaluation. | 0.2 |
Returns:
| Type | Description |
|---|---|
| tuple | |
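A deterministic split of a single source split can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper `deterministic_split` and its `seed` parameter are hypothetical, and the real `load_data` may partition the passages differently (for example, by index without shuffling).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deterministic_split(
    items: Sequence[T],
    validation_split: float = 0.2,
    seed: int = 42,  # hypothetical: a fixed seed makes the split reproducible
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split `items` into (train, held_out) reproducibly."""
    if not 0.0 <= validation_split <= 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    indices = list(range(len(items)))
    # A private Random instance avoids touching the global RNG state.
    random.Random(seed).shuffle(indices)
    n_held_out = int(len(items) * validation_split)
    held_out = [items[i] for i in indices[:n_held_out]]
    train = [items[i] for i in indices[n_held_out:]]
    return train, held_out


# Usage: the same inputs always produce the same partition.
passages = [f"passage-{i}" for i in range(10)]
train, held_out = deterministic_split(passages, validation_split=0.2)
print(len(train), len(held_out))
```

Fixing the seed means repeated calls return the same partition, so evaluation numbers stay comparable across runs.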