GSM8K
get_input_data_model()
Returns GSM8K input data_model for pipeline configurations.
Returns:
| Type | Description |
|---|---|
| DataModel | The GSM8K input data_model |
Source code in synalinks/src/datasets/built_in/gsm8k.py
get_output_data_model()
Returns GSM8K output data_model for pipeline configurations.
Returns:
| Type | Description |
|---|---|
| DataModel | The GSM8K output data_model |
Source code in synalinks/src/datasets/built_in/gsm8k.py
iterable_dataset(repeat=1, batch_size=1, limit=None, split='train')
Streaming dataset for RL-style training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repeat | int | Number of consecutive copies of each row; set equal to … | 1 |
| batch_size | int | Examples per yielded batch. | 1 |
| limit | int | Optional cap on raw rows (useful for smoke tests). | None |
| split | str | HF split to stream. Defaults to 'train'. | 'train' |
Returns:
| Type | Description |
|---|---|
| HuggingFaceDataset | A streaming, iterable dataset. |
Source code in synalinks/src/datasets/built_in/gsm8k.py
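The interaction between `repeat`, `batch_size`, and `limit` can be illustrated with a small pure-Python sketch. This is not the library's implementation: `stream_batches` is a hypothetical helper, and it assumes that `limit` is applied to raw rows before repetition and that a final partial batch is yielded rather than dropped.

```python
from itertools import islice


def stream_batches(rows, repeat=1, batch_size=1, limit=None):
    """Yield batches (lists) from an iterable of rows.

    Hypothetical stand-in for the documented parameters:
    - limit caps the number of raw rows read (None means no cap)
    - repeat emits each raw row that many times consecutively
    - batch_size is the number of examples per yielded batch
    """
    batch = []
    for row in islice(rows, limit):  # cap raw rows before repeating
        for _ in range(repeat):  # consecutive copies of the same row
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # assumption: flush the last partial batch
        yield batch


# Two raw rows, each repeated twice, grouped into batches of four.
for batch in stream_batches(["q1", "q2", "q3"], repeat=2, batch_size=4, limit=2):
    print(batch)
```

Under these assumptions, choosing `batch_size` as a multiple of `repeat` keeps all copies of a row inside the same batch.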
load_data()
Load and format data from HuggingFace.
Example:
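A minimal sketch of consuming the returned tuple, assuming it follows the Keras-style `(x_train, y_train), (x_test, y_test)` layout. That layout, the `split_sizes` helper, and the import path in the trailing comment are assumptions, not details confirmed by this page; stand-in data is used so the snippet runs without downloading anything.

```python
def split_sizes(data):
    """Unpack ((x_train, y_train), (x_test, y_test)) and return split sizes.

    The nested-tuple layout is an assumption about what load_data() returns.
    """
    (x_train, y_train), (x_test, y_test) = data
    if len(x_train) != len(y_train) or len(x_test) != len(y_test):
        raise ValueError("inputs and targets must have the same length")
    return {"train": len(x_train), "test": len(x_test)}


# Stand-in data shaped like the assumed return value.
fake = ((["q1", "q2", "q3"], ["a1", "a2", "a3"]), (["q4"], ["a4"]))
print(split_sizes(fake))

# With the library installed, the real call might look like this
# (assumed import path):
# import synalinks
# print(split_sizes(synalinks.datasets.gsm8k.load_data()))
```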
Returns:
| Type | Description |
|---|---|
| tuple | The train and test data ready for training |