Configuration Snapshots for Reproducibility#
Hypster provides a way to capture a snapshot of a configuration so it can be reproduced later. This is especially useful in Machine Learning & AI projects, or in any scenario where you need to recreate exact configurations.
When using hp.propagate, the resulting snapshot also includes values from nested configurations.
Using return_config_snapshot=True#
When calling a Hypster configuration, you can set return_config_snapshot=True to get, in addition to the instantiated results, a dictionary of the values needed to reproduce them.
Example:
%%writefile llm_model.py
# This is a mock class for demonstration purposes
class LLMModel:
    def __init__(self, chunking, model, config):
        self.chunking = chunking
        self.model = model
        self.config = config

    def __eq__(self, other):
        return (self.chunking == other.chunking and
                self.model == other.model and
                self.config == other.config)
Overwriting llm_model.py
from hypster import HP, config


@config
def my_config(hp: HP):
    from llm_model import LLMModel

    chunking_strategy = hp.select(["paragraph", "semantic", "fixed"], default="paragraph")
    llm_model = hp.select(
        {"haiku": "claude-3-haiku-20240307", "sonnet": "claude-3-5-sonnet-20240620", "gpt-4o-mini": "gpt-4o-mini"},
        default="gpt-4o-mini",
    )
    llm_config = {"temperature": hp.number(0), "max_tokens": 64}
    model = LLMModel(chunking_strategy, llm_model, llm_config)
results, snapshot = my_config(selections={"llm_model": "haiku"}, return_config_snapshot=True)
results
{'chunking_strategy': 'paragraph',
'llm_model': 'claude-3-haiku-20240307',
'llm_config': {'temperature': 0, 'max_tokens': 64},
'model': <llm_model.LLMModel at 0x111c3ff40>}
snapshot
{'chunking_strategy': 'paragraph',
'llm_model': 'claude-3-haiku-20240307',
'llm_config.temperature': 0}
The difference between results and snapshot is subtle, but important:
- results contains the instantiated values produced by the selections & overrides of the config function. Notice the 'model' entry in the results dictionary.
- snapshot contains the values that are necessary to reproduce the exact output by using overrides=snapshot. Notice that 'model' does not appear in the snapshot, since it is a byproduct of the previously selected parameters (chunking_strategy, llm_model, etc.). Also notice that the snapshot contains llm_config.temperature only, since max_tokens isn't a configurable parameter.
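As a quick sanity check of these points, you can inspect the two dictionaries from the example run above (a sketch, assuming results and snapshot are still in scope):
# Sketch: the instantiated model appears only in results,
# and only hp-defined values appear in the snapshot
assert "model" in results and "model" not in snapshot
assert "llm_config.temperature" in snapshot
assert "llm_config.max_tokens" not in snapshot  # max_tokens was a plain literal, not an hp call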
Example Usage:#
reproduced_results = my_config(overrides=snapshot)
assert reproduced_results == results # This should be True
This ensures that you can recreate the exact configuration state, which is crucial for reproducibility in machine learning experiments and for consistent results across multiple runs or different environments.
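Since the snapshot in this example contains only strings and numbers, one simple way to persist it between runs is to serialize it to JSON with the standard library. This is only a sketch, not part of Hypster's API, and the file name is hypothetical:
import json

# Save the snapshot alongside your experiment artifacts (hypothetical file name)
with open("config_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)

# Later (or on another machine), reload it and reproduce the exact configuration
with open("config_snapshot.json") as f:
    loaded_snapshot = json.load(f)

reproduced = my_config(overrides=loaded_snapshot)
assert reproduced == results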
Nested Configurations#
When using hp.propagate, the snapshot captures the entire hierarchy of configurations:
from hypster import HP, config, save


@config
def my_config(hp: HP):
    llm_model = hp.select(
        {"haiku": "claude-3-haiku-20240307", "sonnet": "claude-3-5-sonnet-20240620", "gpt-4o-mini": "gpt-4o-mini"},
        default="gpt-4o-mini",
    )
    llm_config = {"temperature": hp.number(0), "max_tokens": hp.number(64)}


save(my_config, "my_config.py")
We can then load it from its path and have it be part of the parent configuration. We can select & override values within our nested configuration by using dot notation:
@config
def my_config_parent(hp: HP):
    import hypster

    my_config = hypster.load("my_config.py")
    my_conf = hp.propagate(my_config)
    a = hp.select(["a", "b", "c"], default="a")


final_vars = ["my_conf", "a"]
results, snapshot = my_config_parent(
    final_vars=final_vars, selections={"my_conf.llm_model": "haiku"}, overrides={"a": "d"}, return_config_snapshot=True
)
results
{'my_conf': {'llm_model': 'claude-3-haiku-20240307',
'llm_config': {'temperature': 0, 'max_tokens': 64}},
'a': 'd'}
snapshot
{'my_conf.llm_model': 'claude-3-haiku-20240307',
'my_conf.llm_config.temperature': 0,
'my_conf.llm_config.max_tokens': 64,
'a': 'd'}
reproduced_results = my_config_parent(final_vars=final_vars, overrides=snapshot)
assert reproduced_results == results