Configuration Snapshots for Reproducibility


  • Hypster can capture a snapshot of a configuration's values so that a run can be reproduced later.

  • This is especially useful in machine learning and AI projects, or in any scenario where you need to recreate an exact configuration.

  • When using hp.propagate, the snapshot also includes values from nested configurations.

Using return_config_snapshot=True

When calling a Hypster configuration, you can set return_config_snapshot=True to get, alongside the results, a dictionary of the value used for each configurable parameter.

Example:

%%writefile llm_model.py

# This is a mock class for demonstration purposes
class LLMModel:
    def __init__(self, chunking, model, config):
        self.chunking = chunking
        self.model = model
        self.config = config
    
    def __eq__(self, other):
        return (self.chunking == other.chunking and
                self.model == other.model and
                self.config == other.config)
from hypster import HP, config


@config
def my_config(hp: HP):
    from llm_model import LLMModel

    chunking_strategy = hp.select(["paragraph", "semantic", "fixed"], default="paragraph")

    llm_model = hp.select(
        {"haiku": "claude-3-haiku-20240307", "sonnet": "claude-3-5-sonnet-20240620", "gpt-4o-mini": "gpt-4o-mini"},
        default="gpt-4o-mini",
    )

    llm_config = {"temperature": hp.number(0), "max_tokens": 64}

    model = LLMModel(chunking_strategy, llm_model, llm_config)


results, snapshot = my_config(selections={"llm_model": "haiku"}, return_config_snapshot=True)
results
{'chunking_strategy': 'paragraph',
 'llm_model': 'claude-3-haiku-20240307',
 'llm_config': {'temperature': 0, 'max_tokens': 64},
 'model': <llm_model.LLMModel at 0x111c3ff40>}
snapshot
{'chunking_strategy': 'paragraph',
 'llm_model': 'claude-3-haiku-20240307',
 'llm_config.temperature': 0}

The difference between results and snapshot is subtle but important:

  • results contains everything the config function produced from the given selections and overrides.

    • Notice the 'model' entry in the results dictionary.

  • snapshot contains only the values needed to reproduce that exact output when passed back as overrides=snapshot.

    • 'model' is absent from the snapshot because it is derived from the previously selected parameters (chunking_strategy, llm_model, etc.).

    • Only llm_config.temperature appears, because max_tokens is hard-coded to 64 rather than being a configurable parameter.
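To see why the snapshot can leave out derived values, here is a toy, pure-Python model of the idea. ToyHP and toy_config are hypothetical and much simpler than Hypster's HP (for instance, parameter names are passed explicitly here): each hp call records its resolved value, values computed from those calls are never recorded, and replaying the recorded values as overrides rebuilds every derived value anyway.

```python
from typing import Any, Optional


class ToyHP:
    """Hypothetical stand-in for a config's hp object; not Hypster's HP class."""

    def __init__(self, selections: Optional[dict] = None, overrides: Optional[dict] = None):
        self.selections = selections or {}
        self.overrides = overrides or {}
        self.snapshot: dict = {}  # name -> resolved value of every hp call

    def select(self, name: str, options: dict, default: str) -> Any:
        if name in self.overrides:
            value = self.overrides[name]  # an override supplies the final value directly
        else:
            value = options[self.selections.get(name, default)]  # a selection picks an option key
        self.snapshot[name] = value
        return value

    def number(self, name: str, default: float) -> float:
        value = self.overrides.get(name, default)
        self.snapshot[name] = value
        return value


def toy_config(hp: ToyHP) -> dict:
    llm_model = hp.select(
        "llm_model",
        {"haiku": "claude-3-haiku-20240307", "gpt-4o-mini": "gpt-4o-mini"},
        default="gpt-4o-mini",
    )
    # max_tokens is hard-coded, so it never reaches the snapshot
    llm_config = {"temperature": hp.number("llm_config.temperature", 0), "max_tokens": 64}
    # A derived value (like `model` above): computed from other values, never recorded
    label = f"{llm_model} (temperature={llm_config['temperature']})"
    return {"llm_model": llm_model, "llm_config": llm_config, "label": label}


hp = ToyHP(selections={"llm_model": "haiku"})
results = toy_config(hp)
snapshot = hp.snapshot
print(snapshot)  # {'llm_model': 'claude-3-haiku-20240307', 'llm_config.temperature': 0}

# Replaying the snapshot as overrides rebuilds the derived value too
replayed = toy_config(ToyHP(overrides=snapshot))
assert replayed == results
```

Note that the toy snapshot stores resolved values ("claude-3-haiku-20240307") rather than option keys ("haiku"), which is why it is replayed through overrides instead of selections, mirroring the snapshot output shown above.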

Example Usage:

reproduced_results = my_config(overrides=snapshot)
assert reproduced_results == results  # This should be True

This lets you recreate the exact configuration state, which is crucial for reproducible machine learning experiments: the same snapshot yields the same configuration across multiple runs and different environments.
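To reproduce a run in a later session or on another machine, the snapshot has to outlive the Python process. One simple option, sketched below with only the standard library, is to write it to a JSON file. This assumes every snapshot value is JSON-serializable (strings and numbers are; tuples, for example, would come back as lists), and the file name run_snapshot.json is made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Snapshot copied from the output above
snapshot = {
    "chunking_strategy": "paragraph",
    "llm_model": "claude-3-haiku-20240307",
    "llm_config.temperature": 0,
}


def save_snapshot(snapshot: dict, path: Path) -> None:
    # sort_keys gives a stable file layout, which makes runs easy to diff
    path.write_text(json.dumps(snapshot, indent=2, sort_keys=True))


def load_snapshot(path: Path) -> dict:
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run_snapshot.json"  # hypothetical file name
    save_snapshot(snapshot, path)
    restored = load_snapshot(path)

assert restored == snapshot
# The restored dict can then be replayed as in the example: my_config(overrides=restored)
```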

Nested Configurations

When using hp.propagate, the snapshot captures the entire hierarchy of configurations. First, define a child configuration and save it to a file:

from hypster import HP, config, save


@config
def my_config(hp: HP):
    llm_model = hp.select(
        {"haiku": "claude-3-haiku-20240307", "sonnet": "claude-3-5-sonnet-20240620", "gpt-4o-mini": "gpt-4o-mini"},
        default="gpt-4o-mini",
    )

    llm_config = {"temperature": hp.number(0), "max_tokens": hp.number(64)}

# Write the config to disk so that another config can load it
save(my_config, "my_config.py")
  • We can then load the saved config from its path and make it part of a parent configuration.

  • We can select and override values inside the nested configuration using dot notation (for example, "my_conf.llm_model").

@config
def my_config_parent(hp: HP):
    import hypster

    my_config = hypster.load("my_config.py")
    my_conf = hp.propagate(my_config)
    a = hp.select(["a", "b", "c"], default="a")

# Only return these variables from the parent config
final_vars = ["my_conf", "a"]

results, snapshot = my_config_parent(
    final_vars=final_vars, selections={"my_conf.llm_model": "haiku"}, overrides={"a": "d"}, return_config_snapshot=True
)
results
{'my_conf': {'llm_model': 'claude-3-haiku-20240307',
  'llm_config': {'temperature': 0, 'max_tokens': 64}},
 'a': 'd'}
snapshot
{'my_conf.llm_model': 'claude-3-haiku-20240307',
 'my_conf.llm_config.temperature': 0,
 'my_conf.llm_config.max_tokens': 64,
 'a': 'd'}
reproduced_results = my_config_parent(final_vars=final_vars, overrides=snapshot)
assert reproduced_results == results
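The dotted keys map directly onto the nested structure. As a rough illustration, the hypothetical helpers below (flatten and split_for_child are not Hypster functions, and this is not a description of how Hypster resolves dotted names internally) turn nested results into dotted keys and pick out the dotted entries addressed to one nested config. In this example every leaf value comes from an hp call, so flattening results yields exactly the snapshot's keys and values; that would not hold for the first example, where the derived model and the hard-coded max_tokens appear in results but not in the snapshot.

```python
from typing import Any


def flatten(nested: dict, prefix: str = "") -> dict:
    """Join nested keys with '.', e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted prefix
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def split_for_child(values: dict, prefix: str) -> dict:
    """Keep entries addressed to one nested config and strip the 'prefix.' part."""
    marker = prefix + "."
    return {key[len(marker):]: value for key, value in values.items() if key.startswith(marker)}


# Values copied from the nested example above
results: Any = {
    "my_conf": {
        "llm_model": "claude-3-haiku-20240307",
        "llm_config": {"temperature": 0, "max_tokens": 64},
    },
    "a": "d",
}
flat_results = flatten(results)
child_selections = split_for_child({"my_conf.llm_model": "haiku"}, "my_conf")
print(flat_results)
print(child_selections)  # {'llm_model': 'haiku'}
```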