Chapter 2: Experiment Execution Orchestration (main.py)
Welcome back! In Chapter 1: Configuration Management (RunConfig / OptimizeConfig), we learned how to create a detailed “recipe card” (our YAML configuration file) and how the RunConfig class helps us read that recipe into our Python code. We have the plan, but how do we actually start the cooking process?
Think about building a house. You have the blueprints (RunConfig). Now you need a project manager to read the blueprints, gather the workers and materials (data, model), tell them what to do, and oversee the construction (training).
In pytorch_template, the file main.py acts as this project manager. It’s the main entry point: the script you run from your terminal to kick off the entire experiment.
The Problem: Starting and Managing the Experiment
You have your configuration file (run_template.yaml) ready. How do you tell the computer: “Okay, use this configuration file, load the data, build the model described in the file, and start training it”?
Furthermore, what if you have two configuration files: one for a basic run (run_template.yaml) and another (optimize_template.yaml) that describes how to search for the best settings? How does the program know whether to run a single experiment or start a big search?
This is the job of main.py: Experiment Execution Orchestration. It orchestrates (coordinates) all the necessary steps to get an experiment running based on the instructions you provide.
Key Tasks of main.py
main.py is relatively simple because it delegates most complex tasks, but its coordination role is vital. Here’s what it does:
- Reads Instructions: It takes the path to your main configuration file (.yaml) as an input when you run the script.
- Understands Your Goal: It checks whether you also provided an optimization configuration file. This tells it whether you want a single run or a hyperparameter search.
- Prepares Resources: It loads the necessary dataset (like gathering ingredients).
- Initiates Work:
  - If it’s a single run, it uses the RunConfig to set up everything and calls the run function (which manages the training process, involving the Chapter 3: Training Loop (Trainer)).
  - If it’s an optimization search, it sets up an “objective” function using the OptimizeConfig and tells the Optuna library (covered in Chapter 5: Hyperparameter Optimization (Optuna Integration)) to start searching.
How to Use main.py
You interact with main.py through your command line terminal.
Use Case 1: Running a Single Experiment
Let’s say you want to run the experiment defined in configs/run_template.yaml.
- Open your terminal.
- Navigate to the project directory.
- Run the following command:
python main.py --run_config configs/run_template.yaml
- python main.py: Tells Python to execute the main.py script.
- --run_config configs/run_template.yaml: This is an argument telling main.py which configuration file to use for the experiment settings.
What happens? main.py will:
- Read configs/run_template.yaml using RunConfig.
- Load the training and validation data.
- Call the run function (from util.py) with the loaded configuration and data.
- The run function will then set up the model, optimizer, etc., and start the training loop (using the Trainer). You’ll see output messages showing the training progress (epoch number, loss values).
- Results (like the trained model) will be saved in a runs/ directory.
Use Case 2: Running Hyperparameter Optimization
Now, imagine you want to find the best learning rate and number of layers, using the search strategy defined in configs/optimize_template.yaml based on the defaults in configs/run_template.yaml.
- Open your terminal.
- Navigate to the project directory.
- Run the following command:
python main.py --run_config configs/run_template.yaml --optimize_config configs/optimize_template.yaml
- --run_config ...: Provides the base settings.
- --optimize_config ...: Tells main.py to perform an optimization study using this additional configuration file.
What happens? main.py will:
- Read both configuration files (RunConfig for base settings, OptimizeConfig for the search).
- Load the data.
- Set up an Optuna study.
- Instead of calling run directly once, it will repeatedly call a special objective function. Each call represents one trial (one experiment with a different combination of hyperparameters suggested by Optuna).
- Inside the objective function, it modifies the base RunConfig with the suggested parameters for that trial and then calls the run function.
- Optuna manages this process for the specified number of trials.
- Finally, it prints the best parameters found.
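The trial loop described above can be illustrated without Optuna at all. The following is a minimal sketch that swaps in plain random search from the standard library as a stand-in for Optuna's sampler; `ToyRunConfig`, `toy_run`, and `random_search` are hypothetical names invented for this illustration and are not part of pytorch_template. It only shows the pattern: copy the base configuration with suggested values, run one trial, and keep the best result.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ToyRunConfig:
    """Hypothetical stand-in for RunConfig with two tunable fields."""
    lr: float = 1e-3
    layers: int = 3


def toy_run(config: ToyRunConfig) -> float:
    """Stand-in for run(): returns a made-up 'loss' without any training."""
    # Smallest when lr == 1e-2 and layers == 4; purely illustrative.
    return abs(config.lr - 1e-2) * 100 + abs(config.layers - 4)


def objective(base_config: ToyRunConfig, rng: random.Random) -> Tuple[ToyRunConfig, float]:
    """One trial: copy the base config with suggested values, then run it."""
    trial_config = replace(
        base_config,
        lr=10 ** rng.uniform(-4, -1),  # log-uniform learning rate
        layers=rng.randint(2, 6),      # inclusive integer range
    )
    return trial_config, toy_run(trial_config)


def random_search(
    base_config: ToyRunConfig, n_trials: int, seed: int = 0
) -> Tuple[ToyRunConfig, float, List[Tuple[ToyRunConfig, float]]]:
    """Run n_trials independent trials and return the best one plus the history."""
    rng = random.Random(seed)
    trials = [objective(base_config, rng) for _ in range(n_trials)]
    best_config, best_loss = min(trials, key=lambda t: t[1])
    return best_config, best_loss, trials


if __name__ == "__main__":
    best_config, best_loss, _ = random_search(ToyRunConfig(), n_trials=20)
    print("Best config:", best_config, "loss:", best_loss)
```

Because the base configuration is frozen and each trial works on a copy, trials cannot leak settings into one another. Optuna replaces the random suggestions with its own sampling strategy, as Chapter 5 covers.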
Internal Implementation: How main.py Works
Let’s peek under the hood to see how main.py orchestrates these tasks.
High-Level Flow
When you execute python main.py --run_config ... [--optimize_config ...]:
sequenceDiagram
participant User
participant CLI as Command Line
participant MainPy as main.py
participant Configs as RunConfig / OptimizeConfig
participant DataUtil as util.load_data()
participant Runner as util.run() / Optuna
User->>CLI: Enters `python main.py --run_config ...`
CLI->>+MainPy: Starts main() function with arguments
MainPy->>MainPy: Parse arguments (--run_config, --optimize_config?)
MainPy->>+Configs: Load base RunConfig from YAML
Configs-->>-MainPy: Return base RunConfig object
MainPy->>+DataUtil: Load train/validation data
DataUtil-->>-MainPy: Return datasets
MainPy->>MainPy: Check if --optimize_config was provided
alt Optimization Run
MainPy->>+Configs: Load OptimizeConfig from YAML
Configs-->>-MainPy: Return OptimizeConfig object
MainPy->>+Runner: Setup Optuna study & objective function
Runner->>Runner: Run multiple trials (calling util.run internally)
Runner-->>-MainPy: Optimization complete, return best results
else Single Run
MainPy->>+Runner: Call util.run(base_config, data)
Runner->>Runner: Execute single training process
Runner-->>-MainPy: Training complete, return final loss
end
MainPy->>User: Print results (final loss or best hyperparams)
MainPy-->>-User: Script finishes
This diagram shows main.py acting as the central coordinator, reading configuration, loading data, and then deciding whether to launch a single run via util.run or an optimization study via Optuna (which itself uses util.run for each trial).
Code Walkthrough (main.py)
Let’s look at the key parts of the main function in main.py:
- Argument Parsing: It uses Python’s argparse library to understand the command-line arguments you provide.

      # main.py (Simplified Snippet 1)
      import argparse
      from config import RunConfig, OptimizeConfig
      # ... other imports: DataLoader, wandb, load_data, run ...

      def main():
          parser = argparse.ArgumentParser()
          # Expect '--run_config' and make it required
          parser.add_argument(
              "--run_config", type=str, required=True,
              help="Path to the YAML config file"
          )
          # Expect '--optimize_config' but make it optional
          parser.add_argument(
              "--optimize_config", type=str,
              help="Path for optimization config (optional)"
          )
          # Read the arguments provided by the user
          args = parser.parse_args()
          # ... rest of the main function ...

  This sets up the script to accept --run_config (mandatory) and --optimize_config (optional).
- Loading Base Configuration and Data: It loads the essential RunConfig and the datasets.

      # main.py (Simplified Snippet 2 - inside main())
      # Load the main experiment settings from the specified YAML file
      base_config = RunConfig.from_yaml(args.run_config)

      # Load the training and validation datasets
      # (Implementation details in util.py)
      ds_train, ds_val = load_data()  # pyright: ignore

      # Create DataLoaders for efficient batching during training
      dl_train = DataLoader(ds_train, batch_size=base_config.batch_size, shuffle=True)
      dl_val = DataLoader(ds_val, batch_size=base_config.batch_size, shuffle=False)

  This uses the RunConfig.from_yaml method we discussed in Chapter 1 and a utility function load_data to prepare the necessary inputs for training.
- Deciding the Execution Path: This is the core orchestration logic. It checks if the optional --optimize_config argument was provided.

      # main.py (Simplified Snippet 3 - inside main())
      # Check if the user provided an optimization config file path
      if args.optimize_config:
          # --- OPTIMIZATION PATH ---
          print("Optimization mode selected.")
          # Load the optimization settings
          optimize_config = OptimizeConfig.from_yaml(args.optimize_config)

          # Define the 'objective' function that Optuna will call for each trial
          # (Details covered in Chapter 5)
          def objective(trial, ...):
              # ... Suggest parameters using trial ...
              # ... Create a modified RunConfig ...
              # ... Call the run() function for this trial ...
              # ... return final_loss ...
              pass  # Simplified for now

          # Create and run the Optuna study
          # (Details covered in Chapter 5)
          study = optimize_config.create_study(...)
          study.optimize(objective, n_trials=optimize_config.trials)

          # Print the best results found by Optuna
          print("Best trial:", study.best_trial)
          # ... print details ...
      else:
          # --- SINGLE RUN PATH ---
          print("Single run mode selected.")
          # Directly call the run function with the base configuration and data
          # The run function handles the actual training loop (using Trainer)
          final_loss = run(base_config, dl_train, dl_val)
          print(f"Single run finished. Final validation loss: {final_loss}")

  - if args.optimize_config: If the user provided the optimization config path, it loads OptimizeConfig, sets up the Optuna study, defines the objective function (which wraps a single run), and starts the optimization process. We’ll explore this branch in detail in Chapter 5: Hyperparameter Optimization (Optuna Integration).
  - else: If no optimization config was given, it assumes a standard single run. It calls the run function (defined in util.py) directly, passing the loaded base_config and the data loaders. The run function encapsulates setting up the model, optimizer, and running the training epochs using the Trainer class, which is the focus of our next chapter.
- Starting the Script: The standard Python entry point check ensures the main() function runs when you execute the script.

      # main.py (Simplified Snippet 4 - bottom of the file)
      # This standard Python construct checks if the script is being run directly
      # (not imported as a module)
      if __name__ == "__main__":
          # If run directly, call the main function to start the process
          main()
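To see the argument handling and the branching work together without data, models, or Optuna, here is a small self-contained sketch. The parser mirrors the two flags from the walkthrough; `run_single` and `run_study` are hypothetical stubs standing in for the two branches, and passing an explicit `argv` list to `main` is an assumption made here so the logic can be exercised without a terminal.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags described in the walkthrough."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--run_config", type=str, required=True,
        help="Path to the YAML config file",
    )
    parser.add_argument(
        "--optimize_config", type=str,
        help="Path for optimization config (optional)",
    )
    return parser


def run_single(run_config_path: str) -> str:
    """Hypothetical stub for the single-run branch."""
    return f"single run with {run_config_path}"


def run_study(run_config_path: str, optimize_config_path: str) -> str:
    """Hypothetical stub for the optimization branch."""
    return f"optimization study with {run_config_path} and {optimize_config_path}"


def main(argv: Optional[List[str]] = None) -> str:
    # argv=None makes argparse read sys.argv; a list allows direct calls.
    args = build_parser().parse_args(argv)
    # An omitted optional flag is stored as None, which is falsy.
    if args.optimize_config:
        return run_study(args.run_config, args.optimize_config)
    return run_single(args.run_config)


if __name__ == "__main__":
    print(main(["--run_config", "configs/run_template.yaml"]))
    print(main([
        "--run_config", "configs/run_template.yaml",
        "--optimize_config", "configs/optimize_template.yaml",
    ]))
```

Since `--optimize_config` has no default, argparse stores `None` when it is omitted, which is why a plain truthiness check is enough to select the branch. Leaving out the required `--run_config` makes argparse print a usage message and exit.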
Conclusion
You’ve now seen how main.py acts as the central orchestrator or project manager for your experiments in pytorch_template. It’s the starting point that reads your configuration(s), prepares the data, and then intelligently decides whether to launch a single training run or a complex hyperparameter optimization study.
Key takeaways:
- main.py is the script you execute from the command line.
- It uses command-line arguments (--run_config, --optimize_config) to understand what you want to do.
- It loads the necessary configuration objects (Chapter 1: Configuration Management (RunConfig / OptimizeConfig)) and data.
- It delegates the actual work:
  - For a single run, it calls the run function (which uses the Trainer).
  - For optimization, it sets up and runs an Optuna study (Chapter 5: Hyperparameter Optimization (Optuna Integration)).
We saw that for a single run, main.py calls the run function, which in turn relies on something called a Trainer. What exactly happens inside that training process? Let’s dive into the heart of the learning process next.
Next Up: Chapter 3: Training Loop (Trainer)
Generated by AI Codebase Knowledge Builder