Storage Optimizer¶
The Storage Optimizer belongs to the FSM models, but since it is an optimization model rather than an ML one, it forms a distinct model class that deserves a dedicated section in the documentation.
Orchestration¶
The Storage Optimizer model runs daily, orchestrated by an AML scheduled job. The job schedule is created by the Build & Deploy GHA (link here for the Whale Squad MLOps documentation), which runs automatically whenever a merge is performed against the dev/main branches. To schedule or modify it manually, run the script runner.py with the following parameter configuration in the config_latest.toml file:
- models_scope should contain "storage_optimizer" as a value
- storage_optimizer_also = true
- storage_optimizer_schedule_also = true
- submit_storage_optimizer = true
- In the section [storage_optimizer.schedule]:
    - action = "create" (set it to "delete" to erase the existing schedule, if any)
    - do_action = true
To modify the key parameters of the Storage Optimizer execution, refer to the following sections of config_latest.toml:
- [storage_optimizer.inputs]
- [storage_optimizer.schedule.prediction]
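The manual-scheduling settings listed above can be sanity-checked before running runner.py. The sketch below is only illustrative: the source names the keys but not their exact placement in config_latest.toml, so the top-level position of the flags, the helper `scheduling_problems`, and the `example_config` dictionary are all assumptions.

```python
# Hypothetical layout: the documentation lists these keys but not where they
# sit in config_latest.toml, so top-level placement of the flags is assumed.
REQUIRED_FLAGS = (
    "storage_optimizer_also",
    "storage_optimizer_schedule_also",
    "submit_storage_optimizer",
)


def scheduling_problems(config: dict) -> list:
    """Return human-readable problems that would block manual scheduling."""
    problems = []
    if "storage_optimizer" not in config.get("models_scope", []):
        problems.append('models_scope must contain "storage_optimizer"')
    for flag in REQUIRED_FLAGS:
        if config.get(flag) is not True:
            problems.append(f"{flag} must be true")
    # Settings from the [storage_optimizer.schedule] section.
    schedule = config.get("storage_optimizer", {}).get("schedule", {})
    if schedule.get("action") not in ("create", "delete"):
        problems.append('schedule action must be "create" or "delete"')
    if schedule.get("do_action") is not True:
        problems.append("schedule do_action must be true")
    return problems


# Illustrative configuration, already parsed into a dictionary.
example_config = {
    "models_scope": ["storage_optimizer"],
    "storage_optimizer_also": True,
    "storage_optimizer_schedule_also": True,
    "submit_storage_optimizer": True,
    "storage_optimizer": {"schedule": {"action": "create", "do_action": True}},
}
print(scheduling_problems(example_config))
```

An empty list means every setting required for manual scheduling is present.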
Environment¶
The environment is registered/updated in AML automatically by the Build & Deploy GHA as storage-optimizer:<latest_version>. The environment definition files can be found at .github/scripts/assets/environments/storage-optimizer-env in the ngtt-fsm repo.
Component & Modelling¶
The data processing and optimization steps of the Storage Optimizer are embedded into a single component named storage_optimizer. The component is registered/updated in AML automatically by the Build & Deploy GHA. The component definition files can be found at storage-optimizer in the ngtt-fsm repo.
Further details of the component (inputs, parameters, outputs) can be found in the component.yml definition file.
Under the hood, the model is a linear optimization model that finds the optimal intrinsic and extrinsic profiles of a storage site to maximize the storage Net Present Value. It considers a variety of input data such as price forward curves, volatility and storage characteristics. The extrinsic profile relies on Least Squares Monte Carlo simulation, which models the price as a stochastic process based on its volatility.
The model was developed by CMDTY; more details about the PyPI package and the underlying modelling technique can be found here.
The Optimization Problem¶
Maximize:

$$ \sum_{t=1}^{T} (P_{t}^{wd} \cdot W_{t} - P_{t}^{inj} \cdot I_{t}) $$

Subject to:

$$ V_{min, t} \leq V_{t} \leq V_{max, t} $$

$$ V_{t} = V_{t-1} + I_{t} - W_{t} $$

$$ 0 \leq I_{t} \leq I_{max} $$

$$ 0 \leq W_{t} \leq W_{max} $$
Where:

- \(P_{t}^{inj}\) and \(P_{t}^{wd}\) are the injection and withdrawal prices at time \(t\),
- \(I_{t}\) and \(W_{t}\) are the injection and withdrawal volumes at time \(t\),
- \(V_{t}\) is the volume of gas in storage at time \(t\),
- \(V_{min, t}\) and \(V_{max, t}\) are the minimum and maximum storage volumes allowed at time \(t\),
- \(I_{max}\) and \(W_{max}\) are the maximum injection and withdrawal rates.
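To make the formulation concrete, the sketch below solves a small instance in which withdrawals earn \(P_{t}^{wd}\) and injections cost \(P_{t}^{inj}\), subject to the inventory balance and rate limits. It is not the CMDTY implementation: it assumes SciPy's `linprog` as the solver, and the function name `optimize_storage`, the initial inventory `v0`, and all numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def optimize_storage(p_inj, p_wd, v0, v_min, v_max, i_max, w_max):
    """Solve the storage LP; returns (injections, withdrawals, profit)."""
    p_inj = np.asarray(p_inj, dtype=float)
    p_wd = np.asarray(p_wd, dtype=float)
    n = len(p_inj)
    # Decision vector x = [I_1..I_T, W_1..W_T].
    # linprog minimizes, so negate the profit sum(P_wd * W - P_inj * I).
    cost = np.concatenate([p_inj, -p_wd])
    # V_t = v0 + cumulative (I - W), written with a lower-triangular matrix.
    cum = np.tril(np.ones((n, n)))
    a_ub = np.block([[cum, -cum], [-cum, cum]])
    b_ub = np.concatenate([
        np.asarray(v_max, dtype=float) - v0,  # V_t <= V_max,t
        v0 - np.asarray(v_min, dtype=float),  # V_t >= V_min,t
    ])
    # Rate limits: 0 <= I_t <= I_max and 0 <= W_t <= W_max.
    bounds = [(0.0, i_max)] * n + [(0.0, w_max)] * n
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n:], -res.fun


# Illustrative three-period example with a single price curve.
injections, withdrawals, profit = optimize_storage(
    p_inj=[10.0, 20.0, 15.0], p_wd=[10.0, 20.0, 15.0],
    v0=0.0, v_min=[0.0] * 3, v_max=[2.0] * 3, i_max=1.0, w_max=1.0,
)
print(round(profit, 6))
```

In this toy case the best achievable profit is 10: one unit is injected at price 10 and withdrawn one period later at price 20.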
Ratchets (Envelopes) and Tunnels¶
Any gas storage site is physically characterized by ratchets. Ratchets are the minimum and maximum injection/withdrawal rates that a site can reach at a given inventory level. Typically, when the inventory level increases, the injection rate starts to decrease while the withdrawal rate increases, and vice versa. The optimizer uses these ratchets to ensure that the storage site is operated within its physical limits. Below we report a typical example of ratchets for a storage site:

Ratchets were built manually for each storage site from online resources and reports. This data is stored in the envelopes_manual_v2.xlsx spreadsheet. This Excel file is the input to a pipeline that processes the data and stores it in the storage_envelopes_manual dataset. The pipeline updates the "site" sheet according to the latest storage site capacities reported in commodity essentials, and dumps it as a parquet file. In summary, to change any ratchet, update the Excel file and run the pipeline.
Typically, 3-7 data points define the ratchets of each storage site. Since this number varies, ratchets are harmonized during the storage optimizer run by computing a value for each fill level between 0% and 100% of the storage capacity, with a step of 1%. This is done by linearly interpolating the ratchet data points.
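The harmonization step can be sketched with NumPy's `np.interp`, which performs piecewise-linear interpolation. The ratchet points and variable names below are made up for illustration and do not come from envelopes_manual_v2.xlsx.

```python
import numpy as np

# Hypothetical ratchet points for one site: fill level (% of capacity) and the
# maximum injection rate available at that level (arbitrary units).
fill_points_pct = np.array([0.0, 50.0, 100.0])
max_injection = np.array([100.0, 80.0, 20.0])

# Harmonized grid: one value per 1% fill level, from 0% to 100% inclusive.
fill_grid_pct = np.arange(0, 101)
injection_curve = np.interp(fill_grid_pct, fill_points_pct, max_injection)

print(len(injection_curve))  # number of grid points
print(injection_curve[25])   # interpolated rate at 25% fill
```

Whatever the number of raw points per site, every site ends up with a curve of the same length, which is what makes the ratchets comparable across sites.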
Tunnels shape the inventory trajectory that a storage site should respect at specific points in time across the optimization horizon. Those points are also called "pivot dates". For each pivot date, a minimum inventory level and a maximum inventory level (typically set to 100%) are provided. Tunnel parameters therefore act as constraints in the optimization problem.
Tunnels are determined by the minimum fill level for each EU country. This data is stored in the storage_country_filling_levels dataset. This is a static file reporting the fill levels for each EU country across some key dates. Thus, to modify tunnels, a manual modification to this file is needed.
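As a rough sketch of how pivot dates could translate into per-day inventory bounds, the snippet below builds \(V_{min, t}\) and \(V_{max, t}\) over a short horizon. It assumes that days which are not pivot dates are left unconstrained (0% to 100% of capacity), which the source does not state; the function `tunnel_bounds`, the pivot date, and the fill level are illustrative and not taken from storage_country_filling_levels.

```python
from datetime import date, timedelta


def tunnel_bounds(start, n_days, capacity, pivots):
    """Return a list of (day, v_min, v_max) over the horizon.

    pivots maps a pivot date to (min_fill_pct, max_fill_pct).
    """
    bounds = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        # Assumption: days that are not pivot dates are unconstrained.
        min_pct, max_pct = pivots.get(day, (0.0, 100.0))
        bounds.append((day, capacity * min_pct / 100.0, capacity * max_pct / 100.0))
    return bounds


# Illustrative pivot date and fill level, not taken from the dataset.
pivots = {date(2024, 11, 1): (90.0, 100.0)}
bounds = tunnel_bounds(date(2024, 10, 30), 4, capacity=1000.0, pivots=pivots)
for day, v_min, v_max in bounds:
    print(day, v_min, v_max)
```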
Volatility¶
In the extrinsic profile optimization, volatility is used to model the price as a stochastic process. The optimizer calibrates the volatility parameters from the historical volatility of the price, and a three-factor model is used to simulate the price behavior. The three-factor model is a differential equation model (more details can be found here). A Least Squares Monte Carlo (LSMC) method is adopted to approximate this model numerically. The LSMC method uses basis functions to represent the price behavior and simulate the price paths. The selected basis function is the following:
$$ 1 + x_{st} + x_{sw} + x_{lt} + s + s^2 $$

Where:

- \(1\) is the constant term of the basis function. It allows the function to shift up and down along the y-axis.
- \(x_{st}\) represents the spot factor. It is a variable that captures the current price of the commodity (in this case, gas). The spot factor is one of the factors that affect the spot price and the forward mean level.
- \(x_{sw}\) represents the winter-summer factor. This factor captures the seasonal dependency of gas prices. A positive return moves winter prices up, summer prices down, and leaves prices in between less affected.
- \(x_{lt}\) represents the long-term factor. This factor affects all prices on the curve equally, and its impact is independent of the position on the forward curve.
- \(s\) represents the gas spot price, and \(s^2\) is its square.
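In LSMC, basis terms like these typically serve as regressors when estimating expected future values across simulated paths. The sketch below builds the six basis columns and fits them by ordinary least squares with `np.linalg.lstsq`. It only illustrates the regression step: the function names and the synthetic data are assumptions, and the CMDTY package may organize this differently.

```python
import numpy as np


def basis_matrix(x_st, x_sw, x_lt, s):
    """Stack the basis terms 1, x_st, x_sw, x_lt, s, s^2 as columns."""
    s = np.asarray(s, dtype=float)
    return np.column_stack([np.ones_like(s), x_st, x_sw, x_lt, s, s**2])


def fit_basis(x_st, x_sw, x_lt, s, future_values):
    """Least-squares coefficients of future values on the basis terms."""
    design = basis_matrix(x_st, x_sw, x_lt, s)
    coeffs, *_ = np.linalg.lstsq(design, future_values, rcond=None)
    return coeffs


# Synthetic data for illustration only: random factors and spot prices.
rng = np.random.default_rng(0)
x_st, x_sw, x_lt = rng.normal(size=(3, 500))
s = rng.uniform(10.0, 50.0, size=500)
true_coeffs = np.array([1.0, 0.5, -0.3, 0.2, 2.0, 0.01])
future_values = basis_matrix(x_st, x_sw, x_lt, s) @ true_coeffs

fitted = fit_basis(x_st, x_sw, x_lt, s, future_values)
print(np.round(fitted, 4))
```

Because the synthetic target is an exact combination of the basis terms, the fit recovers the chosen coefficients; with simulated price paths the regression would only approximate the conditional expectation.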
The volatility parameters are calibrated from the historical volatility of the price, using three components:

- Spot volatility: the standard deviation of the deseasonalized log returns over the last lookback_days, annualized by the number of trading days.
- Long-term volatility: the standard deviation of the year-ahead log returns over the last lookback_days, annualized by the number of trading days.
- Seasonal volatility: the standard deviation of the seasonal Q1-Q3 log returns over the last lookback_days, annualized by the number of trading days.
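The common pattern behind these calculations (log returns, standard deviation over a lookback window, annualization) can be sketched as follows. Several details are assumptions rather than the documented method: annualization multiplies by the square root of the number of trading days, 252 trading days per year is used as a default, the sample standard deviation is taken, and deseasonalization is omitted.

```python
import numpy as np


def annualized_volatility(prices, lookback_days, trading_days=252):
    """Annualized standard deviation of daily log returns over a window."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    # Keep only the most recent lookback_days returns.
    window = log_returns[-lookback_days:]
    # Sample standard deviation, scaled by sqrt(trading days per year).
    return float(np.std(window, ddof=1) * np.sqrt(trading_days))


# Illustrative price series, not real market data.
prices = [100.0, 110.0, 100.0, 110.0, 100.0]
vol = annualized_volatility(prices, lookback_days=4)
print(round(vol, 4))
```

The same helper could be applied to a year-ahead or a Q1-Q3 spread series to obtain the long-term and seasonal counterparts, under the same assumptions.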
The Combined Profile¶
The optimizer combines the intrinsic and extrinsic profiles into the combined profile: the linear combination of the two that minimizes the MAPE between the actual storage and the storage predicted by the combined profile. To find the optimal weights, the optimizer uses a grid search that tests all candidate weight combinations and selects the one that minimizes the MAPE across the years 2021 and 2022.
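A minimal sketch of such a grid search is shown below, assuming the two weights sum to one and the grid uses 1% steps; neither the step size nor the function names come from the source, and the data is synthetic.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))


def best_intrinsic_weight(actual, intrinsic, extrinsic, n_steps=101):
    """Grid-search the intrinsic weight w; the extrinsic weight is 1 - w."""
    intrinsic = np.asarray(intrinsic, dtype=float)
    extrinsic = np.asarray(extrinsic, dtype=float)
    weights = np.linspace(0.0, 1.0, n_steps)
    scores = [mape(actual, w * intrinsic + (1.0 - w) * extrinsic) for w in weights]
    best = int(np.argmin(scores))
    return float(weights[best]), scores[best]


# Synthetic profiles: the "actual" series is built as 30% intrinsic + 70% extrinsic.
intrinsic = np.array([10.0, 20.0, 30.0, 40.0])
extrinsic = np.array([20.0, 10.0, 40.0, 30.0])
actual = 0.3 * intrinsic + 0.7 * extrinsic

best_w, best_score = best_intrinsic_weight(actual, intrinsic, extrinsic)
print(round(best_w, 2), round(best_score, 6))
```

With a single weight to tune, an exhaustive grid is cheap and avoids any dependence on a numerical optimizer's starting point.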
Pipeline¶
Storage Optimizer execution is performed through a single step AML Pipeline Job as depicted below:
Inputs:
- storage_envelopes_manual: ratchets data for each storage site
- historical_daily: commodity essential data from which actual storage of each site is retrieved
- storage_country_filling_levels: data reporting the minimum fill level for each EU country
- storage_optimal_weighted_profiles: optimal combination split (xx% and 1-xx%) between intrinsic and extrinsic profile for each storage site used to output the combined profile
- prices_storage_volatility: price forward curve data used to calibrate volatility parameters
- gdm_fwd_curves_from_contracts: price forward curve data from GDM
- storage_totem_curves: price forward curve data from Totem
- gate_fwd_curves_daily: price forward curve data from Gate (currently in use)
Manual evaluation of model performance¶
The Storage Optimizer model is mainly used as a forecasting model, assuming that storage sites will be operated according to the strategy that maximizes their expected return (NPV).
When choosing the optimal parameters of the optimizer (e.g. volatility, % split between intrinsic and extrinsic), backtesting experiments were run to quantitatively evaluate the performance of the model.
Model metric score(s)¶
Below we report the main metrics that have been used to evaluate different optimization configurations and select the best one.
The years 2021 and 2022 were used as test sets.
| Metric | Full metric name | Expression | Comments |
|---|---|---|---|
| MAE | Mean Absolute Error | \(\frac{1}{n}\sum_{i=1}^n\left\|y_i-\hat{y}_i\right\|\) | Easily interpretable, but does not reveal the proportional scale of the error |
| MAPE | Mean Absolute Percentage Error | \(\frac{100}{n} \sum_{i=1}^n\left\|\frac{y_i-\hat{y}_i}{y_i}\right\|\) | Ideal when comparing performance between different data sets, but values might explode if the test set contains values close to zero. |
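
The two metrics in the table can be computed as in the sketch below. The function names and sample values are illustrative and are not the project's evaluation code.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; unstable when y_true is near zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Illustrative storage levels: actual vs. predicted.
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
print(mae(y_true, y_pred), mape(y_true, y_pred))
```

Here the absolute errors are 10 and 20, so the MAE is 15, while both relative errors are 10%, so the MAPE is 10.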