Russian Flows

As part of the Fundamental Stack Model, we model the gas flows that come from Russia into Europe.

Data sources

The data source is Commodity Essentials.
The data is stored in Atlas.
The data needed for the analysis of the Russian flows includes:

  • Gas flows
  • Capacity bookings
  • Metadata

Data extraction from Atlas

The following tables are read from the data lake:

  • time series: /process/raw/commodity_essentials/output/eugasseriesbulk
  • metadata: /process/raw/commodity_essentials/output/eugasmeta/seriesperasse
  • capacity bookings: /process/raw/commodity_essentials/output/eugascapbooking/capacitybookings

There are multiple series per pipeline, defined by the values of these fields:

Gas Flows

  • Date
  • sourceBlend
    • The columns sourceBlend and source should be read together. If sourceBlend=true, the series may take multiple sources into account, with the preferred source indicated by the column source. If sourceBlend=false, we enforce the use of only the source indicated by the column source.
    • Values: True, False
    • We take: sourceBlend = True
  • source
    • See sourceBlend. There is an order of preference for the sources: the premise is to always take the source closest to the flow. First comes TSO (Operator) data, then data from market aggregators (e.g. ENTSOG), and lastly estimates from Commodity Essentials.
  • statusBlend
    • statusBlend and status should be read together. If statusBlend=true, we take into account multiple statuses, if available, for a given series; in this case the status column indicates the preferred (most final) of the statuses available. If statusBlend=false, only the status indicated by the status column is reported.
    • Values: True, False
    • We take: True
  • status
    • See statusBlend
    • Values: phys, nom, renom, alloc
  • doFilter
    • The option doFilter=true is applied by default and indicates that data quality filters will be applied. These filters remove data points for which we have high confidence there is a quality issue. The filters are rules defined asset by asset, based on the properties of the data (e.g. demand >= 0) and on observed behaviors of bad data for any of the data sources.
    • Values: True, False
    • We take: True
  • doEstimate
    • The option doEstimate=true is applied by default and indicates that some data points may be estimated in some cases.
    • Values: True, False
    • We take: True
  • excludeNoms
    • When excludeNoms=true, nominations and renominations are not taken into account; only allocations and physical flows are used.
    • Values: True, False
    • We take: False

We use these properties together with the list of directions or assetIds (see below) to extract a single seriesId for each pipeline from the series metadata.
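
As an illustration, this selection could be sketched in pandas as follows. The DataFrame name meta_df and the list asset_ids (the assetIds obtained from the pipeline mapping dataset, see below) are assumptions for the sketch, not code from the actual pipeline.

    import pandas as pd

    def select_series_ids(meta_df: pd.DataFrame, asset_ids: list) -> pd.DataFrame:
        """Pick a single seriesId per pipeline using the properties described above."""
        mask = (
            meta_df["sourceBlend"].eq(True)
            & meta_df["statusBlend"].eq(True)
            & meta_df["doFilter"].eq(True)
            & meta_df["doEstimate"].eq(True)
            & meta_df["excludeNoms"].eq(False)
            & meta_df["assetId"].isin(asset_ids)
        )
        # Keep one seriesId per assetId in case several candidate series remain.
        return meta_df.loc[mask, ["assetId", "seriesId"]].drop_duplicates(subset="assetId")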

Capacity

  • Book.Status
    • Values: Booked, Available
    • We take: Book.Status = Booked
  • assetId
    • Unique identifier for a pipeline or other asset
  • Variable
    • Indicates the direction of flow
    • Values: forwardFlow, reverseFlow
    • We take: Both forwardFlow and reverseFlow
  • Cap.Type
    • Type of capacity booking
    • Values: Firm, Interruptible
    • We take: Both firm and interruptible
  • Book.Direction
    • Indicates if the booking is for entry or exit capacity
    • Values: Entry, Exit, ExitEntry
    • We take: All
  • Matched
    • Matched view, in which the capacity bookings in the Exit and Entry directions are consolidated into a single overview. Capacity matched in both the Exit and Entry directions is reported with direction=ExitEntry, while any remaining unmatched capacity keeps its Exit or Entry direction and TSO. For example, if 80 GWh/d is booked at the Exit TSO and 100 GWh/d at the Entry TSO, then 80 GWh/d is reported as ExitEntry and the remaining 20 GWh/d as Entry.
    • Values: True/False
    • We take: Both
  • ExactMatch
    • Indicates if the capacity entered as ExitEntry is an exact match or not.
    • Values: True/False
    • We take: Both

Pipeline mappings

There is a registered dataset in AzureML that contains:

  • assetId for each pipeline
  • assetName for each pipeline
  • direction
  • level
  • seriesId
  • seriesName
  • plottingName
  • logic for building complex flows and aggregates (True/False fields)

This dataset is used for filtering and enriching the data, as well as for generating complex flows.

In order to extract the assetIds for the pipelines, we use the field direction, which contains the direction of the flow for that pipeline (e.g. RU => DE).
The country abbreviations are expressed using the ISO codes for the countries.

The following list of directions is used for extracting the assetIds:

  • AT => DE
  • AT => IT
  • BG => RO
  • BG => TR
  • BY => DE
  • BY => LT
  • BY => PL
  • BY => UA
  • CZ => DE
  • CZ => PL
  • CZ => SK
  • DE => AT
  • DE => CZ
  • DE => FR
  • DE => PL
  • EE => FI
  • EE => LV
  • FR => DE
  • HU => AT
  • HU => RO
  • HU => SK
  • HU => UA
  • IT => AT
  • LT => RU
  • LT => LV
  • LV => EE
  • LV => LT
  • PL => DE
  • PL => UA
  • PL => CZ
  • PL => PL
  • RO => BG
  • RO => HU
  • RO => UA
  • RU => DE
  • RU => EE
  • RU => FI
  • RU => LT
  • RU => TR
  • RU => UA
  • SK => AT
  • SK => CZ
  • SK => HU
  • SK => UA
  • TR => BG
  • UA => HU
  • UA => PL
  • UA => MD
  • UA => RO
  • UA => SK
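
As a sketch, the assetIds can then be extracted from the mapping dataset with a simple pandas filter on the direction column. The DataFrame name mapping_df is an assumption, and the list of directions below is truncated for brevity.

    directions = [
        "RU => DE", "RU => FI", "BY => PL",  # ... full list of directions as above
    ]

    asset_ids = (
        mapping_df.loc[mapping_df["direction"].isin(directions), "assetId"]
        .dropna()
        .unique()
        .tolist()
    )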

Data Processing

The first step is to read from the data lake using Apache Spark.
The data is too large to load directly as a pandas DataFrame, so it must be filtered beforehand.
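
The snippet below is a sketch of this pattern: read with Spark, filter, then convert to pandas. The file format (parquet), the SparkSession setup and the variable series_ids (the seriesIds selected from the metadata, see above) are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read the raw time series and filter in Spark before converting to pandas.
    flows = spark.read.parquet("/process/raw/commodity_essentials/output/eugasseriesbulk")
    flows_small = (
        flows
        .filter(F.col("Date") >= "2016-01-01")       # data from 2016 onwards
        .filter(F.col("seriesId").isin(series_ids))  # one seriesId per pipeline
    )
    flows_pdf = flows_small.toPandas()  # convert only after the data has been reduced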

Basic processing for gas flows

The following transformations are done to the gas flow data:

  • Filter based on date: we take data from 2016 onwards
  • Filter on seriesId
  • Select subset of columns
  • Filter metadata and join with flows
  • Convert to Pandas DF
  • Convert units from GWh/d to mcm/d (the conversion factor used is 0.092)
  • Convert date format and add date related columns
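
The unit and date conversions on the pandas side could be sketched as below; the column names value and Date are assumptions.

    import pandas as pd

    GWH_TO_MCM = 0.092  # 1 GWh/d is roughly 0.092 mcm/d

    flows_pdf["value_mcm"] = flows_pdf["value"] * GWH_TO_MCM   # GWh/d -> mcm/d
    flows_pdf["Date"] = pd.to_datetime(flows_pdf["Date"])
    flows_pdf["year"] = flows_pdf["Date"].dt.year              # date-related columns
    flows_pdf["month"] = flows_pdf["Date"].dt.month
    flows_pdf["dayofyear"] = flows_pdf["Date"].dt.dayofyear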

Basic processing for capacity bookings

The following transformations are done to the capacity booking data:

  • Filter based on date: we take data from 2016 onwards
  • Filter on assetId
  • Select subset of columns
  • Group to reduce the number of entries (there are multiple entries from different operators)
  • Convert to Pandas DF
  • Convert units from GWh/d to mcm/d
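
The grouping step could be sketched as follows, assuming the capacity data has already been converted to a pandas DataFrame named capacity_pdf with a value column named Value (both assumptions).

    # Collapse the per-operator rows into a single entry per pipeline, day and booking type.
    capacity_pdf = (
        capacity_pdf
        .groupby(["Date", "assetId", "Variable", "Cap.Type", "Book.Direction",
                  "Matched", "ExactMatch"], as_index=False)
        ["Value"].sum()
    )
    capacity_pdf["Value"] *= 0.092  # GWh/d -> mcm/d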

Logic for building complex flows

A complex flow is an aggregate or synthetic flow obtained by combining (summing and subtracting) flows from several pipelines.
The logic defining which pipelines to add or subtract for each complex flow is defined in the mapping dataframe, so any changes to the logic only need to be made once.

Flipping flow signs

Some complex flows require subtracting the flow from certain pipelines. To do so, we flip the sign of some of the flows and then aggregate the flows based on a list of pipelines (see the sketch at the end of this section).

Note: When we flip the sign of a flow, we are effectively reversing the direction of the flow.

These are the complex flows that require flipping the sign:

  • Oberkappel
    • Before: DE => AT
    • After: AT => DE
  • Budince
    • Before: SK => UA
    • After: UA => SK
  • Brandov (EUGAL + OPAL)
    • Before: DE => CZ
    • After: CZ => DE

Afterwards, we flip the sign back for the plots, thus restoring the original direction of the flow.
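
A sketch of this flip-and-aggregate step, driven by the logic fields in the mapping dataset, could look like the following. The column names flip_sign and is_russia_to_nwe are hypothetical stand-ins for the True/False logic fields described in the Pipeline mappings section.

    def build_complex_flow(flows_pdf, mapping_df, flag_col):
        """Sum the member pipelines of one complex flow, flipping signs where needed."""
        members = mapping_df.loc[mapping_df[flag_col] == True, ["assetId", "flip_sign"]]
        df = flows_pdf.merge(members, on="assetId", how="inner")
        # Flip the sign (i.e. reverse the direction) for the flagged pipelines.
        df["value_mcm"] = df["value_mcm"].where(~df["flip_sign"], -df["value_mcm"])
        return df.groupby("Date", as_index=False)["value_mcm"].sum()

    # Example: Russia to NWE aggregate (the flag column name is hypothetical).
    russia_to_nwe = build_complex_flow(flows_pdf, mapping_df, "is_russia_to_nwe")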

Brandov (OPAL + EUGAL)

This flow computes the gas flow volume that goes from Nord Stream 1 into the Czech Republic.

  • Brandov (OPAL + EUGAL) = Brandov, EUGAL + Brandov, OPAL

Yamal

We are interested in the amount of the Yamal flow that comes into Europe. There are two ways to compute this:

  1. Yamal = Wysokoje + Kondratky (Entry from Belarus into Poland)
  2. Yamal = Mallnow + PWP

Both methods yield similar gas flow volumes, but we found the first method to be more precise.

Russia to EU, excluding Turkey, Baltics and Finland

This complex flow is computed to see the total supply from Russia into the EU, excluding Turkey, Finland and the Baltics (Latvia, Lithuania, Estonia).

These pipelines are added to compute the flow volume:

  • Yamal (Wysokoje + Kondratky)
  • Nord Stream 1
  • Strandzha Malkoclar (Russia via Turkstream to Bulgaria)
  • Beregdaroc net flow
  • Drozdowicze net flow
  • Orlovka
  • Tekovo
  • Velke Kapusany
  • Budince (with flipped sign)

Russia to EU through Ukraine

This complex flow is computed to see the total gas volume that reaches Europe from Russia, passing through Ukraine.

These pipelines are added to compute the flow volume:

  • Beregdaroc net flow (Taking into account what goes back from HU to UA)
  • Drozdowicze net flow (Drozdovichi - Hermanowice)
  • Orlovka (Issacea)
  • Tekovo (Mediesu Aurit)
  • Budince (with flipped sign)

Russia to NWE

This complex flow is computed to see the total gas volume that reaches NW-Europe from Russia.

We are mainly considering the gas flows that reach Germany from Russia as NWE.

These pipelines are added to compute the flow volume:

  • Waidhaus (CZ => DE)
  • Oberkappel (RU => UA => SK => AT => DE) (We flipped the sign)
  • Mallnow (RU => BY => PL => DE)
  • Nord Stream 1 (RU => DE)
  • Brandov (EUGAL+OPAL) (DE => CZ) (We flipped the sign)

Logic for processing capacity bookings

In order to compute the total capacity booked for each pipeline, each day, some transformations are needed:

  1. Filter entries with

    • Book.Direction = ExitEntry
    • Matched = True
    • ExactMatch = True

    For those entries, extract:

      • Date
      • assetId
      • Variable
      • Cap.Type

    We want to uniquely identify those entries, so that we can delete the other entries when we have an exact match between entry and exit capacity. To do so, we have to do an anti-join, which is not natively supported in pandas (see the sketch after this list).

    • Extract a sub-df with the filtered entries and columns
    • Do an outer join with the original df (with indicator=True)
    • Eliminate all entries that have _merge=="both"
    • Concat this df with the first one
    • Group by Date, assetId, assetName, Variable, Book.Direction (We no longer need to distinguish between firm and interruptible capacity)
  2. Split ExitEntry into Exit and Entry

    • This way, if there is any entry with ExitEntry and ExactMatch = False, you can rebuild the total for exit & entry.

    • Pivot and pick columns

      • Pivot in order to have: Date, assetId, assetName, Variable, ExitEntry, Entry, Exit
      • Create new column to choose the cap. booking value to take: Entry or Exit (Take the highest one)
      • Select subset of columns
    • Pivot again to get a single row with forward and reverse flow

  3. Extract a single value per pipeline, per day and compute aggregates

    • Join with mapping df
    • Flip sign where necessary
    • Keep both forward and reverse flows capacity bookings
    • Build aggregates (complex flows) using the same logic as with the flows
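
The anti-join from step 1 could be sketched as below, using an outer merge with indicator=True as described. The DataFrame name capacity_pdf and the value column Value are assumptions.

    import pandas as pd

    keys = ["Date", "assetId", "Variable", "Cap.Type"]

    # Step 1: the exactly matched ExitEntry bookings.
    exact = capacity_pdf.loc[
        capacity_pdf["Book.Direction"].eq("ExitEntry")
        & capacity_pdf["Matched"].eq(True)
        & capacity_pdf["ExactMatch"].eq(True)
    ]

    # Anti-join: drop every row whose key combination also appears in `exact`.
    merged = capacity_pdf.merge(
        exact[keys].drop_duplicates(), on=keys, how="outer", indicator=True
    )
    unmatched = merged.loc[merged["_merge"] != "both"].drop(columns="_merge")

    # Recombine with the exact matches and aggregate
    # (firm vs interruptible no longer needs to be distinguished).
    result = pd.concat([exact, unmatched], ignore_index=True)
    result = result.groupby(
        ["Date", "assetId", "assetName", "Variable", "Book.Direction"], as_index=False
    )["Value"].sum()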

Output Data

As a result of the processing, several datasets are stored in blob storage.

Historical flows and capacity bookings

This dataset contains historical data for capacity bookings and flow volumes.
One entry per pipeline, per day. This dataset can be used for training forecasting models.

Summary dataframes

These datasets contain a summary of the flows for the main pipelines:

  • Current year YTD: Total flow since the beginning of the current year until today. Quantities in bcm
  • Last year YTD: Total flow since the beginning of the last year until today's date last year. Quantities in bcm
  • Y/Y chg: Change between previous year and this year. Quantities in bcm
  • Last 30 days avg for current year: Average flow during the last 30 days (current year). Quantities in mcm
  • Last 30 days avg for last year: Average flow during the last 30 days (last year). Quantities in mcm
  • M/M chg: Change between last year's last 30 days and this year's last 30 days
  • Weekly avg: Average flow during the last week. Quantities in mcm
  • Last week's avg: Average flow during the previous week. Quantities in mcm
  • w/w chg: Change between the previous week and the most recent week. Quantities in mcm
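
For illustration, two of these metrics could be computed from the historical dataframe as sketched below; the DataFrame flows_pdf with columns Date and value_mcm is an assumption.

    import pandas as pd

    today = pd.Timestamp.today().normalize()
    year_start = today.replace(month=1, day=1)

    # Current year YTD, converted from mcm to bcm.
    current_ytd_bcm = flows_pdf.loc[
        flows_pdf["Date"].between(year_start, today), "value_mcm"
    ].sum() / 1000.0

    # Average flow over the last 30 days, in mcm.
    last_30d_avg_mcm = flows_pdf.loc[
        flows_pdf["Date"].between(today - pd.Timedelta(days=30), today), "value_mcm"
    ].mean()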

Gas flows in the last 5 days

These datasets contain an overview of the gas flow volumes in the pipelines during the last 5 days.