Russian Flows

As part of the Fundamental Stack Model, we model the gas flows that come from Russia into Europe.

Data sources

The data source is Commodity Essentials.
The data is stored in Atlas.
The data needed for the analysis of the Russian flows includes:

  • Gas flows
  • Capacity bookings
  • Metadata

Data extraction from Atlas

The following tables are read from the data lake:

  • time series: /process/raw/commodity_essentials/output/eugasseriesbulk
  • metadata: /process/raw/commodity_essentials/output/eugasmeta/seriesperasse
  • capacity bookings: /process/raw/commodity_essentials/output/eugascapbooking/capacitybookings

There are multiple series per pipeline, defined by the values of these fields:

Gas Flows

  • Date
  • sourceBlend
    • The columns sourceBlend and source should be read together. If sourceBlend=true, the series may take multiple sources into account, with the preferred source indicated by the column source. If sourceBlend=false, we enforce the use of only the source indicated by the column source.
    • Values: True, False
    • We take: sourceBlend = True
  • source
    • See sourceBlend. There is an order of preference for the sources: the premise is to always take the source closest to the flow. First comes TSO (Operator) data, then data from market aggregators (e.g. ENTSOG), and lastly estimates from Commodity Essentials.
  • statusBlend
    • statusBlend and status should be read together. If statusBlend=true, we take into account multiple statuses, if available, for a given series; in this case the status column indicates the preferred (most final) of the statuses available. If statusBlend=false, only the status indicated by the status column is reported.
    • Values: True, False
    • We take: True
  • status
    • See statusBlend
    • Values: phys, nom, renom, alloc
  • doFilter
    • The option doFilter=true is applied by default and indicates that data quality filters will be applied. These filters remove data points for which we have high confidence there is a quality issue. The filters are rules defined asset by asset, based on the properties of the data (e.g. demand >= 0) and on observed behaviors of bad data for any of the data sources.
    • Values: True, False
    • We take: True
  • doEstimate
    • The option doEstimate=true is applied by default and indicates that some data points may be estimated in some cases.
    • Values: True, False
    • We take: True
  • excludeNoms
    • When excludeNoms=true, nominations and renominations are not taken into account; only allocations and physical flows are used.
    • Values: True, False
    • We take: False

We use these properties together with the list of directions or assetIds (see below) to extract a single seriesId for each pipeline from the series metadata.
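
As an illustration, this selection could be sketched in pandas as follows. The DataFrame name meta_df and the list asset_ids (the assetIds obtained from the pipeline mapping dataset, see below) are assumptions for the sketch, not code from the actual pipeline.

    import pandas as pd

    def select_series_ids(meta_df: pd.DataFrame, asset_ids: list) -> pd.DataFrame:
        """Pick a single seriesId per pipeline using the properties described above."""
        mask = (
            meta_df["sourceBlend"].eq(True)
            & meta_df["statusBlend"].eq(True)
            & meta_df["doFilter"].eq(True)
            & meta_df["doEstimate"].eq(True)
            & meta_df["excludeNoms"].eq(False)
            & meta_df["assetId"].isin(asset_ids)
        )
        # Keep one seriesId per assetId in case several candidate series remain.
        return meta_df.loc[mask, ["assetId", "seriesId"]].drop_duplicates(subset="assetId")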

Capacity

  • Book.Status
    • Values: Booked, Available
    • We take: Book.Status = Booked
  • assetId
    • Unique identifier for a pipeline or other asset
  • Variable
    • Indicates the direction of flow
    • Values: forwardFlow, reverseFlow
    • We take: Both forwardFlow and reverseFlow
  • Cap.Type
    • Type of capacity booking
    • Values: Firm, Interruptible
    • We take: Both firm and interruptible
  • Book.Direction
    • Indicates if the booking is for entry or exit capacity
    • Values: Entry, Exit, ExitEntry
    • We take: All
  • Matched
    • Matched view, in which the capacity bookings in the Exit and Entry directions are consolidated into a single overview. Capacity matched in both the Exit and Entry directions is reported with direction=ExitEntry, while any remaining unmatched capacity keeps its Exit or Entry direction and TSO. For example, if 80 GWh/d is booked at the Exit TSO and 100 GWh/d at the Entry TSO, then 80 GWh/d is reported as ExitEntry and the remaining 20 GWh/d as Entry.
    • Values: True/False
    • We take: Both
  • ExactMatch
    • Indicates if the capacity entered as ExitEntry is an exact match or not.
    • Values: True/False
    • We take: Both

Pipeline mappings

There is a registered dataset in AzureML that contains:

  • assetId for each pipeline
  • assetName for each pipeline
  • direction
  • level
  • seriesId
  • seriesName
  • plottingName
  • logic for building complex flows and aggregates (True/False fields)

This dataset is used for filtering and enriching the data, as well as for generating complex flows.

In order to extract the assetIds for the pipelines, we use the field direction, which contains the direction of the flow for that pipeline (e.g. RU => DE).
The country abbreviations are expressed using the ISO codes for the countries.

The following list of directions is used for extracting the assetIds:

  • AT => DE
  • AT => IT
  • BG => RO
  • BG => TR
  • BY => DE
  • BY => LT
  • BY => PL
  • BY => UA
  • CZ => DE
  • CZ => PL
  • CZ => SK
  • DE => AT
  • DE => CZ
  • DE => FR
  • DE => PL
  • EE => FI
  • EE => LV
  • FR => DE
  • HU => AT
  • HU => RO
  • HU => SK
  • HU => UA
  • IT => AT
  • LT => RU
  • LT => LV
  • LV => EE
  • LV => LT
  • PL => DE
  • PL => UA
  • PL => CZ
  • PL => PL
  • RO => BG
  • RO => HU
  • RO => UA
  • RU => DE
  • RU => EE
  • RU => FI
  • RU => LT
  • RU => TR
  • RU => UA
  • SK => AT
  • SK => CZ
  • SK => HU
  • SK => UA
  • TR => BG
  • UA => HU
  • UA => PL
  • UA => MD
  • UA => RO
  • UA => SK
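
As a sketch, the assetIds can then be extracted from the mapping dataset with a simple pandas filter on the direction column. The DataFrame name mapping_df is an assumption, and the list of directions below is truncated for brevity.

    directions = [
        "RU => DE", "RU => FI", "BY => PL",  # ... full list of directions as above
    ]

    asset_ids = (
        mapping_df.loc[mapping_df["direction"].isin(directions), "assetId"]
        .dropna()
        .unique()
        .tolist()
    )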

Data Processing

The first step is to read from the data lake using Apache Spark.
The data is too large to load directly as a pandas DataFrame, so it must be filtered beforehand.
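
The snippet below is a sketch of this pattern: read with Spark, filter, then convert to pandas. The file format (parquet), the SparkSession setup and the variable series_ids (the seriesIds selected from the metadata, see above) are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read the raw time series and filter in Spark before converting to pandas.
    flows = spark.read.parquet("/process/raw/commodity_essentials/output/eugasseriesbulk")
    flows_small = (
        flows
        .filter(F.col("Date") >= "2016-01-01")       # data from 2016 onwards
        .filter(F.col("seriesId").isin(series_ids))  # one seriesId per pipeline
    )
    flows_pdf = flows_small.toPandas()  # convert only after the data has been reduced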

Basic processing for gas flows

The following transformations are done to the gas flow data:

  • Filter based on date: we take data from 2016 onwards
  • Filter on seriesId
  • Select subset of columns
  • Filter metadata and join with flows
  • Convert to Pandas DF
  • Convert units from GWh/d to mcm/d (the conversion factor used is 0.092)
  • Convert date format and add date related columns
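
The unit and date conversions on the pandas side could be sketched as below; the column names value and Date are assumptions.

    import pandas as pd

    GWH_TO_MCM = 0.092  # 1 GWh/d is roughly 0.092 mcm/d

    flows_pdf["value_mcm"] = flows_pdf["value"] * GWH_TO_MCM   # GWh/d -> mcm/d
    flows_pdf["Date"] = pd.to_datetime(flows_pdf["Date"])
    flows_pdf["year"] = flows_pdf["Date"].dt.year              # date-related columns
    flows_pdf["month"] = flows_pdf["Date"].dt.month
    flows_pdf["dayofyear"] = flows_pdf["Date"].dt.dayofyear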

Basic processing for capacity bookings

The following transformations are done to the capacity booking data:

  • Filter based on date: we take data from 2016 onwards
  • Filter on assetId
  • Select subset of columns
  • Group to reduce the number of entries (there are multiple entries from different operators)
  • Convert to Pandas DF
  • Convert units from GWh/d to mcm/d
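
The grouping step could be sketched as follows, assuming the capacity data has already been converted to a pandas DataFrame named capacity_pdf with a value column named Value (both assumptions).

    # Collapse the per-operator rows into a single entry per pipeline, day and booking type.
    capacity_pdf = (
        capacity_pdf
        .groupby(["Date", "assetId", "Variable", "Cap.Type", "Book.Direction",
                  "Matched", "ExactMatch"], as_index=False)
        ["Value"].sum()
    )
    capacity_pdf["Value"] *= 0.092  # GWh/d -> mcm/d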

Logic for building complex flows

A complex flow is an aggregate or synthetic flow obtained by combining (summing and subtracting) flows from several pipelines.
The logic defining which pipelines to add or subtract for each complex flow is defined in the mapping dataframe, so any changes to the logic only need to be made once.

Flipping flow signs

Some complex flows require subtracting the flow from certain pipelines. To do so, we flip the sign of some of the flows and then aggregate the flows based on a list of pipelines (see the sketch at the end of this section).

Note: When we flip the sign of a flow, we are effectively reversing the direction of the flow.

These are the complex flows that require flipping the sign:

  • Oberkappel
    • Before: DE => AT
    • After: AT => DE
  • Budince
    • Before: SK => UA
    • After: UA => SK
  • Brandov (EUGAL + OPAL)
    • Before: DE => CZ
    • After: CZ => DE

Afterwards, we flip the sign back for the plots, thus restoring the original direction of the flow.
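
A sketch of this flip-and-aggregate step, driven by the logic fields in the mapping dataset, could look like the following. The column names flip_sign and is_russia_to_nwe are hypothetical stand-ins for the True/False logic fields described in the Pipeline mappings section.

    def build_complex_flow(flows_pdf, mapping_df, flag_col):
        """Sum the member pipelines of one complex flow, flipping signs where needed."""
        members = mapping_df.loc[mapping_df[flag_col] == True, ["assetId", "flip_sign"]]
        df = flows_pdf.merge(members, on="assetId", how="inner")
        # Flip the sign (i.e. reverse the direction) for the flagged pipelines.
        df["value_mcm"] = df["value_mcm"].where(~df["flip_sign"], -df["value_mcm"])
        return df.groupby("Date", as_index=False)["value_mcm"].sum()

    # Example: Russia to NWE aggregate (the flag column name is hypothetical).
    russia_to_nwe = build_complex_flow(flows_pdf, mapping_df, "is_russia_to_nwe")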

Brandov (OPAL + EUGAL)

This flow computes the gas flow volume that goes from Nord Stream 1 into the Czech Republic.

  • Brandov (OPAL + EUGAL) = Brandov, EUGAL + Brandov, OPAL

Yamal

We are interested in the amount of the Yamal flow that comes into Europe. There are two ways to compute this:

  1. Yamal = Wysokoje + Kondratky (Entry from Belarus into Poland)
  2. Yamal = Mallnow + PWP

Both methods yield similar gas flow volumes, but we found the first method to be more precise.

Russia to EU, excluding Turkey, Baltics and Finland

This complex flow is computed to see the total supply from Russia into the EU, excluding Turkey, Finland and the Baltics (Latvia, Lithuania, Estonia).

These pipelines are added to compute the flow volume:

  • Yamal (Wysokoje + Kondratky)
  • Nord Stream 1
  • Strandzha Malkoclar (Russia via Turkstream to Bulgaria)
  • Beregdaroc net flow
  • Drozdowicze net flow
  • Orlovka
  • Tekovo
  • Velke Kapusany
  • Budince (with flipped sign)

Russia to EU through Ukraine

This complex flow is computed to see the total gas volume that reaches Europe from Russia, passing through Ukraine.

These pipelines are added to compute the flow volume:

  • Beregdaroc net flow (Taking into account what goes back from HU to UA)
  • Drozdowicze net flow (Drozdovichi - Hermanowice)
  • Orlovka (Issacea)
  • Tekovo (Mediesu Aurit)
  • Budince (with flipped sign)

Russia to NWE

This complex flow is computed to see the total gas volume that reaches NW-Europe from Russia.

We are mainly considering the gas flows that reach Germany from Russia as NWE.

These pipelines are added to compute the flow volume:

  • Waidhaus (CZ => DE)
  • Oberkappel (RU => UA => SK => AT => DE) (We flipped the sign)
  • Mallnow (RU => BY => PL => DE)
  • Nord Stream 1 (RU => DE)
  • Brandov (EUGAL+OPAL) (DE => CZ) (We flipped the sign)

Logic for processing capacity bookings

In order to compute the total capacity booked for each pipeline, each day, some transformations are needed:

  1. Filter entries with

    • Book.Direction = ExitEntry
    • Matched = True
    • ExactMatch = True

    For those entries, extract:

      • Date
      • assetId
      • Variable
      • Cap.Type

    We want to uniquely identify those entries, so that we can delete the other entries when we have an exact match between entry and exit capacity. To do so, we have to do an anti-join, which is not natively supported in pandas (see the sketch after this list).

    • Extract a sub-df with the filtered entries and columns
    • Do an outer join with the original df (with indicator=True)
    • Eliminate all entries that have _merge=="both"
    • Concat this df with the first one
    • Group by Date, assetId, assetName, Variable, Book.Direction (We no longer need to distinguish between firm and interruptible capacity)
  2. Split ExitEntry into Exit and Entry

    • This way, if there is any entry with ExitEntry and ExactMatch = False, you can rebuild the total for exit & entry.

    • Pivot and pick columns

      • Pivot in order to have: Date, assetId, assetName, Variable, ExitEntry, Entry, Exit
      • Create new column to choose the cap. booking value to take: Entry or Exit (Take the highest one)
      • Select subset of columns
    • Pivot again to get a single row with forward and reverse flow

  3. Extract a single value per pipeline, per day and compute aggregates

    • Join with mapping df
    • Flip sign where necessary
    • Keep both forward and reverse flows capacity bookings
    • Build aggregates (complex flows) using the same logic as with the flows
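
The anti-join from step 1 could be sketched as below, using an outer merge with indicator=True as described. The DataFrame name capacity_pdf and the value column Value are assumptions.

    import pandas as pd

    keys = ["Date", "assetId", "Variable", "Cap.Type"]

    # Step 1: the exactly matched ExitEntry bookings.
    exact = capacity_pdf.loc[
        capacity_pdf["Book.Direction"].eq("ExitEntry")
        & capacity_pdf["Matched"].eq(True)
        & capacity_pdf["ExactMatch"].eq(True)
    ]

    # Anti-join: drop every row whose key combination also appears in `exact`.
    merged = capacity_pdf.merge(
        exact[keys].drop_duplicates(), on=keys, how="outer", indicator=True
    )
    unmatched = merged.loc[merged["_merge"] != "both"].drop(columns="_merge")

    # Recombine with the exact matches and aggregate
    # (firm vs interruptible no longer needs to be distinguished).
    result = pd.concat([exact, unmatched], ignore_index=True)
    result = result.groupby(
        ["Date", "assetId", "assetName", "Variable", "Book.Direction"], as_index=False
    )["Value"].sum()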

Output Data

As a result of the processing, several datasets are stored in blob storage.

Historical flows and capacity bookings

This dataset contains historical data for capacity bookings and flow volumes.
One entry per pipeline, per day. This dataset can be used for training forecasting models.

Summary dataframes

These datasets contain a summary of the flows for the main pipelines:

  • Current year YTD: Total flow since the beginning of the current year until today. Quantities in bcm
  • Last year YTD: Total flow since the beginning of the last year until today's date last year. Quantities in bcm
  • Y/Y chg: Change between previous year and this year. Quantities in bcm
  • Last 30 days avg for current year: Average flow during the last 30 days (current year). Quantities in mcm
  • Last 30 days avg for last year: Average flow during the last 30 days (last year). Quantities in mcm
  • M/M chg: Change between last year's last 30 days and this year's last 30 days
  • Weekly avg: Average flow during the last week. Quantities in mcm
  • Last week's avg: Average flow during the previous week. Quantities in mcm
  • w/w chg: Change between the previous week and the most recent week. Quantities in mcm
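
For illustration, two of these metrics could be computed from the historical dataframe as sketched below; the DataFrame flows_pdf with columns Date and value_mcm is an assumption.

    import pandas as pd

    today = pd.Timestamp.today().normalize()
    year_start = today.replace(month=1, day=1)

    # Current year YTD, converted from mcm to bcm.
    current_ytd_bcm = flows_pdf.loc[
        flows_pdf["Date"].between(year_start, today), "value_mcm"
    ].sum() / 1000.0

    # Average flow over the last 30 days, in mcm.
    last_30d_avg_mcm = flows_pdf.loc[
        flows_pdf["Date"].between(today - pd.Timedelta(days=30), today), "value_mcm"
    ].mean()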

Gas flows in the last 5 days

These datasets contain an overview of the gas flow volumes in the pipelines during the last 5 days.