Trino - Historical Data

Besides our public API, we provide free access to our full dataset through an SQL-like query interface using Trino and Minio. This service is available to university-affiliated researchers, governmental organizations, and aviation authorities for aviation-related research and incident investigations. Private or commercial entities must contact us for a license.

Getting Started

In 2024, we transitioned to a new backend and updated our access method. Follow these steps to connect:

1. Register an Account

To access historical data:

  1. Register an OpenSky account here: OpenSky Register.
  2. Verify your email.
  3. Go to My OpenSky - Request Request Data Access and fill out the form.

2. Connect to Trino

Once registered and granted access, you can connect to the database using Trino clients or Python libraries like traffic and pyOpenSky. These libraries support Trino without requiring code changes.

3. Connecting via CLI

Use the following command in the terminal to connect to Trino:

trino --user=USER --password --server=https://trino.opensky-network.org --external-authentication --catalog "minio" --schema "osky"

Important: Your username is stored in lowercase. Ensure you use lowercase when connecting via CLI, Python, or any programmatic access method.

Exploring the Database

To get a list of all available tables, run the following command:

show tables;

The following tables are available:

If you are familiar with SSR Mode S downlink communications, the table names should ring a bell. We won’t cover every single table in this tutorial, as it’s intended to serve as a quick guide on how to use this interface and the data. Those of you who are interested in the low-level tables (e.g., in rollcall replies) should already be familiar with their meanings. For more information and a comprehensible introduction to SSR Mode S, we recommend reading the SSR Mode S section of Christian Wolff’s radar tutorial. However, depending on your intentions, you might not need this background knowledge since using the state vectors is pretty straightforward, as you’ll soon see, and doesn’t require a detailed understanding of the underlying technology.

The first important thing to note is the ending of the tables. They indicate the version of the batch layer implementation that generated the data. As of this writing, we are running the 4th implementation, so you can consider everything else deprecated and buggy—it might be removed in the future.

From our experience, most of you are interested in what we call state vectors. These are essentially summaries of all the other tables containing the most common tracking information. We’ll cover this table as an example of how to explore the database, and you’ll be able to apply the same methodology to the other tables as well. Let’s first take a closer look at the table structure:

describe state_vectors_data4;

The state_vectors_data4 table contains the following columns:

Understanding the Data

Units

Units are unified across all tables. If you find columns with the same content (e.g., timestamps) in other tables, they will have the same unit. The data follows the metric system for consistency and ease of understanding:

Exploring State Vector Data

Let's explore an example state vector by running the following query:

SELECT * FROM state_vectors_data4 WHERE hour=1480759200 ORDER BY rand() LIMIT 1;

The query returns a randomly selected state vector. Below is an example of the results:

+------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+
| time       | icao24 | lat               | lon                | velocity          | heading           | vertrate | callsign | onground | alert | spi   | squawk | baroaltitude      | geoaltitude       | lastposupdate  | lastcontact    | hour       |
+------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+
| 1480760792 | a0d724 | 37.89463883739407 | -88.93331113068955 | 190.85            | 265.83            | 0        | UPS858   | false    | false | false | 7775   | 9144              | 9342.12           | 1480760704.36  | 1480760709.99  | 1480759200 |
+------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+

Column Breakdown

Timestamps & Data Retention

The time column represents the state vector timestamp. OpenSky retains state vectors even after an aircraft leaves coverage for a maximum of 300 seconds. To filter out older data, use the condition:

WHERE time - lastcontact <= 15

Further Analysis

You can explore additional tables like flights_data4 for structured flight tracking, or operational_status_data4 for ADS-B equipment and transponder capabilities. For deeper insights into aircraft trajectory extraction, consider using traffic and pyOpenSky.

The Other Tables

Many tables in the database contain the columns mintime, maxtime, and msgcount. These summarize messages received multiple times.

If you want to learn more about specific tables, refer to the official Mode S and ADS-B specifications:

Example Query: All-Call Replies

Below is an example query retrieving a row from the allcall_replies_data4 table:

SELECT * FROM allcall_replies_data4 LIMIT 1;
+----------------+----------------+----------------+----------+--------+--------------+--------------+------------+
| rawmsg         | mintime        | maxtime        | msgcount | icao24 | capabilities | interrogator | hour       |
+----------------+----------------+----------------+----------+--------+--------------+--------------+------------+
| 5d400fbab21825 | 1478293200.038 | 1478293636.342 | 36       | 400fba | 5            | 00003a       | 1478293200 |
+----------------+----------------+----------------+----------+--------+--------------+--------------+------------+

This means that on Friday, 04-Nov-16, between 21:00:00 and 21:07:16 UTC, OpenSky received 36 replies from the aircraft with transponder ID 400fba to all-call interrogations from an interrogator with ID 00003a. The aircraft reported a capability level of 5, meaning it was airborne with a transponder level of 2 or above (as defined in ICAO Annex 10 Vol. 4).

Retention Periods

The state_vectors_data4 table has unlimited retention, but some other tables may only retain data for about a year.

Flight-Level Data

The flights_data4 table provides structured flight data and serves as the basis for the OpenSky COVID-19 dataset. This table is useful for high-level flight tracking and research applications.

Tools & Performance Considerations

Several tools can help you interact with OpenSky's historical data. Python libraries such as traffic and pyOpenSky allow seamless extraction of trajectories and other key insights. Additionally, Trino's command-line tools can be used for querying the database efficiently.

Performance Guidelines

We enforce a queuing system: Each user is limited to two concurrent queries and two queued queries. Global limits also apply, so if you encounter an error regarding the queue, please try again later.

Killing Queries

If your Python process is stuck or running an unintended query, visit Trino's UI, filter by your username, and manually terminate the unwanted query.

Useful Query Examples

Here are some common queries you might find useful:

Latest Position per Aircraft (Every Minute)

SELECT *
FROM state_vectors_data4 v
JOIN (
    SELECT QUOTIENT(time, 60) AS minute, MAX(time) AS recent, icao24
    FROM state_vectors_data4
    WHERE hour=1480762800
    GROUP BY icao24, minute
) AS m
ON v.icao24=m.icao24 AND v.time=m.recent
WHERE v.hour=1480762800;

Last Position Seen by a Specific Receiver

SELECT *
FROM state_vectors_data4 v
JOIN (
    SELECT MAX(time) AS recent, COUNT(*) AS cnt, icao24
    FROM state_vectors_data4, state_vectors_data4.serials s
    WHERE hour=1480762800
    AND s.ITEM=1344390019
    GROUP BY icao24
) AS m
ON v.icao24=m.icao24 AND v.time=m.recent
WHERE cnt>30 AND v.hour=1480762800;

Note: The filter AND cnt>30 helps remove noise from erroneous receiver associations. You may further refine results by adding latitude and longitude constraints.

Count Aircraft Entering Frankfurt Airspace

SELECT COUNT(DISTINCT icao24)
FROM state_vectors_data4
WHERE lat BETWEEN 49.98 AND 50.07
AND lon BETWEEN 8.48 AND 8.62
AND hour=1493892000;

Conclusion

You are now ready to explore OpenSky's historical dataset! We encourage you to analyze trends, track aircraft, and contribute valuable insights. If you find anything noteworthy or have suggestions for improving this guide, please reach out to us.

Notification of Limited Historical Data

Due to technical issues in 2023, some historical data has been lost. Below are the affected periods (all times in UTC). Please exclude these periods from your research to ensure data accuracy.

Credits

We appreciate the contributions of the following individuals who helped improve this guide: