Scientific Datasets

This page provides an overview of publicly available datasets derived from OpenSky data. These datasets have been created using our Trino historical data interface and live API.

Available Datasets

1. Weekly 24 Hours of State Vector Data

Curator: OpenSky

These files are a snapshot of several years' worth of complete Monday state vector data collected by OpenSky. The data is available at 10-second update intervals and provides time, icao24, lat/lon, velocity, heading, vertrate, callsign, onground, alert/spi, squawk, baro/geoaltitude, lastposupdate, and lastcontact.

State vectors are our abstraction for tracking information. We offer three formats: CSV, Avro, and JSON, with one file per format for every hour. Be aware that CSV and JSON files in particular are much larger after decompression.
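As an illustration of working with the CSV format, the following minimal sketch parses a few state vectors with Python's standard library. The inline sample, the exact header names (for example `lat`, `lon`, `baroaltitude`, `geoaltitude`) and the helper `read_state_vectors` are assumptions made for this example; check the header of the file you downloaded before relying on them.

```python
import csv
import io

# Hypothetical sample mimicking an hourly state vector CSV file.
# The real header names and column order may differ.
SAMPLE = """time,icao24,lat,lon,velocity,heading,vertrate,callsign,onground,alert,spi,squawk,baroaltitude,geoaltitude,lastposupdate,lastcontact
1500000000,4b1806,47.45,8.56,210.5,92.1,0.0,SWR123  ,false,false,false,1000,10668.0,10820.4,1499999999.2,1499999999.8
1500000010,4b1806,47.45,8.59,210.7,92.0,0.0,SWR123  ,false,false,false,1000,10668.0,10820.4,1500000009.1,1500000009.9
1500000000,3c6444,,,,,,DLH4AB  ,true,false,false,,,,,1499999998.0
"""


def _to_float(value):
    """Convert a CSV field to float, mapping empty fields to None."""
    return float(value) if value else None


def read_state_vectors(stream):
    """Parse state vectors from a text stream into a list of dicts."""
    rows = []
    for record in csv.DictReader(stream):
        rows.append({
            "time": int(record["time"]),
            "icao24": record["icao24"],
            "lat": _to_float(record["lat"]),
            "lon": _to_float(record["lon"]),
            # Callsigns are assumed to be space-padded, so strip them.
            "callsign": record["callsign"].strip(),
            "onground": record["onground"] == "true",
        })
    return rows


state_vectors = read_state_vectors(io.StringIO(SAMPLE))

# Keep only the state vectors that carry a position.
with_position = [sv for sv in state_vectors if sv["lat"] is not None]
```

The same function accepts any text stream, so a compressed file can be passed in through a decompressing reader. If the download turns out to be gzip-compressed, `gzip.open(path, "rt", newline="")` would be one option.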

Sources:

2. OpenSky Raw Data

Curator: OpenSky

Unlike other large aggregators, OpenSky collects and stores physical layer data with each message. This is a sample of such raw data.

These data sets contain samples of raw data as received by the OpenSky Network. As some users prefer to conceal the exact location of their receivers, the data set has been anonymized by removing the receiver locations from the data.

Sources:

3. The LocaRDS Dataset

Curator: OpenSky

With this work, we attempt to improve the current state of the art in localization research and put it on a solid scientific grounding for the future. Concretely, LocaRDS is an open reference dataset of real-world crowdsourced flight data from the OpenSky Network featuring more than 222 million measurements from over 50 million transmissions recorded by 323 sensors. LocaRDS can be used to test, analyze and directly compare different localization techniques. It is intended to answer in particular the open question of the aircraft localization problem in crowdsourced sensor networks.

Sources:

Scientific Paper:

LocaRDS: A Localization Reference Data Set

Matthias Schäfer, Martin Strohmeier, Mauro Leonardi, Vincent Lenders
Sensors, 21(16), p.5516

4. The COVID-19 Flight Dataset

Curator: OpenSky

The data in this dataset is derived, cleaned and enriched from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 5500 receivers since 1 January 2019. We stopped updating the dataset after December 2022.

Sources:

Scientific Paper:

Crowdsourced Air Traffic Data from the OpenSky Network 2019–20

Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, Vincent Lenders
In Earth System Science Data, 2021

5. OpenSky's Aircraft Metadata Database

Curator: OpenSky

Besides the ADS-B tracking data, most research also requires metadata about the tracked aircraft. OpenSky's aircraft database aggregates official and unofficial sources in order to provide this metadata. It is crowdsourced, too, so you can add missing information or correct incorrect and outdated entries directly via the web interface. The database as a whole can also be downloaded in .csv format. Recently, we have started to keep monthly snapshots so that the metadata is preserved as it was in a particular month.

Note: Because of current issues with the automatic dumps, complete manual dumps are provided as an alternative.

Sources:

6. Reference Datasets for In-Flight Emergency Situations

Curator: Xavier Olive (ONERA)

The data in this dataset is derived and cleaned from the full OpenSky dataset in order to illustrate in-flight emergency situations triggering the 7700 transponder code. It spans flights seen by the network's more than 2500 members between 1 January 2018 and 29 January 2020. It is principally sourced from our Alerts page.
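For readers who want to look for similar events in state vector data themselves, the sketch below picks out aircraft squawking 7700 and records when each was first seen doing so. It is only an illustration, not the curator's method: the function `emergency_aircraft`, the dictionary keys, and the sample records are assumptions made for this example.

```python
EMERGENCY_SQUAWK = "7700"


def emergency_aircraft(state_vectors):
    """Map each icao24 to the earliest time it was seen squawking 7700."""
    first_seen = {}
    for sv in state_vectors:
        # Skip state vectors with a different or missing squawk code.
        if sv.get("squawk") != EMERGENCY_SQUAWK:
            continue
        icao24 = sv["icao24"]
        if icao24 not in first_seen or sv["time"] < first_seen[icao24]:
            first_seen[icao24] = sv["time"]
    return first_seen


# Invented sample records; squawk codes are kept as strings.
sample = [
    {"time": 100, "icao24": "abc123", "squawk": "1000"},
    {"time": 110, "icao24": "abc123", "squawk": "7700"},
    {"time": 120, "icao24": "abc123", "squawk": "7700"},
    {"time": 105, "icao24": "def456", "squawk": None},
]

emergencies = emergency_aircraft(sample)
```

Squawk codes are compared as strings here because they are four-digit octal codes, which may carry leading zeros that an integer conversion would drop.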

Sources:

Scientific Paper:

OpenSky Report 2020: Analysing In-Flight Emergencies Using Big Data

Xavier Olive, Axel Tanner, Martin Strohmeier, Matthias Schäfer, Metin Feridun, Allan Tart, Ivan Martinovic, Vincent Lenders
In 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), October 2020

7. Climbing Aircraft Dataset

Curator: Richard Alligier (ENAC)

This dataset contains the climbing segments of the year 2017 detected by OpenSky. The 11 most frequent aircraft types are studied. The resulting data set contains millions of climbing segments from all over the world. The climbing segments are not filtered according to their altitude. Predictive models returning the missing parameters are learned from this data set using a Machine Learning method. The trained models are tested on the last two months of the year and compared with a baseline method (BADA used with the mean parameters computed on the first ten months). Compared with this baseline, the Machine Learning approach reduces the RMSE on the altitude by 48% on average for a 10-minute prediction horizon. The RMSE on the speed is reduced by 25% on average. The trajectory prediction is also improved for small climbing segments. Using only information available before the considered aircraft's take-off, the Machine Learning method can predict the unknown parameters, reducing the RMSE on the altitude by 25% on average.
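To make the RMSE figures above concrete, the following sketch computes the RMSE of two sets of altitude predictions against observations and the relative reduction of one with respect to the other. All numbers are invented for illustration and do not reproduce the paper's results; the function names are likewise hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, model_rmse):
    """Fraction by which the model RMSE is lower than the baseline RMSE."""
    return 1.0 - model_rmse / baseline_rmse


# Invented altitudes in metres at a fixed prediction horizon.
observed = [3000.0, 3500.0, 4000.0, 4500.0]
baseline = [3200.0, 3300.0, 4200.0, 4300.0]  # every prediction is off by 200 m
model = [3100.0, 3400.0, 4100.0, 4400.0]     # every prediction is off by 100 m

baseline_rmse = rmse(baseline, observed)
model_rmse = rmse(model, observed)
reduction = relative_reduction(baseline_rmse, model_rmse)
```

With these invented values the baseline RMSE is 200 m and the model RMSE is 100 m, so the relative reduction is 0.5, which would be reported as a 50% reduction.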

The data set and the Machine Learning code are publicly available.

Sources:

Scientific Paper:

Learning Aircraft Operational Factors to Improve Aircraft Climb Prediction: A Large Scale Multi-Airport Study

R. Alligier, D. Gianazza
In Transportation Research Part C: Emerging Technologies 96, 72-95

8. Database for World Aircraft's Common GICB Capabilities

Curator: Junzi Sun (TU Delft)

The common usage Ground-initiated Comm-B (GICB) capabilities refer to a set of Mode S transponder downlink capabilities that are often of interest to air traffic controllers. They include, for example, ADS-B messages, Mode S enhanced surveillance replies, and Mode S meteorological reports.

This dataset contains the GICB capabilities of 50,000 aircraft, generated from the global Mode S data obtained by the OpenSky Network. It is published to support the scientific paper for the OpenSky Symposium 2020: Mode S Transponder Comm-B Capabilities in Current Operational Aircraft. The full process of data gathering and decoding is documented in the paper.

Scientific Paper:

Mode S Transponder Comm-B Capabilities in Current Operational Aircraft

Junzi Sun, Huy Vû, Xavier Olive, Jacco M. Hoekstra
In Proceedings of the 8th OpenSky Symposium 2020, p. 4. 2020

9. The OpenSky ADS-C Dataset

Curator: OpenSky Network

ADS-C is an advanced surveillance system that utilizes an aircraft's onboard systems to automatically transmit crucial information, including position, altitude, speed, navigation intentions, and meteorological data. Unlike ADS-B, ADS-C transmits contract data via satellite to specific Air Traffic Services Units (ATSU) or Aeronautical Operational Control (AOC) facilities, contributing to a more comprehensive and global approach to air traffic monitoring.

In the paper, we describe the background of ADS-C and implement a resource-intensive data collection. We find that ADS-C can be an excellent complementary data source for researchers working with aviation data. The original dataset contains 227,126 messages collected over 4 months.

Scientific Paper:

A First Look at Exploiting the Automatic Dependent Surveillance-Contract Protocol for Open Aviation Research

Xapelli, M., Lüscher, T., Tresoldi, G., Strohmeier, M., & Lenders, V.
In Proceedings of the 11th OpenSky Symposium, 2023