Data Sets created from OpenSky Data
This page collects and briefly introduces the available datasets created from OpenSky data. Principally, all of them have been created via either our historical data Impala shell (access request required) or our live API.
This list is a work in progress; we will add more tailored datasets in the future as our researchers develop them. Unless noted otherwise in the respective repository, our Terms of Use / License.txt apply.
Further reading for a collection of datasets beyond (and including) the OpenSky ones below: https://atmdata.github.io/
All datasets:
1. Weekly 24 hours of State Vector Data
2. OpenSky Raw Data
3. The LocaRDS Dataset
4. The Covid-19 Flight Dataset
5. OpenSky's Aircraft Metadata Database
6. Reference datasets for in-flight emergency situations
7. Climbing Aircraft Dataset
8. Database for world aircraft's common GICB capabilities
9. The OpenSky ADS-C dataset
1. Weekly 24 hours of State Vector Data
Curator:
OpenSky
Description:
Every Tuesday morning, we provide a full snapshot of the previous Monday's complete state vector data collected by OpenSky. These snapshots are generally available for about the previous 6 months. The data is very detailed: at one-second update intervals, it provides time, icao24, lat/lon, velocity, heading, vertrate, callsign, onground, alert/spi, squawk, baro/geoaltitude, lastposupdate, lastcontact.
State Vectors are our abstraction for tracking information. The data sets cover a full day and are extracted every Tuesday night for the preceding day. We offer three different formats: CSV, Avro and JSON. There is one file per hour in each format. Be aware that CSV and JSON in particular are much larger after decompression.
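As an illustration of how the fields listed above might be consumed, here is a minimal Python sketch that reads state vectors from a CSV stream. The exact header spelling and column order, the "true"/"false" encoding of boolean fields, and the use of empty strings for missing values are assumptions made for this example (the README linked below is the authoritative description). The sample rows are invented and do not come from a real file.

```python
import csv
import io

# Column names follow the field list in the description; the exact header
# spelling and order in the real files is an assumption.
COLUMNS = [
    "time", "icao24", "lat", "lon", "velocity", "heading", "vertrate",
    "callsign", "onground", "alert", "spi", "squawk", "baroaltitude",
    "geoaltitude", "lastposupdate", "lastcontact",
]


def _to_float(value):
    """Convert a CSV field to float, mapping empty fields to None."""
    return float(value) if value else None


def read_state_vectors(text_stream):
    """Yield one dict per state vector from a CSV text stream with a header row."""
    for row in csv.DictReader(text_stream):
        yield {
            "time": int(row["time"]),
            "icao24": row["icao24"],
            "lat": _to_float(row["lat"]),
            "lon": _to_float(row["lon"]),
            "velocity": _to_float(row["velocity"]),
            "callsign": row["callsign"].strip() or None,
            # Assumes booleans are written as the strings "true"/"false".
            "onground": row["onground"].lower() == "true",
            "squawk": row["squawk"] or None,
            "baroaltitude": _to_float(row["baroaltitude"]),
        }


# Invented sample rows, for illustration only.
_ROWS = [
    ["1600000000", "abc123", "47.5", "8.5", "230.0", "90.0", "0.0", "TEST123 ",
     "false", "false", "false", "1000", "10000.0", "10100.0",
     "1599999999.5", "1599999999.9"],
    ["1600000001", "def456", "", "", "", "", "", "",
     "true", "false", "false", "", "", "", "", "1600000000.2"],
]
SAMPLE = "\n".join([",".join(COLUMNS)] + [",".join(r) for r in _ROWS]) + "\n"

vectors = list(read_state_vectors(io.StringIO(SAMPLE)))
airborne = [v for v in vectors if not v["onground"]]
```

For a downloaded hourly file, any text-mode file handle over the decompressed CSV could be passed to `read_state_vectors` in place of the `io.StringIO` sample. Reading row by row rather than loading the whole file keeps memory use modest, which matters given the size of the decompressed files.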
Full description:
https://opensky-network.org/datasets/states/README.txt
Source:
https://opensky-network.org/datasets/states/
2. OpenSky Raw Data
Curator:
OpenSky
Description:
Contrary to other large aggregators, OpenSky collects and stores physical layer data with each message. This is a sample of such raw data.
These data sets contain samples of raw data as received by the OpenSky Network. As some users prefer to conceal the exact location of their receivers, the data sets have been anonymized in the sense that receiver locations have been removed from the data.
Sources:
https://opensky-network.org/datasets/raw/
https://opensky-network.org/datasets/raw/protected [Contact us for the password]
3. The LocaRDS Dataset
Curator:
OpenSky
Description:
With this work, we attempt to improve the current state of the art in localization research and put it on a solid scientific grounding for the future. Concretely, LocaRDS is an open reference dataset of real-world crowdsourced flight data from the OpenSky Network featuring more than 222 million measurements from over 50 million transmissions recorded by 323 sensors. LocaRDS can be used to test, analyze and directly compare different localization techniques. It is intended to answer in particular the open question of the aircraft localization problem in crowdsourced sensor networks.
Sources:
https://zenodo.org/record/4739276
https://github.com/openskynetwork/aircraft-localization
Scientific Paper:
LocaRDS: A Localization Reference Data Set
Matthias Schäfer, Martin Strohmeier, Mauro Leonardi, Vincent Lenders
Sensors, 21(16), p.5516.
4. The Covid-19 Flight Dataset
Curator:
OpenSky
Description:
The data in this dataset is derived, cleaned and enriched from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 5500 receivers since 1 January 2019. We stopped updating the dataset after December 2022.
Sources:
https://doi.org/10.5281/zenodo.3737101
https://doi.org/10.5281/zenodo.3931948 [CC-BY]
https://traffic-viz.github.io/scenarios/covid19.html
https://opensky-network.org/community/blog/item/6-opensky-covid-19-flight-dataset
Scientific Paper:
Crowdsourced Air Traffic Data from the OpenSky Network 2019–20
5. OpenSky's Aircraft Metadata Database
Curator:
OpenSky
Description:
Besides the ADS-B tracking data, most research requires metadata about the tracked aircraft. OpenSky's aircraft database aggregates official and unofficial sources in order to provide this metadata. It is crowdsourced, too, so you can add missing information or correct incorrect or outdated information directly via the web interface. The database as a whole can also be downloaded in .csv format. Recently, we have started to keep monthly snapshots so that the metadata is preserved as it was in a particular month.
Note: Due to some issues with the automatic dumps at the moment, there are complete manual dumps provided as an alternative.
Sources:
https://opensky-network.org/datasets/metadata/
https://opensky-network.org/aircraft-database
https://opensky-network.org/forum/bug-reports/652-are-the-aircraft-database-dumps-working#1719
6. Reference datasets for in-flight emergency situations
Curator:
Xavier Olive, ONERA
Description:
The data in this dataset is derived and cleaned from the full OpenSky dataset in order to illustrate in-flight emergency situations triggering the 7700 transponder code. It spans flights seen by the network's more than 2500 members between 1 January 2018 and 29 January 2020. It is principally sourced from our Alerts page.
Sources:
https://zenodo.org/record/3937483
https://traffic-viz.github.io/paper/squawk7700.html
Scientific Paper:
OpenSky Report 2020: Analysing in-flight emergencies using big data.
Xavier Olive, Axel Tanner, Martin Strohmeier, Matthias Schäfer, Metin Feridun, Allan Tart, Ivan Martinovic and Vincent Lenders.
In 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), October 2020
7. Climbing Aircraft Dataset
Curator:
Richard Alligier, ENAC
Description:
This dataset contains the climbing segments of the year 2017 detected by OpenSky. The 11 most frequent aircraft types are studied. The obtained data set contains millions of climbing segments from all over the world. The climbing segments are not filtered according to their altitude. Predictive models returning the missing parameters are learned from this data set, using a Machine Learning method. The trained models are tested on the last two months of the year and compared with a baseline method (BADA used with the mean parameters computed on the first ten months). Compared with this baseline, the Machine Learning approach reduces the RMSE on the altitude by 48% on average for a 10-minute prediction horizon. The RMSE on the speed is reduced by 25% on average. The trajectory prediction is also improved for small climbing segments. Using only information available before the considered aircraft takes off, the Machine Learning method can predict the unknown parameters, reducing the RMSE on the altitude by 25% on average.
The data set and the Machine Learning code are publicly available.
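To make the reported figures concrete, the following Python sketch shows how an RMSE and a relative RMSE reduction against a baseline can be computed. It is not the code published with the dataset; the function names and the altitude values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, model_rmse):
    """Fraction by which the model lowers the RMSE relative to the baseline."""
    return 1.0 - model_rmse / baseline_rmse


# Hypothetical altitudes (same unit throughout), for illustration only.
observed = [1000.0, 2000.0, 3000.0]
baseline_prediction = [1100.0, 2100.0, 3100.0]  # every point off by 100
model_prediction = [1050.0, 2050.0, 3050.0]     # every point off by 50

baseline_error = rmse(baseline_prediction, observed)
model_error = rmse(model_prediction, observed)
reduction = relative_reduction(baseline_error, model_error)
```

In this invented example the baseline RMSE is 100 and the model RMSE is 50, so the relative reduction is 0.5, that is, 50%. A figure such as "48% on average" would correspond to averaging reductions of this kind over the evaluated cases.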
Source:
https://opensky-network.org/datasets/publication-data/climbing-aircraft-dataset/
Scientific Paper:
8. Database for world aircraft's common GICB capabilities
Curator:
Junzi Sun, TU Delft
Description:
The common-usage Ground-initiated Comm-B (GICB) capabilities refer to a set of Mode S transponder downlink capabilities that are often of interest to air traffic controllers. They include, for example, ADS-B messages, Mode S enhanced surveillance replies, and Mode S meteorological reports.
This dataset contains the GICB capabilities of 50,000 aircraft generated from the global Mode S data obtained by the OpenSky Network. It is published to support the scientific paper for the OpenSky 2020 Symposium: Mode S Transponder Comm-B Capabilities in Current Operational Aircraft. The full process of data gathering and decoding is documented in the paper.
Source:
https://github.com/junzis/gicb-db
Scientific Paper:
Mode S Transponder Comm-B Capabilities in Current Operational Aircraft.
Junzi Sun, Huy Vû, Xavier Olive, and Jacco M. Hoekstra.
In Proceedings of the 8th OpenSky Symposium 2020, p. 4. 2020.
9. The OpenSky ADS-C dataset
Curator:
OpenSky Network
Description:
ADS-C is an advanced surveillance system that utilizes an aircraft's onboard systems to automatically transmit crucial information, including position, altitude, speed, navigation intentions, and meteorological data. Different from ADS-B, ADS-C transmits contract data via satellite to specific Air Traffic Services Units (ATSU) or Aeronautical Operational Control (AOC) facilities, contributing to a more comprehensive and global approach to air traffic monitoring.
In the paper, we describe the background of ADS-C and implement a resource-intensive data collection. We find that ADS-C can be an excellent complementary data source for researchers working with aviation data. The original dataset contains 227,126 messages collected over 4 months.
Source:
https://zenodo.org/records/10041840
Scientific Paper:
Xapelli, M., Lüscher, T., Tresoldi, G., Strohmeier, M., & Lenders, V.
In Proceedings of the 11th OpenSky Symposium, 2023.