Scientific Datasets
This page provides an overview of publicly available datasets derived from OpenSky data. These datasets have been created using our Trino historical data interface and live API.
Available Datasets
- Weekly 24 Hours of State Vector Data
- OpenSky Raw Data
- The LocaRDS Dataset
- The COVID-19 Flight Dataset
- OpenSky's Aircraft Metadata Database
- Reference Datasets for In-Flight Emergency Situations
- Climbing Aircraft Dataset
- Database for World Aircraft's Common GICB Capabilities
- The OpenSky ADS-C Dataset
1. Weekly 24 Hours of State Vector Data
Curator: OpenSky
These files are snapshots of OpenSky's complete state vector data for one full day (Monday) per week, spanning several years. The data is sampled at 10-second intervals and provides time, icao24, lat/lon, velocity, heading, vertrate, callsign, onground, alert/spi, squawk, baro/geoaltitude, lastposupdate, and lastcontact.
State vectors are our abstraction for tracking information. We offer three formats (CSV, Avro, and JSON), with one file per hour in each format. Be aware that the CSV and JSON files in particular are much larger after decompression.
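As a minimal sketch of working with an hourly CSV file, the Python snippet below groups state vectors into per-aircraft tracks. It assumes the file has already been extracted and decompressed and that its header uses the field names listed above (`time`, `icao24`, `lat`, `lon`, ...). The helper `load_tracks` and the sample rows are hypothetical, not real OpenSky data.

```python
import csv
import io
from collections import defaultdict

# Synthetic sample in the assumed column layout (not real OpenSky data).
SAMPLE = """time,icao24,lat,lon,velocity,heading,vertrate,callsign,onground,alert,spi,squawk,baroaltitude,geoaltitude,lastposupdate,lastcontact
1590364810,3c6444,50.11,8.62,230.5,90.0,0.0,DLH1A,False,False,False,1000,10000.0,10150.0,1590364809.8,1590364809.9
1590364800,3c6444,50.10,8.60,230.0,90.0,0.0,DLH1A,False,False,False,1000,10000.0,10150.0,1590364799.8,1590364799.9
1590364800,4ca7b3,,,,,,RYR12,True,False,False,,,,,1590364799.0
"""


def load_tracks(csv_text):
    """Group state vectors by icao24 into time-sorted (time, lat, lon) tracks.

    Hypothetical helper; rows without a position are skipped.
    """
    tracks = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["lat"] or not row["lon"]:
            continue  # state vector without a position report
        tracks[row["icao24"]].append(
            (int(float(row["time"])), float(row["lat"]), float(row["lon"]))
        )
    for points in tracks.values():
        points.sort()  # tuples sort by time first
    return dict(tracks)


tracks = load_tracks(SAMPLE)
print(tracks["3c6444"][0])  # earliest position of this aircraft
```

Reading from an in-memory string keeps the example self-contained; for a real hourly file, the same function can be fed the decompressed file contents.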
Sources:
2. OpenSky Raw Data
Curator: OpenSky
Unlike other large aggregators, OpenSky collects and stores physical-layer data with each message. These datasets contain samples of such raw data as received by the OpenSky Network.
Because some users prefer to conceal the exact location of their receivers, the datasets have been anonymized by removing the receiver locations from the data.
Sources:
3. The LocaRDS Dataset
Curator: OpenSky
With this work, we aim to advance the state of the art in localization research and to put it on a solid scientific footing for the future. Concretely, LocaRDS is an open reference dataset of real-world crowdsourced flight data from the OpenSky Network, featuring more than 222 million measurements from over 50 million transmissions recorded by 323 sensors. LocaRDS can be used to test, analyze, and directly compare different localization techniques. In particular, it is intended to help answer the open question of aircraft localization in crowdsourced sensor networks.
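To illustrate the kind of problem LocaRDS is meant for, the sketch below estimates a transmitter position from times of arrival (TOA) at several sensors, using time differences of arrival (TDOA) and a brute-force grid search. This is a textbook baseline, not the method of the LocaRDS paper. It assumes a flat 2D plane measured in kilometres, perfectly synchronized sensor clocks, and noise-free line-of-sight propagation; real LocaRDS measurements involve 3D geometry, clock errors, and noise. All names and the sensor layout are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def tdoa_cost(pos, sensors, toas):
    """Squared error between measured and predicted TDOAs, relative to sensor 0."""
    d0 = math.dist(pos, sensors[0])
    cost = 0.0
    for sensor, toa in zip(sensors[1:], toas[1:]):
        predicted = (math.dist(pos, sensor) - d0) / C_KM_S
        measured = toa - toas[0]  # the unknown emission time cancels out
        cost += (predicted - measured) ** 2
    return cost


def locate_grid(sensors, toas, extent_km=200.0, step_km=1.0):
    """Return the grid point in [0, extent_km]^2 that minimizes the TDOA cost."""
    n = int(extent_km / step_km) + 1
    best, best_cost = None, math.inf
    for i in range(n):
        for j in range(n):
            candidate = (i * step_km, j * step_km)
            cost = tdoa_cost(candidate, sensors, toas)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best


# Synthetic example: four sensors at the corners of a 200 km square.
sensors = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
true_pos = (73.0, 121.0)
emission_time = 0.5  # unknown to the solver
toas = [emission_time + math.dist(true_pos, s) / C_KM_S for s in sensors]
estimate = locate_grid(sensors, toas)
print(estimate)
```

A grid search is easy to check but scales poorly with area and resolution; iterative least-squares solvers are the usual next step once the cost function is in place.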
Sources:
Scientific Paper:
4. The COVID-19 Flight Dataset
Curator: OpenSky
The data in this dataset is derived, cleaned, and enriched from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It covers all flights seen by the network's more than 5,500 receivers from 1 January 2019 onwards; we stopped updating the dataset after December 2022.
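A typical use of this dataset is counting flights per day to follow how traffic evolved. The sketch below assumes the flight list is a CSV with a `day` column holding an ISO-formatted date; that column name, the helper functions, and the rows are assumptions for illustration, not real data.

```python
import csv
import io
from collections import Counter

# Synthetic rows in an assumed flight-list layout (not real data).
FLIGHTS = """callsign,icao24,origin,destination,day
DLH400,3c6444,EDDF,KJFK,2019-04-01 00:00:00+00:00
BAW117,4007f2,EGLL,KJFK,2019-04-01 00:00:00+00:00
AFR006,3944ef,LFPG,KJFK,2019-04-01 00:00:00+00:00
DLH400,3c6444,EDDF,KJFK,2020-04-01 00:00:00+00:00
"""


def flights_per_day(csv_text):
    """Count flights per calendar day (first 10 characters of the `day` field)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["day"][:10] for row in rows)
    return dict(sorted(counts.items()))


def relative_change(counts, day, baseline_day):
    """Relative change of the flight count on `day` versus `baseline_day`."""
    return (counts[day] - counts[baseline_day]) / counts[baseline_day]


counts = flights_per_day(FLIGHTS)
print(counts)
print(f"{relative_change(counts, '2020-04-01', '2019-04-01'):.1%}")
```

Comparing against the same day of the previous year is one simple way to separate pandemic effects from weekly and seasonal patterns.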
Sources:
Scientific Paper:
Crowdsourced Air Traffic Data from the OpenSky Network 2019–20
5. OpenSky's Aircraft Metadata Database
Curator: OpenSky
Besides ADS-B tracking data, most research also requires metadata about the tracked aircraft. OpenSky's aircraft database aggregates official and unofficial sources to provide this metadata. It is crowdsourced, too, so you can add missing, incorrect, or outdated information directly via the web interface. The database as a whole can also be downloaded in .csv format. Recently, we have started to keep monthly snapshots so that the metadata is preserved as it was in a given month.
Note: Because of current issues with the automatic dumps, complete manual dumps are provided as an alternative.
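Once the database CSV has been downloaded, a common step is to index it by ICAO 24-bit address so that tracking data can be joined with aircraft metadata. This sketch assumes columns named `icao24`, `registration`, and `typecode`; the actual headers may differ between dumps and snapshots, and the rows and helper name are hypothetical.

```python
import csv
import io

# Synthetic rows; the column names are an assumption about the CSV layout.
METADATA = """icao24,registration,typecode
A1B2C3,N0001X,B738
4d00aa,TEST-1,A320
,NOADDR,C172
"""


def load_metadata(csv_text):
    """Index aircraft records by lower-case icao24, skipping rows without one."""
    db = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row.get("icao24") or "").strip().lower()
        if key:
            db[key] = row
    return db


db = load_metadata(METADATA)
print(db["a1b2c3"]["typecode"])  # lookups work regardless of the case used in the CSV
```

Normalizing the address to lower case avoids missed joins when the two sources format hex digits differently.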
Sources:
6. Reference Datasets for In-Flight Emergency Situations
Curator: Xavier Olive (ONERA)
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate in-flight emergency situations in which aircraft set the 7700 (emergency) transponder code. It spans flights seen by the network's more than 2,500 members between 1 January 2018 and 29 January 2020 and is principally sourced from our Alerts page.
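As a sketch of how such situations can be spotted in state vector data, the snippet below reports, for each aircraft, the first time its squawk field read 7700. It assumes the `time`, `icao24`, and `squawk` columns of the state vector CSV; the function name and sample rows are hypothetical, and this is not the curation procedure used to build this dataset.

```python
import csv
import io

# Synthetic state vectors (subset of columns); not real data.
STATES = """time,icao24,squawk
1580000000,abc123,1000
1580000010,abc123,7700
1580000020,abc123,7700
1580000010,def456,2000
1580000030,def456,
"""


def emergency_onsets(csv_text, code="7700"):
    """Map each icao24 to the earliest time at which the given squawk was seen."""
    onsets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row["squawk"] or "").strip() != code:
            continue
        t = int(float(row["time"]))
        icao = row["icao24"]
        if icao not in onsets or t < onsets[icao]:
            onsets[icao] = t
    return onsets


print(emergency_onsets(STATES))
```

Passing a different `code` reuses the same logic for other squawk codes.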
Sources:
Scientific Paper:
OpenSky Report 2020: Analysing In-Flight Emergencies Using Big Data
7. Climbing Aircraft Dataset
Curator: Richard Alligier (ENAC)
This dataset contains the climbing segments detected by OpenSky in 2017 for the 11 most frequent aircraft types, amounting to millions of climbing segments from all over the world. The climbing segments are not filtered by altitude. Using a Machine Learning method, predictive models that return the missing parameters are learned from this dataset. The trained models are tested on the last two months of the year and compared with a baseline method (BADA used with the mean parameters computed on the first ten months). Compared with this baseline, the Machine Learning approach reduces the RMSE on the altitude by 48% on average for a 10-minute prediction horizon, and the RMSE on the speed by 25% on average. Trajectory prediction is also improved for short climbing segments. Using only information available before the aircraft's take-off, the Machine Learning method can still predict the unknown parameters, reducing the RMSE on the altitude by 25% on average.
The data set and the Machine Learning code are publicly available.
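The results above are stated as relative RMSE reductions against the BADA baseline. The arithmetic behind such figures is sketched below; the helper names are hypothetical, and the altitudes in the example are made up for illustration and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def rmse_reduction(model_rmse, baseline_rmse):
    """Fraction by which the model RMSE is lower than the baseline RMSE."""
    return 1.0 - model_rmse / baseline_rmse


# Made-up altitudes (m) at a 10-minute horizon for three segments.
observed = [9000.0, 8500.0, 9500.0]
baseline = [9400.0, 8000.0, 9900.0]
model = [9100.0, 8300.0, 9600.0]
reduction = rmse_reduction(rmse(model, observed), rmse(baseline, observed))
print(f"RMSE reduction: {reduction:.0%}")
```

A reduction of 0.48 means the model's RMSE is 52% of the baseline's, not that individual errors shrink by 48%.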
Sources:
Scientific Paper:
8. Database for World Aircraft's Common GICB Capabilities
Curator: Junzi Sun (TU Delft)
The common-usage Ground-Initiated Comm-B (GICB) capabilities are a set of Mode S transponder downlink capabilities that are often of interest to air traffic controllers. They include, for example, ADS-B messages, Mode S Enhanced Surveillance replies, and Mode S meteorological reports.
This dataset contains the GICB capabilities of 50,000 aircraft, derived from global Mode S data obtained by the OpenSky Network. It was published to support the OpenSky 2020 Symposium paper Mode S Transponder Comm-B Capabilities in Current Operational Aircraft, which documents the full process of data gathering and decoding.
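A transponder advertises its common-usage GICB capabilities in Comm-B register BDS 1,7, whose 56-bit MB field acts as a bit map of supported registers. The sketch below decodes such a field. The bit-to-register mapping is an assumption based on the BDS 1,7 layout in ICAO Doc 9871 and should be checked against the specification before use; the function name and example value are hypothetical.

```python
# Assumed meaning of the first 24 bits of the BDS 1,7 MB field (bit 1 first);
# any bits beyond the 24th are ignored in this sketch.
BDS17_REGISTERS = [
    "0,5", "0,6", "0,7", "0,8", "0,9", "0,A", "2,0", "2,1",
    "4,0", "4,1", "4,2", "4,3", "4,4", "4,5", "4,8", "5,0",
    "5,1", "5,2", "5,3", "5,4", "5,5", "5,6", "5,F", "6,0",
]


def gicb_capabilities(mb_hex):
    """Return the BDS registers flagged as supported in a 56-bit BDS 1,7 MB field."""
    if len(mb_hex) != 14:
        raise ValueError("expected 14 hex digits (56 bits)")
    bits = format(int(mb_hex, 16), "056b")  # MSB first, zero-padded to 56 bits
    return [reg for bit, reg in zip(bits, BDS17_REGISTERS) if bit == "1"]


# Hypothetical MB field with the bits for BDS 2,0, 4,0, 5,0 and 6,0 set.
print(gicb_capabilities("02810100000000"))
```

Aggregating the decoded lists over many aircraft yields capability statistics of the kind this dataset provides.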
Scientific Paper:
Mode S Transponder Comm-B Capabilities in Current Operational Aircraft
9. The OpenSky ADS-C Dataset
Curator: OpenSky Network
ADS-C is an advanced surveillance system that uses an aircraft's onboard systems to automatically transmit crucial information, including position, altitude, speed, navigation intentions, and meteorological data. Unlike ADS-B, ADS-C transmits contract data via satellite to specific Air Traffic Services Units (ATSU) or Aeronautical Operational Control (AOC) facilities, contributing to a more comprehensive and global approach to air traffic monitoring.
In the accompanying paper, we describe the background of ADS-C and implement a resource-intensive data collection. We find that ADS-C can be an excellent complementary data source for researchers working with aviation data. The original dataset contains 227,126 messages collected over 4 months.