A Quick Guide To OpenSky's Historical Data Interface
Besides our public API, we also grant free access to our full dataset over an SQL-like query interface to researchers and partners. If you don't have access to this free premium service yet, but good reasons to get it, you can apply for access by submitting this application form (website account + login required). Note that this interface is intended to serve university-affiliated researchers, governmental organisations, or aviation authorities as a data source for aviation-related research or incident investigation. If you are a private or commercial entity, please contact us for a license.
Since processing large queries is expensive in terms of resources and cluster energy consumption, we reserve the right to reject applications which lack sufficient reason to gain access. We will also revoke access rights in case of misuse.
Important: In 2024, we have moved to a new backend and accordingly a new access method for historical data. This guide has been updated to reflect this.
Our new interface is backed by Trino and S3. As Trino has been built on our new user/access control system, we need you to register a new account there. Please use the following link to do so and let us know the account name in the request form if it is not the same as your OpenSky user name: http://trino.opensky-network.org/
Note that there is also a Python interface for the historical data available in Xavier Olive's traffic library and Junzi Sun's pyOpenSky interface. Both support the new Trino interface without you changing your code. We highly suggest you look into these tools and see whether they are helpful for your use case. For a full overview, see our Data Tools page.
1. Connecting to the shell
After having registered an account on http://trino.opensky-network.org/, you must first receive the access rights from us via the application form. You can then use either traffic and pyModeS or any of the Trino clients to connect to the database using trino.opensky-network.org with your new account credentials. The software will ask you to do a single sign on (not with your normal OpenSky account!), after which you are authenticated and can use the database.
This is a generic example for the command line:
Important note: The website will allow you to enter and convert capital letters but the user name is actually stored in lower case, which the CLI is not compatible with. Hence, you will have to use all lower case for your user name for the CLI, Python and all other programmatic access.
2. Exploring the database
Let’s start with exploring the database and see which data is available.
To get a list of all available tables, run the following command:
show tables; Query: show tables +--------------------------+ | name | +--------------------------+ | acas_data4 | | allcall_replies_data4 |
| flights_data4 | | identification_data4 | | operational_status_data4 | | position_data4 | | rollcall_replies_data4 | | sensor_visibility | | sensor_visibility_data3 | | state_vectors | | state_vectors_data3 | | state_vectors_data4 | | velocity_data4 | +--------------------------+ Fetched 13 row(s) in 0.04s
If you are familiar with SSR Mode S downlink communications, the table names should ring a bell. I won’t cover every single table in this tutorial as it’s rather intended as a quick guide on how to use this interface and the data. Those of you who are interested in the low-level tables (e.g. in rollcall replies) should know their meanings anyway. For more information and a comprehensible introduction to SSR Mode S, I recommend reading the part on SSR Mode S of Christian Wolff’s radar tutorial. However, depending on what you intend to do, you might not need this knowledge as using the state vectors is pretty straightforward as you'll see in a minute and doesn't require detailed knowledge about the underlying technology.
The first important thing to observe is the ending of the tables. They indicate the version of the batch layer implementation which generated the data. As of this writing, we are running the 4th implementation and you can consider everything else deprecated and buggy and it might be removed in the future.
From my experience, most of you are interested in what we call state vectors. These are basically summaries of all other tables containing most common tracking information. I’ll cover this table as an example for how to explore the database. You’ll be able to apply the same methodology to the other tables as well. Let’s first have a closer look at the table structure:
describe state_vectors_data4; Query: describe state_vectors_data4 +---------------+------------+-----------------------------+ | name | type | comment | +---------------+------------+-----------------------------+ | time | int | Inferred from Parquet file. | | icao24 | string | Inferred from Parquet file. | | lat | double | Inferred from Parquet file. | | lon | double | Inferred from Parquet file. | | velocity | double | Inferred from Parquet file. | | heading | double | Inferred from Parquet file. | | vertrate | double | Inferred from Parquet file. | | callsign | string | Inferred from Parquet file. | | onground | boolean | Inferred from Parquet file. | | alert | boolean | Inferred from Parquet file. | | spi | boolean | Inferred from Parquet file. | | squawk | string | Inferred from Parquet file. | | baroaltitude | double | Inferred from Parquet file. | | geoaltitude | double | Inferred from Parquet file. | | lastposupdate | double | Inferred from Parquet file. | | lastcontact | double | Inferred from Parquet file. | | serials | array<int> | Inferred from Parquet file. | | hour | int | | +---------------+------------+-----------------------------+ Fetched 18 row(s) in 0.05s
Units
Units are unified across all tables, so if you find columns with the same content (e.g. timestamps) in other tables, they’ll have the same unit. In general, we are using the metric system in our data as it’s straight-forward and easily understood by non-aviation experts. That means that speed is always in meters per second, distances or lengths (e.g. altitude) are in meters, time in seconds and so on.
The Columns
Let’s go through the columns from top to bottom for a particular (yet random) state vector:
select * from state_vectors_data4 where hour=1480759200 order by rand() limit 1; Query: select * from state_vectors_data4 where hour=1480759200 order by rand() limit 1 +------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+ | time | icao24 | lat | lon | velocity | heading | vertrate | callsign | onground | alert | spi | squawk | baroaltitude | geoaltitude | lastposupdate | lastcontact | hour | +------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+ | 1480760792 | a0d724 | 37.89463883739407 | -88.93331113068955 | 190.8504039695975 | 265.8263544365708 | 0 | UPS858 | false | false | false | 7775 | 9144 | 9342.120000000001 | 1480760704.359 | 1480760709.997 | 1480759200 | +------------+--------+-------------------+--------------------+-------------------+-------------------+----------+----------+----------+-------+-------+--------+-------------------+-------------------+----------------+----------------+------------+ Fetched 1 row(s) in 1.48s
time
This column contains the unix (aka POSIX or epoch) timestamp for which the state vector was valid. You’ll find one state vector per second for each aircraft which was active within the coverage of OpenSky at that particular second. For more information on how these state vectors are generated, please refer to the API documentation. In the above example, the time is 1480760792 which means that we are looking at a state vector that was valid on Saturday, 03-Dec-16 10:26:32 UTC. Tip: There are online tools available for converting unix timestamps, just use Google.
icao24
This column contains the 24-bit ICAO transponder ID which can be used to track specific airframes over different flights. This ID should never change during a registration period of an airframe, which doesn't change very often. So if you are looking for a particular aircraft, try to find out its 24-bit transponder ID and filter by this column. In our data, it’s represented as a 6 digit hexadecimal number (string). In our case, we are looking at the state of an aircraft using the transponder ID a0d724. If you look it up on our Aircraft Database, you’ll find out that this transponder ID is used by an Airbus A306 owned by UPS. You will find this column in all tables.
lat/lon
These column contain the last known latitude and longitude of the aircraft. Coordinates are stored as decimal WGS84 coordinates. So here is what we know so far: On Saturday, 03-Dec-16 at 10:26:32 UTC, the UPS aircraft with transponder ID a0d724 was at position 37.89463883739407,-88.93331113068955. If you look it up on Google maps, it’s somewhere in Illinois in the US.
velocity
This column contains the speed over ground of the aircraft in meters per second. In our example, the UPS aircraft flew over Illinois at a speed of 190.8504039695975 meters per second.
heading
This column represents the direction of movement (track angle) as the clockwise angle from the geographic north. Just a little side note for the aviation experts: you might want to complain that the term “heading” is not perfectly correct (if you are one of them you know what I mean) and something like “track” or “track angle” might be more correct. You are absolutely right and this might change in future releases, but for legacy reasons it’s called “heading” for now. In our case, the aircraft flew into the direction 265.8263544365708° clockwise from geographic north, or in other words, to the west.
vertrate
This column contains the vertical speed of the aircraft in meters per second. A negative number indicates that the aircraft was descending, a positive number indicates a ascend respectively. In the above example, the UPS aircraft was neither ascending nor descending.
callsign
This column contains the callsign that was broadcast by the aircraft. Most airlines indicate the airline and the flight number in the callsign, but there is no unified system. In our example, the callsign indicates that this state vector belongs to UPS flight 858. By looking up the flightnumber on services like flightaware.com, you’ll find out that this flight goes from Lousville to Phoenix every day.
onground
This flag indicates whether the aircraft is broadcasting surface positions (true) or airborne positions (false). Our UPS aircraft was airborne.
alert/spi
These two flags are special indicators used in ATC. If you need them, you’ll know what they mean.
squawk
This 4-digit octal number is another transponder code which is used by ATC and pilots for identification purposes and indication of emergencies. Usually, ATC assigns squawks to aircraft when they enter their airspace via radio. In the above example, the UPS flight was assigned squawk “7775”. See e.g. Wikipedia for a list of special purpose squawks.
baroaltitude/geoaltitude
These two columns indicate the aircraft’s altitudel. As the names suggest, baroaltitude is the altitude measured by the barometer and depends on factors such as weather, whereas geoaltitude is determined using the GNSS (GPS) sensor. In our case, the aircraft was flying at a geometric altitude (or height) of 9342.12 meters and a barometric altitude of 9144 meters. That makes a difference of almost 200 meters. You are likely to observe similar differences for aircraft in spatial and temporal vicinity. Note that due to its importance in aviation, barometric altitude will almost always be present, while the geometric altitude depends on the equipage of the aircraft.
lastposupdate
This unix timestamp indicates the age of the position. The position of the state vector above was already 87.64 seconds old at the time when the state vector was created (time) and should not be used any longer.
lastcontact
This unix timestamp indicates the time at which OpenSky received the last signal of the aircraft. As long as the aircraft is flying in an airspace which is well-covered by OpenSky’s receivers this timestamp should never be older than 1-2 seconds compared to the state vectors timestamp (time). Apparently, OpenSky’s coverage in Illinois was not too good in December 2016 since the last contact indicates that the aircraft left the covered airspace already 82 seconds ago. OpenSky continues generating state vectors for 300 seconds after the last contact. Depending on your application, you can filter state vectors which are, e.g., older than 15 by adding a WHERE-clause to your query saying “WHERE time-lastcontact<=15”. The relationship between the three timestamps explained so far is time > lastcontact >= lastposupdate.
hour
Since batch layer version 4, the data is processed and partitioned in hourly batches. This timestamp marks the beginning of the hour to which the data belongs.
The Other Tables
There are only a couple of things left to say concerning the other tables. In most of the other tables you will find the columns mintime, maxtime, and msgcount. These columns are used to summarize messages which were received multiple times. As the names suggest, the respective message was received at mintime for the first time and at maxtime for the last time. In total, the respective row in the table summarizes msgcount duplicates that were received between mintime and maxtime.
If you want to learn more about the individual columns of each table, you should refer to the standard describing the respective Mode S downlink formats. Specifically, the values of the SSR Mode S tables acas_data4, allcall_replies_data4, and rollcall_replies_data4 are specified in ICAO’s Annex 10 Volume 4 and the ADS-B tables identification_data4, operational_status_data4, position_data4, and velocity_data4 are defined in RCTA’s DO-260B.
For the sake of completeness, let's go quickly through one example line, say, from the allcall_replies_data4 table:
SELECT * FROM allcall_replies_data4 LIMIT 1; Query: select * FROM allcall_replies_data4 LIMIT 1 +----------------+----------------+----------------+----------+--------+--------------+--------------+------------+ | rawmsg | mintime | maxtime | msgcount | icao24 | capabilities | interrogator | hour | +----------------+----------------+----------------+----------+--------+--------------+--------------+------------+ | 5d400fbab21825 | 1478293200.038 | 1478293636.342 | 36 | 400fba | 5 | 00003a | 1478293200 | +----------------+----------------+----------------+----------+--------+--------------+--------------+------------+ Fetched 1 row(s) in 0.32s
This row means that on Friday, 04-Nov-16 between 21:00:00 and 21:07:16 UTC, OpenSky received 36 replies from the aircraft using the transponder ID 400fba to all-call interrogations from an interrogator with the ID 00003a. The aircraft reported a capability of 5, which means according to ICAO Annex 10 Vol. 4, section 3.1.2.5.2.2.1 that the transponder level is 2 or above and the aircraft is airborne.
Please note that while state_vector_data4's retention time in the database is unlimited, some of the other tables may currently only hold this data for a shorter time (e.g. around one year).
We have also added the flights_data4 table, which is a further abstraction that has been proved useful for many researchers. It also forms the basis of our Covid-19 dataset, where can find a detailed explanation of the table.
3. Tools & A Note on Performance
ONERA's Xavier Olive wrote a Python library (traffic) for extracting trajectories from the OpenSky interfaces and other data sources. Code and documentation are available on github. More data tools are now available on the respective page. These libraries and Trino command line tools will serve you when saving the data.
Regarding performance and limitations:
1. Do not use many parallel queries! This is a shared resource and you must be mindful of other researchers.
2. Do not create queries that are too large and take too long. As a rule of thumb, if you do not get a reply within 5 or so minutes, re-shape your query to something more efficient. If you need a bulk download of all OpenSky data in a given day (that's a lot of data!), please contact us for more suitable interfaces.
Ignoring these rules can lead to revocation of your access rights. We have implemented a queuing system that allows for 2 concurrent queries and 2 queued queries per individual account. There are also global limits for concurrent queries implemented. If you get an error with regards to the queue, try again later.
4. Some Useful Query Examples
- Get 1 position (the latest one) per aircraft every minute:
SELECT * FROM state_vectors_data4 v JOIN (SELECT QUOTIENT(time, 60) AS minute, MAX(time) AS recent, icao24 FROM state_vectors_data4 WHERE hour=1480762800 GROUP BY icao24, minute) AS m ON v.icao24=m.icao24 AND v.time=m.recent WHERE v.hour=1480762800;
- Get the last position of each aircraft seen by the receiver with the serial 1344390019 during the hour 1480762800:
SELECT * FROM state_vectors_data4 v JOIN (SELECT MAX(time) AS recent, count(*) as cnt, icao24 FROM state_vectors_data4, state_vectors_data4.serials s WHERE hour=1480762800 AND s.ITEM=1344390019 GROUP BY icao24) AS m ON v.icao24=m.icao24 AND v.time=m.recent WHERE cnt>30 AND v.hour=1480762800;
Note: the filter "AND cnt>30" is necessary because the data provided by the receivers is noisy and in some cases receivers get associated with state vectors based on noise. We do our best to prevent that but it still happens. Filters on the count are just a band-aid and do not work 100%, but they perform quite well. You might also want to apply filters on latitude and longitude to further reduce noise.
- Count aircraft which arrived or departed or crossed Frankfurt airport during a certain hour:
SELECT COUNT(DISTINCT icao24) FROM state_vectors_data4 WHERE lat<=50.07 AND lat>=49.98 AND lon<=8.62 AND lon>=8.48 AND hour=1493892000;
5. Further Reading
- OpenSky Report 2016: Facts and Figures on SSR Mode S and ADS-B Usage
Examplary collection of statistics of our data set in the early years. The paper also features detailed explanations of the data contained in the tables, such as the different accuracy information associated with each ADS-B position in the position_data4 table. - Crowdsourced Air Traffic Data from the OpenSky Network 2019–20
Written durng the Covid-19 pandemic, this paper provides an overview of the flights_data4 table and the general reach of OpenSky data.
6. Conclusion
Now you are good to go! Have fun exploring our dataset and let us know if you find interesting or peculiar things. If you have any feedback or questions regarding this guide, just contact us!
7. Notification of limited historical data
Unfortunately, we experienced severe technical issues during the year 2023, with technical debt, growing user and aircraft load and data center outages happening at the same time and causing some permanent data loss. We list the affected times here so you can exclude them from your research. All times are in UTC. Our apologies if you needed exactly these yours but things should again be much more stable going forward!
8. Credits
We would like to thank the following people for their feedback, corrigenda, and otherwise useful comments that helped improve this guide:
- Thomas Quilling
- Enrico Spinielli
- Xavier Olive
- Junzi Sun