Missing historic data (22/01/2022 10:00am GMT onwards)

2 years 3 months ago #1732 by Herson
Hi Guys,

I believe several days of historic data are missing from the Impala table state_vectors_data4, i.e. no data has been loaded since Saturday, 22 January 2022 10:00:00 GMT+00:00. Has this already been reported?

Thanks

Richard.

Proof:

select count(*), hour
FROM state_vectors_data4
where hour between 1642684695 and 1643155200
group by hour order by hour desc;

2 years 2 months ago #1734 by strohmeier
Yes, some HD hiccups. Will be fixed, no ETA.

2 years 2 months ago #1736 by Herson
Thanks. Not quite sure what is meant by the term "HD" - hard disk? Is there anything I can do to help?

Richard.

2 years 2 months ago #1744 by Herson
I had another look at this problem today, and it appears that the HDFS file hdfs://nameservice1/user/opensky/tables_v4/state_vectors/hour=1642838400/part-r-00169-b930767e-9383-452b-84ff-bd7cd0deb55c.snappy.parquet is corrupted or missing for the whole day of Saturday, 22 January 2022, starting 00:00:01 GMT.

The table/file appears to be okay from Sunday, 23 January 2022 00:01:00 GMT onwards, although there is no data.

Proof:

-- Failing query
select count(*), hour FROM state_vectors_data4 where
hour between 1642831260+(3600*1) and 1642831260+(3600*19) group by hour;

-- Working query, but it returns no data
select count(*), hour FROM state_vectors_data4 where
hour between 1642831260+(3600*24) and 1642831260+(3600*36) group by hour;
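
For anyone checking the epoch arithmetic in queries like these, here is a minimal Python sketch for converting between UTC datetimes and `hour` values. It assumes the `hour` column holds Unix timestamps in seconds (UTC) truncated to the start of the hour, as the partition path `hour=1642838400` suggests. The function names `hour_key` and `hour_to_utc` are hypothetical helpers, not part of any OpenSky tooling.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def hour_key(dt: datetime) -> int:
    """Return the Unix timestamp (seconds) of the start of the UTC hour containing dt."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC/GMT.
        dt = dt.replace(tzinfo=timezone.utc)
    ts = int(dt.timestamp())
    return ts - ts % SECONDS_PER_HOUR


def hour_to_utc(hour: int) -> datetime:
    """Convert an hour value back into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(hour, tz=timezone.utc)


# Decode the partition key from the HDFS path quoted above.
print(hour_to_utc(1642838400).isoformat())  # 2022-01-22T08:00:00+00:00

# Build a query bound from a readable date.
print(hour_key(datetime(2022, 1, 30, 23, 0, tzinfo=timezone.utc)))  # 1643583600
```

Aligning the bounds to whole hours this way makes it easier to see which partitions a `between` range actually covers.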

2 years 2 months ago #1745 by Herson
Much of the missing data for the table state_vectors_data4 now appears to have been recovered, thanks.

However, the range between Sunday, 23 January 2022 00:00:00 and Sunday, 30 January 2022 23:00:00 still has no data, and the partition for Sunday, 23 January 2022 00:00:00 appears to be corrupted ("Failed to open HDFS file").

Proof:

select count(*), hour FROM state_vectors_data4 where
hour between (1643583600- (3600*24*7)) and (1643583600+(3600*1)) group by hour order by hour desc;
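
The output of a query like this can be checked for gaps programmatically. The following is a rough Python sketch, assuming the result has already been fetched as `(count, hour)` tuples in the same column order as the query; `missing_hours` is a hypothetical helper name. It only detects hours with no rows and will not catch a query that fails outright on a corrupted file.

```python
SECONDS_PER_HOUR = 3600


def missing_hours(rows, start, end):
    """List the hourly keys in [start, end] that have no rows in the query result.

    rows: iterable of (count, hour) tuples, matching `select count(*), hour`.
    start, end: Unix timestamps in seconds (inclusive bounds, as with BETWEEN).
    """
    seen = {int(hour) for count, hour in rows if count > 0}
    # Round start up to the next hour boundary so we only test valid keys.
    first = -(-start // SECONDS_PER_HOUR) * SECONDS_PER_HOUR
    return [h for h in range(first, end + 1, SECONDS_PER_HOUR) if h not in seen]


# Made-up counts for illustration only; the hour values are real hour boundaries.
rows = [(100, 1643580000), (120, 1643583600)]
print(missing_hours(rows, 1643576400, 1643587200))  # [1643576400, 1643587200]
```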

2 years 2 months ago #1746 by strohmeier
Thanks, yes, we are aware and the data is being back-processed. Since it's a lot, the estimated time is about 3-4 days. Everything since 30 Jan should be back to normal.

2 years 2 months ago #1747 by Herson
Thanks for the update and for everyone's assistance. It helps to know we can start using the recovered part of the dataset again.
Richard.
