Location
SQuaRE Zoom: https://ls.st/wyp
Time
11:00 am PT
Attendees
Goals
- Share knowledge of the EFD troubleshooting procedures and operation
Discussion items
Time | Item | Who | Notes |
---|
15 min | 1. Review past week EFD events | All | See #com-efd-status
Monday, Aug 28
Tuesday, Aug 29 - OSPL daemon on yagan08 crashed at the Summit
Wednesday, Aug 30 - Kafka restarted at the Summit for an unknown reason. Self-consistency checks based on the private_seqnum indicate message loss that day, but we need to improve that check and verify whether the timestamps of the missing data correlate with the time of the Kafka restart.
- Maintenance at USDF, InfluxDB Sink connectors rescheduled to another k8s node. Consistency checks show no message loss that day between Summit and USDF.
Thursday, Aug 31 - !! We have been unable to operate the Observatory (and thus collect any engineering data) since Thursday morning (Yagan maintenance) !!
NOTE: It seems that we are not missing Kapacitor notifications from USDF to Slack anymore
DM-40098
|
5 min | 2. InfluxDB Enterprise | Angelo | - Meeting with InfluxData on Thursday; Acceptance Criteria reviewed internally and shared with InfluxData
- Meeting again on Thursday, September 14th
- We need to prepare for the POC phase
|
15 min | 3. Inconsistencies in the EFD data between Summit and USDF
| All | - We need to improve the Consistency Checks notebook
- Improvements:
- podAntiAffinity configuration applied to all environments
DM-40510
- Upgrade Kafka to 3.5.1 (important MM2 fixes)
- Move Kafka data partitions to local disk at Base
- Dedicated nodes for Kafka at USDF?
- We need a "control environment". It could be Base (if we can move the Kafka data partitions to local disk) or Google. To deploy at Google we need to bypass the VPN since MM2 runs on the target cluster and needs to connect to the Summit. Other ideas?
|
5 min | 4. USDF data corruption errors and repairer connectors | Angelo | We haven't seen any data corruption errors this week. Repairer connectors are still running at USDF to help investigate this issue.
|
10 min | AOB |
| (Hsin-Fang): If we have time, I'd like to see what I may do for adding the JDBC connector. It looks to me that the JDBC sink is already supported in kafka-connect-manager. Testing before turning it on? |
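
Sketch for the private_seqnum check discussed under item 1: a minimal Python example, not the code in the Consistency Checks notebook. It assumes that private_seqnum increases by 1 per message within a topic and that a drop in the value is a counter reset rather than message loss. The function names, the record layout, and the sample values are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_seqnum_gaps(records):
    """Return a list of (last_seen_time, next_seen_time, n_missing) tuples.

    `records` is an iterable of (timestamp, private_seqnum) pairs for a
    single topic, sorted by timestamp. Assumption: the sequence number
    increases by 1 per message; a lower value than the previous one is
    treated as a counter reset, not as message loss.
    """
    gaps = []
    prev_time, prev_seq = None, None
    for ts, seq in records:
        if prev_seq is not None and seq > prev_seq + 1:
            gaps.append((prev_time, ts, seq - prev_seq - 1))
        prev_time, prev_seq = ts, seq
    return gaps


def gaps_near_event(gaps, event_time, window):
    """Keep only the gaps whose time span, padded by `window`, contains event_time."""
    return [g for g in gaps if g[0] - window <= event_time <= g[1] + window]


# Made-up sample data: one message per second, seqnums 4-6 missing, then a reset.
base = datetime(2000, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sample = [(base + timedelta(seconds=i), s) for i, s in enumerate([1, 2, 3, 7, 8, 1, 2])]
restart_time = base + timedelta(seconds=2, milliseconds=500)  # hypothetical restart

for start, end, n_missing in gaps_near_event(
    find_seqnum_gaps(sample), restart_time, timedelta(seconds=1)
):
    print(f"{n_missing} message(s) missing between {start} and {end}")
```

Running the same two functions on the Summit data with the actual Kafka restart time would show whether the missing messages cluster around the restart or are spread across the day.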
Action items
- DM-40515 Test CSC is not recording data to the expected InfluxDB measurement (Angelo Fausti)
- Fix Kapacitor configuration at Base