Location

SQuaRE Zoom: https://ls.st/wyp

Time

10:00 am PT

Attendees

Goals

  • Share knowledge of the EFD troubleshooting procedures and operation


Health checks

Check                    | Summit EFD | USDF EFD
Kafka disk usage         | 80G        | 68G
InfluxDB disk usage      | 550G (11%) | 15TB (50%)
Status of the connectors | Running    | Running

Last InfluxDB pod restart

Summit EFD: power-up of the computer room at the Summit

  State:          Running
    Started:      Wed, 24 Jan 2024 15:13:41 -0700
  Last State:     Terminated
    Reason:       Error
    Exit Code:    255
    Started:      Wed, 24 Jan 2024 08:52:59 -0700
    Finished:     Wed, 24 Jan 2024 15:11:09 -0700

USDF EFD: k8s upgrade

  State:          Running
    Started:      Thu, 15 Feb 2024 11:12:20 -0700
  Ready:          True
  Restart Count:  0

Last Kafka pods restart

Summit EFD: power-up of the computer room at the Summit

  State:          Running
    Started:      Wed, 24 Jan 2024 15:13:49 -0700
  Last State:     Terminated
    Reason:       Error
    Exit Code:    255
    Started:      Wed, 24 Jan 2024 08:53:00 -0700
    Finished:     Wed, 24 Jan 2024 15:11:11 -0700

USDF EFD: k8s upgrade

  State:          Running
    Started:      Thu, 15 Feb 2024 10:23:46 -0700
  Ready:          True
  Restart Count:  0

Monitoring dashboards

Discussion items

Time | Item | Who | Notes

10 min | Review the past week's EFD events | All

See #com-efd-status 

Data corruption errors at Kafka USDF

The K8s upgrade interrupted data replication. To prevent this, we need to run more MirrorMaker replicas and configure Pod Disruption Budgets to guarantee minimum availability when nodes are drained (a PR for this is in progress).
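A minimal sketch of what such a Pod Disruption Budget could look like, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The resource name, namespace, and pod label below are hypothetical placeholders, not the values used in the actual deployment or in the PR mentioned above.

```python
import json


def pod_disruption_budget(name, namespace, match_labels, min_available):
    """Build a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Voluntary evictions (e.g. node drains) are refused if they
            # would leave fewer than this many matching pods available.
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


# Hypothetical values for illustration only.
pdb = pod_disruption_budget(
    name="mirrormaker-pdb",
    namespace="efd",
    match_labels={"app": "mirrormaker"},
    min_available=1,
)

manifest_json = json.dumps(pdb, indent=2)
print(manifest_json)
```

Note that `minAvailable: 1` only helps when more than one replica is running. With a single replica, the budget would block the drain instead of keeping replication alive, which is why the extra replicas and the budget go together.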

There was a fiber cut today between the Summit and La Serena, and traffic is running on the backup wireless link. Data replication to USDF was not interrupted.

30 min | EFD presentation at JTM | All

See plans for USDF EFD, Summit EFD, and other activities

10 min | Should we update the EFD system requirements in LSE-30? | All

SQR-085 shows our best estimate for the EFD storage requirements. We could use it to update the numbers in LSE-30.

Frossie suggested showing LSE-30 requirements in the data rates plot.


LSE-30 says the minimum required storage for the EFD is 30 Gbytes/day (210 Gbytes/week, the green line in the plot), but the LSE-30 numbers are currently inconsistent: using the average data rate of 6.5 Mbytes/s gives about 560 Gbytes/day instead.
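The inconsistency can be checked with a few lines of arithmetic. This is a small sketch, assuming decimal units (1 Gbyte = 1000 Mbytes) and a rate sustained over a full 24-hour day; the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def gbytes_per_day(rate_mbytes_per_s):
    """Convert a sustained rate in Mbytes/s to Gbytes/day (decimal units)."""
    return rate_mbytes_per_s * SECONDS_PER_DAY / 1000


stated_per_day = 30.0                    # Gbytes/day, as quoted from LSE-30
stated_per_week = stated_per_day * 7     # Gbytes/week, the green line
from_average_rate = gbytes_per_day(6.5)  # Gbytes/day implied by 6.5 Mbytes/s

print(f"Stated minimum:    {stated_per_day:.1f} Gbytes/day "
      f"({stated_per_week:.0f} Gbytes/week)")
print(f"From average rate: {from_average_rate:.1f} Gbytes/day")
print(f"Ratio:             {from_average_rate / stated_per_day:.1f}x")
```

The average rate implies 561.6 Gbytes/day, which the notes round to 560, or roughly 19 times the stated 30 Gbytes/day.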

TODO: still need to update SQR-085 to show a consolidated table with throughput for telemetry, events, and command topics per Michael's suggestion.

AOB