...

Then logged on to qserv-db14 and found that xrootd was dead.

Resolution

The problem originated from a deadlock introduced in the following method:

Code Block
nlohmann::json SchedulerBase::statusToJson() {
    std::lock_guard<std::mutex> lock(_countsMutex);
    ...
    status["num_tasks_in_queue"] = getSize();
    status["num_tasks_in_flight"] = getInFlight();
    ...
}

Method getSize() acquires another lock, and a caller of SchedulerBase::statusToJson may also acquire that same second lock. Calling getSize() while holding _countsMutex therefore nests the two locks, which can deadlock.

The solution was to call the virtual methods getSize and getInFlight outside of the locked section.
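The pattern behind the fix can be sketched as follows. This is a minimal illustration, not the actual Qserv code: the class SchedulerSketch, the second mutex _queueMutex, the counter _totalCompleted, the key "num_tasks_completed", and the use of std::map in place of nlohmann::json are all assumptions made to keep the example self-contained. It also assumes getInFlight takes the same second lock as getSize, which the report does not state.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for SchedulerBase; only the locking pattern matters.
class SchedulerSketch {
public:
    virtual ~SchedulerSketch() = default;

    // Assumed to take a second lock, as getSize() does in the report.
    virtual std::size_t getSize() const {
        std::lock_guard<std::mutex> lock(_queueMutex);
        return _queued;
    }

    // Assumption: guarded by the same second lock.
    virtual std::size_t getInFlight() const {
        std::lock_guard<std::mutex> lock(_queueMutex);
        return _inFlight;
    }

    // std::map stands in for nlohmann::json here.
    std::map<std::string, std::size_t> statusToMap() const {
        // Call the virtual methods first, while _countsMutex is NOT held,
        // so the two mutexes are never nested.
        std::size_t const size = getSize();
        std::size_t const inFlight = getInFlight();

        std::map<std::string, std::size_t> status;
        {
            // Hold _countsMutex only while reading the data it protects.
            std::lock_guard<std::mutex> lock(_countsMutex);
            status["num_tasks_completed"] = _totalCompleted;  // hypothetical key
        }
        status["num_tasks_in_queue"] = size;
        status["num_tasks_in_flight"] = inFlight;
        return status;
    }

    // Helpers for the illustration; each takes one lock at a time.
    void enqueue() {
        std::lock_guard<std::mutex> lock(_queueMutex);
        ++_queued;
    }
    void start() {
        std::lock_guard<std::mutex> lock(_queueMutex);
        if (_queued > 0) {
            --_queued;
            ++_inFlight;
        }
    }
    void finish() {
        {
            std::lock_guard<std::mutex> lock(_queueMutex);
            if (_inFlight == 0) return;
            --_inFlight;
        }
        std::lock_guard<std::mutex> lock(_countsMutex);
        ++_totalCompleted;
    }

private:
    mutable std::mutex _countsMutex;  // guards _totalCompleted
    mutable std::mutex _queueMutex;   // guards _queued and _inFlight
    std::size_t _totalCompleted = 0;
    std::size_t _queued = 0;
    std::size_t _inFlight = 0;
};
```

Because the two values are read before _countsMutex is taken, the resulting status is not a single atomic snapshot of all counters. That trade-off is usually acceptable for monitoring output and removes the nested locking.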