...
Then logged onto qserv-db14 and found that xrootd was dead.
Resolution
The root cause of the problem was a deadlock introduced in the following method:
Code Block |
---|
nlohmann::json SchedulerBase::statusToJson() {
    std::lock_guard<std::mutex> lock(_countsMutex);
    ...
    status["num_tasks_in_queue"] = getSize();
    status["num_tasks_in_flight"] = getInFlight();
    ...
} |
Method getSize() acquires a second lock while _countsMutex is still held, and a caller of SchedulerBase::statusToJson() may also acquire that same second lock.
The fix was to call the virtual methods getSize() and getInFlight() outside of the locked region.
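The following is a minimal sketch of that pattern, not the actual Qserv code. The class names SchedulerSketch and QueueSchedulerSketch, the method statusToMap(), the counter _totalCompleted and the mutex _queueMutex are hypothetical, and a std::map stands in for nlohmann::json so that the example depends only on the standard library. The point it illustrates is that the virtual calls, which may take their own lock, run before _countsMutex is acquired, so the two locks are never held at the same time.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for SchedulerBase (assumption, not the Qserv class).
class SchedulerSketch {
public:
    virtual ~SchedulerSketch() = default;

    // Subclasses may take their own lock inside these methods, so they
    // must not be called while _countsMutex is held.
    virtual std::size_t getSize() const = 0;
    virtual std::size_t getInFlight() const = 0;

    void recordCompleted() {
        std::lock_guard<std::mutex> lock(_countsMutex);
        ++_totalCompleted;
    }

    // std::map is used here in place of nlohmann::json (assumption).
    std::map<std::string, std::size_t> statusToMap() const {
        // Call the virtual methods first, with no lock held.
        std::size_t const size = getSize();
        std::size_t const inFlight = getInFlight();

        std::map<std::string, std::size_t> status;
        {
            // Only data protected by _countsMutex is read in this scope.
            std::lock_guard<std::mutex> lock(_countsMutex);
            status["total_completed"] = _totalCompleted;
        }
        status["num_tasks_in_queue"] = size;
        status["num_tasks_in_flight"] = inFlight;
        return status;
    }

protected:
    mutable std::mutex _countsMutex;
    std::size_t _totalCompleted = 0;  // hypothetical counter
};

// Hypothetical scheduler whose getSize()/getInFlight() take a second lock.
class QueueSchedulerSketch : public SchedulerSketch {
public:
    void enqueue(int taskId) {
        std::lock_guard<std::mutex> lock(_queueMutex);
        _queue.push_back(taskId);
    }

    // Moves one task from the queue to the in-flight state.
    void startOne() {
        std::lock_guard<std::mutex> lock(_queueMutex);
        if (!_queue.empty()) {
            _queue.pop_front();
            ++_inFlight;
        }
    }

    std::size_t getSize() const override {
        std::lock_guard<std::mutex> lock(_queueMutex);
        return _queue.size();
    }

    std::size_t getInFlight() const override {
        std::lock_guard<std::mutex> lock(_queueMutex);
        return _inFlight;
    }

private:
    mutable std::mutex _queueMutex;  // the "second lock" in this sketch
    std::deque<int> _queue;
    std::size_t _inFlight = 0;
};
```

One trade-off of this approach is that the queue size, the in-flight count and the counters are no longer read under a single lock, so a status report may combine values taken at slightly different moments. For a monitoring snapshot that is usually acceptable.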