Infrastructure meetings take place every other Thursday at 9:00 Pacific on the BlueJeans infrastructure-meeting channel: https://bluejeans.com/383721668
Date: ...
Attendees: Paul Domagala, Gregory Dubois-Felsmann, Igor Gaponenko, Brian Van Klaveren, Fritz Mueller, Xiuqin Wu
Goals
Ensure successful use of the current NCSA infrastructure
Igor Gaponenko [3:29 PM] @channel I'm not sure where I should post this complaint, in this forum or in #dm-infrastructure. The problem is that the only filesystem we have on the PDAC *master* node *lsst-qserv-master01* has really horrible I/O performance. I wouldn't worry much about it if that very same filesystem were not shared by the OS and *Qserv*'s MySQL/MariaDB database server. This setup bites us in two ways. Firstly, we use this filesystem (via the database server) to store intermediate results reported by *worker* nodes before doing the result-set aggregation; in some cases the result sets can be rather large (a few *GB* per query). Secondly, the database service hosts a number of key catalogs, some of which can also be rather large (like the so-called *secondary index*). The current disk subsystem of the node is simply no match for these tasks. For example, when I scan one of the *secondary index* tables (just to count the number of entries) I see:

```
iostat -m 1

Device:   tps     MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda       363.00  13.31      0.05       13       0
dm-0      364.00  13.31      0.05       13       0
```

*NOTE* how low *both* the CPU utilization and the disk I/O (both IOPS and MB/s) are. This looks just horrible. Is there any chance we could add a second filesystem based on 4 SSDs in a RAID10 (1+0) configuration? That shouldn't be super expensive: four 0.5 TB disks would cost a couple of thousand. And it must be software-based RAID to allow TRIM-ing (if that's still a problem for the newest SSD disks). If we could put in NVMe disks, that would be even better.
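For context, a minimal sketch of the kind of second filesystem being proposed, assuming four SSDs that show up as /dev/sdb through /dev/sde and a /qserv mount point (both hypothetical), using mdadm for software RAID10 and a periodic fstrim for TRIM:

```
# Hypothetical device names: the four SSDs are /dev/sdb .. /dev/sde,
# and the new filesystem will be mounted at /qserv.

# Create a software (mdadm) RAID10 array across the four SSDs.
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Make a filesystem and mount it where MySQL/MariaDB data can live.
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /qserv
sudo mount /dev/md0 /qserv

# With software RAID the kernel can pass discard/TRIM through to the
# member SSDs; a periodic batched TRIM avoids the overhead of the
# 'discard' mount option.
sudo fstrim -v /qserv                      # one-off TRIM
sudo systemctl enable --now fstrim.timer   # weekly TRIM on systemd hosts
```

Whether MariaDB's datadir (or only the intermediate-results area) would move onto the new array is a separate configuration decision on the Qserv side.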