Meeting Notes 2016-03-10

Date

10 Mar 2016

Attendees

Discussion items

Base site planning: implications for Kerberos, LDAP, and IAM in general
Using NCSA Kerberos/LDAP in LSST VMs
Managing NCSA Nebula OpenStack access

Change to BlueJeans for next meeting (instead of Google Hangouts).

Alex: Base Site Planning slides are available on Confluence. Several slides cover IT infrastructure at the base and summit. These presentations were distilled into slides on the impact of authentication/authorization decisions for base/summit. Much of the requirements discussion is occurring at https://confluence.lsstcorp.org/pages/viewpage.action?pageId=16908452 . A requirement that was repeated is that the summit can operate even during network outages. This implies common accounts across machines. It also implies something at the summit that caches authn/authz information for a (small) set of users.

Gregory: We don't want to have to shut down all consoles during shift changes because it is disruptive. Shift changes happen every few hours during the night, and we don't want disruption during this time. In general, we should think of commissioning as driving the requirements. During commissioning, more people at the summit will require access. All relevant staff should have access to the "cache" during this period.

Alex: After that time, we may narrow down the list of authorized users.

Gregory: The requirement is that the observatory be able to operate for 48 hours without network connectivity. During that 48-hour period, is it an allowed degradation that we cannot create new users? Are we or are we not supporting that?

Jim: Expect that it will be a read-only cache, so that new users, password changes, etc. would not be available during network outages.

Doug: We can cache users for as long as we want, but changes need to occur at the "master" and then get propagated to the slaves (i.e., the summit). There are several caching options, or other alternatives.
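A minimal sketch of the read-only cache idea discussed above, for illustration only. The class name SummitAuthCache, its methods, and the data layout are hypothetical and do not correspond to any LDAP, Kerberos, or SSSD API. Expiring the snapshot after 48 hours is an assumption made for the example; the discussion only says that changes must happen at the master and that users can be cached for as long as desired.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping, Optional, Set

# Assumption: reuse the 48-hour outage requirement as the maximum snapshot age.
MAX_OUTAGE_SECONDS = 48 * 3600


@dataclass
class SummitAuthCache:
    """Hypothetical read-only snapshot of user -> groups held at the summit."""

    max_age: float = MAX_OUTAGE_SECONDS
    _users: Dict[str, Set[str]] = field(default_factory=dict, init=False)
    _last_sync: Optional[float] = field(default=None, init=False)

    def sync_from_master(self, master_users: Mapping[str, Iterable[str]], now: float) -> None:
        """Replace the snapshot with the master's data (only possible while the network is up)."""
        self._users = {user: set(groups) for user, groups in master_users.items()}
        self._last_sync = now

    def is_authorized(self, user: str, group: str, now: float) -> bool:
        """Answer from the local snapshot; deny if it was never synced or is too old."""
        if self._last_sync is None or now - self._last_sync > self.max_age:
            return False
        return group in self._users.get(user, set())

    def add_user(self, user: str, groups: Iterable[str]) -> None:
        """Writes are rejected locally: new users and changes go through the master."""
        raise PermissionError("read-only cache: make changes at the master")
```

With this shape, an outage degrades only the write path: lookups for already-cached users keep working, while new users and password changes wait until the link to the master returns.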

Jim: This is a key design decision that should be presented to the 'deciders'. We should write up a short doc about what can occur at the summit for authn/authz during network outages.

Alex: Restrict access to the camera system via a bastion/jump host. The key requirement is that the list of users is restricted (during regular operation). Also mentioned a shared (read-only?) filesystem which would serve up configuration settings (such as calibration constants).

Gregory: For the main camera science centers, it has been assumed that it is okay if the calibration for the main camera is not reduced during a 48-hour outage. (Discussion about capabilities necessary at the summit, wave-front sensors.) It is a requirement that the model be updatable at all times.

Alex: my understanding from the California meeting is wrong.

Gregory: whatever access control policy is chosen to be implemented for the control systems at the summit needs to be signed off on.

Jim: two-factor authentication needs to be mirrored as well for network downtime.

Gregory: There may also be internal accounts (not relying on external authn like GitHub).

Alex: Remote access to the site is a requirement (VPN, ssh). There is also a requirement to be able to limit access to a subset of users, and a requirement for both service (e.g., shared account) and personal accounts.

Jim: There have been problems in the past at NCSA connecting VPN to LDAP. Looking to Doug on this. We may need to look at alternative products which would work just fine.

Alex: Talking with Paul Weffel. We don't want the VPN to connect someone's home computer into the same network as direct control. Alternatively, the VPN connects users to a VLAN which has less access, and then the user jumps through more hoops for additional systems access. Q: If all systems are running Linux, what is the point of the VPN? Won't most people be using ssh only?

Frossie: Laptop users (Mac/PC) may want access which would require VPN.

Alex: need to decide what services VPN users would need. This is an issue that Alex and Paul need to work out.

Gregory & Frossie: discussion about code development / deployment at base/summit during network outage.

Jim: Will we be operating OpenStack-based infrastructure at the base site? What are the implications? This gives rise to two levels of authentication, those with access to the VM guest consoles, and those accessing the VM hosts to launch new clients.

Gregory: this has not been decided yet. But virtualization is today's standard, and not doing it would need a good reason.

Doug: We may need to mirror VM container infrastructure to base.

Jim: So I would argue we need to use the authn/authz scheme for both levels of virtualization (hosts and guests). Would like a project plan to move to using Kerberos/LDAP for both levels of VM.

Doug: We need to support service-level accounts in LDAP, which we currently do not do. For Nebula this means we may need to rebuild the entire infrastructure using LDAP identities. We are working towards this. This is on the horizon. What we have not done is make all guest OSes use Kerberos/LDAP by default. It is possible to do so manually, but not automatically. OpenStack has a "tenant" paradigm. VMs have been built with the LSST tenant, which allows for LSST user access to the VM. There are plans for implementing Nebula "zones" which allow for separation of servers. Also working with VMware about their container solution VIO, which has additional capabilities such as HA. VMware will be here next week doing a test build to see if it is a valuable upgrade to our open source Nebula.

Frossie: needs to go. What happened to Doug's user meetings?

Doug: needs to schedule it.

Doug: SSSD is our ongoing solution for individual instances. We know how to do it, but you need to designate the group for initialization. Talking about whether we can do a hierarchical LDAP solution. It works, but not across all software services. Also working on an LDAP slave pool. Currently, all queries are going to the master. A "slave hub" member could be mounted at both summit and base, which could work during a network outage. Kerberos replication is possible but (probably) without two-factor. Duo has a 'call home' model. RSA could have local replicas, but it is tricky since slaves may not update in a timely fashion. Kerberos slaves are currently configured for XSEDE, so the problem has been addressed in the past.

Gregory: In the L3 environment, we may want to consider a system where you have unmanaged VMs which are considered 'hostile'. Need to at least document that we have thought about this issue, and what we can offer users. Perhaps during the next visit in April.
