Butler has been refactored enough that we are ready to try a simple example of a non-POSIX storage back end.

We think a good way to do this would be to set up a Butler Repository in Swift storage at NCSA and make changes in Butler to get a simple (read: not perfect, but usable) example running where Butler uses this non-local object store repository.

This will allow us to verify that

  • The Butler framework can work with a non-POSIX repository
  • Swift can be used to host a Butler repository

Butler Framework Architectural Considerations

Auth

The connection to Swift will require authorization/authentication.

In Swift, this requires a user name, a tenant name (similar to a project name), an authorization server URL, and a password (a long alphanumeric string). The server returns a token and a storage URL. These will be retained by the Butler framework and used in all calls to and from the object store. Typically these values are stored and passed around as clear text and/or set in environment variables.

These auth methods can be supported by the RepositoryArgs mechanism.
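
For illustration, here is a minimal sketch of this step using the python-swiftclient library. The environment variable names are made up and Keystone v2 auth is assumed; in practice these values would arrive via RepositoryArgs.

```python
import os

import swiftclient

# Credentials; the environment variable names below are illustrative only.
conn = swiftclient.Connection(
    authurl=os.environ["SWIFT_AUTH_URL"],    # authorization server URL
    user=os.environ["SWIFT_USER"],           # user name
    tenant_name=os.environ["SWIFT_TENANT"],  # tenant ("project") name
    key=os.environ["SWIFT_PASSWORD"],        # long alphanumeric password
    auth_version="2",                        # assumes Keystone v2 auth
)

# The auth server returns a storage URL and a token; the Butler framework
# would retain these (here, inside the Connection object) and use them for
# all subsequent calls to and from the object store.
storage_url, token = conn.get_auth()
```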

Storage

We will have to write a Storage class that is specialized for Swift. It will have the same API as the PosixStorage class.
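
As a rough skeleton (the method names and constructor arguments below are illustrative and do not necessarily match the real PosixStorage API), such a class might start out like this:

```python
from swiftclient.exceptions import ClientException


class SwiftStorage:
    """Sketch of a Storage implementation backed by a Swift container.

    The real class would implement the same API as PosixStorage; only an
    existence check is filled in here.
    """

    def __init__(self, uri, connection, container):
        self.uri = uri              # e.g. "swift://<host>/<container>"
        self._conn = connection     # an authenticated swiftclient.Connection
        self._container = container

    def exists(self, location):
        """Return True if an object exists at the given location."""
        try:
            self._conn.head_object(self._container, location)
            return True
        except ClientException:
            return False

    def write(self, butlerLocation, obj):
        """Serialize obj (via a temporary FITS file) and upload it."""
        raise NotImplementedError

    def read(self, butlerLocation):
        """Download the dataset and deserialize it into a Python object."""
        raise NotImplementedError
```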

Serialization with Streams

Currently AFW objects cannot be serialized to or deserialized from streams. To send an object to the object store, we will write it to a temporary FITS file and upload that file. Similarly, reading a dataset from Swift will require downloading the whole FITS file to a temporary local file, from which the AFW object will then be read.

Question: does the file have to remain for as long as the object exists? (Does the whole object get loaded into memory, or is it read lazily from the FITS file as needed?) If the file does need to remain, we will have to determine when it can safely be deleted.
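
A sketch of that round trip, assuming an authenticated swiftclient Connection and the afw Exposure FITS interface (the function names, container, and key layout are made up):

```python
import os
import tempfile

import lsst.afw.image as afwImage


def put_exposure(conn, container, key, exposure):
    """Write an afw Exposure to a temporary FITS file and upload it to Swift."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "dataset.fits")
        exposure.writeFits(path)
        with open(path, "rb") as f:
            conn.put_object(container, key, contents=f)


def get_exposure(conn, container, key):
    """Download the whole FITS file from Swift and read it back with afw."""
    headers, body = conn.get_object(container, key)
    with tempfile.NamedTemporaryFile(suffix=".fits", delete=False) as f:
        f.write(body)
        path = f.name
    # Whether this temporary file can be deleted right away, or must persist
    # while the object is in use, is the open question raised above.
    return afwImage.ExposureF(path)
```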

Registry

For expediency we will require that the repository have an associated SQLite registry, which will be downloaded when the Storage class is instantiated.
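
A sketch of that download step, assuming the registry is stored as an object in the same container (the object name and the helper function are illustrative):

```python
import os
import tempfile


def fetch_registry(conn, container, registry_name="registry.sqlite3"):
    """Download the repository's SQLite registry to a local temporary file.

    Returns the local path, so the existing sqlite-based registry code can
    open it as if it were in a local repository.
    """
    headers, body = conn.get_object(container, registry_name)
    fd, local_path = tempfile.mkstemp(suffix=".sqlite3")
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    return local_path
```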

Options for expanding this in the future include:

  • putting the registry in a remote database and accessing it there (not downloading it at all)
  • writing a registry subclass, similar to PosixRegistry, that is specialized for inspecting the contents of the object store

Mapper Refactoring

Currently CameraMapper uses the filesystem directly. It has been on my someday-to-do list to refactor it to properly use the Storage class for all filesystem I/O. That would have to be done for this to work. We would also have to inspect CameraMapper subclasses and fix any direct use of the filesystem.
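
To illustrate the kind of change involved (the classes and method names below are made up, not the real CameraMapper API), direct filesystem calls would be routed through the mapper's Storage instance instead:

```python
import os


class MapperBefore:
    """Touches the filesystem directly, so it only works with POSIX repos."""

    def __init__(self, root):
        self.root = root

    def dataset_exists(self, path):
        return os.path.exists(os.path.join(self.root, path))


class MapperAfter:
    """All I/O goes through a Storage object, which may be PosixStorage,
    a Swift-backed Storage, or any other back end."""

    def __init__(self, storage):
        self.storage = storage

    def dataset_exists(self, path):
        return self.storage.exists(path)
```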

Test Data

We can do very basic testing by adding and/or refactoring tests in daf_persistence, daf_butlerUtils, and obs_test to work with the Swift repository.

Fabio will find out what unit tests are being run by LSST stack users at IN2P3. The preference is to get those tests running so that he can evaluate the implementation with them.

Set Up a Swift Sandbox

Fabio will do the following (he estimates he can start in early September):

  • Set up a Swift sandbox workspace at NCSA for us to test with.
  • Send Nate examples of using Swift in Python (a rough sketch follows this list) to do:
    • connecting (authentication)
    • downloading an object (file)
    • uploading an object (file)
    • query if an object exists
  • [2016-09-02] Update
    • Fabio made an initial version in the form of a Python notebook. It is available on github: https://github.com/airnandez/butlerswift
    • It was tested with IN2P3's Swift installation. Tests with NCSA's Swift will be performed once Fabio gets Nebula credentials.
  • [2016-09-06] Update
    • Fabio tested successfully against NCSA's Nebula Swift instance.
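
For reference, those four operations might look roughly like this with python-swiftclient (the endpoint, credentials, container, and object names are placeholders; Fabio's notebook above is the worked example):

```python
import swiftclient
from swiftclient.exceptions import ClientException

# Connecting (authentication); all values shown are placeholders.
conn = swiftclient.Connection(
    authurl="https://keystone.example.org:5000/v2.0",
    user="someuser",
    key="long-alphanumeric-password",
    tenant_name="someproject",
    auth_version="2",
)

container = "butler-sandbox"

# Uploading an object (file).
with open("raw.fits", "rb") as f:
    conn.put_object(container, "raw/raw.fits", contents=f)

# Downloading an object (file).
headers, body = conn.get_object(container, "raw/raw.fits")
with open("downloaded.fits", "wb") as f:
    f.write(body)

# Querying whether an object exists.
try:
    conn.head_object(container, "raw/raw.fits")
    exists = True
except ClientException:
    exists = False
```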


4 Comments

  1. So it is interesting to try these things. Is Jason in the loop? He is working on how these things fit into an overall system (e.g. how does an object store interoperate with a disaster recovery scheme).

  2. Unknown User (npease)

    Unknown User (jalt), you and I have not discussed this yet. Are you interested? Would you like to talk about it?

  3. Unknown User (jalt)

    Yes, I would like to discuss this further. Our understanding of the Data Backbone is expanding rapidly. At the very least, it will be necessary to keep these two components in sync and compatible. I could see a preliminary conversation during the infrastructure meeting and a public discussion on specifics (in written form) to follow. I'm open to whatever though.

  4. Unknown User (npease)

    That sounds reasonable. I'm going to be traveling on PTO during the next infrastructure meeting (Sept 2), but I think I'll be able to call in.