A Gen 3 butler is configured using YAML files. Each main section of the YAML file is handled separately by its own sub configuration. Each config specialization (registry, schema, storageClass, composites, and dimensions) knows the name of the key for its section of the configuration, and the names of the files providing defaults and overrides for that section. Additionally, if a sub configuration contains a cls key, that class is imported, and it can supply an additional configuration file name through its defaultConfigFile property. All the keys within that sub configuration are then processed by the class named by cls.
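Below is a minimal sketch of this section split. Plain Python dicts stand in for parsed YAML. The class names RegistryConfig and DatastoreConfig and the component/default_file attributes are hypothetical illustrations, not the daf_butler API.

```python
from typing import Any, Dict


class SubConfig:
    """Hypothetical base class: each subclass knows its key and default file."""

    component: str = ""     # key of this section in the top-level config
    default_file: str = ""  # name of the YAML file holding its defaults

    def __init__(self, full_config: Dict[str, Any]) -> None:
        # Keep only this component's section; other sections are ignored here.
        self.data = dict(full_config.get(self.component, {}))


class RegistryConfig(SubConfig):
    component = "registry"
    default_file = "registry.yaml"


class DatastoreConfig(SubConfig):
    component = "datastore"
    default_file = "datastore.yaml"


def split_config(full_config: Dict[str, Any]) -> Dict[str, SubConfig]:
    """Hand each section of the top-level config to its own sub config class."""
    return {cls.component: cls(full_config) for cls in (RegistryConfig, DatastoreConfig)}


# Hypothetical butler.yaml contents, already parsed.
butler_yaml = {
    "registry": {"db": "sqlite:///gen3.sqlite3"},
    "datastore": {"root": "/data/repo"},
}
parts = split_config(butler_yaml)
```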

The primary source of default values is $DAF_BUTLER_DIR/config. This directory contains YAML files matching the names specified in the sub config classes, and can also include names specified by the corresponding component class (for example, PosixDatastore specifies that its configuration should be found in datastores/posixDatastore.yaml).
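The file lookup for a single section might look like the following sketch. It assumes that the cls value is a dotted import path and that defaultConfigFile is a plain class attribute. The function names are hypothetical.

```python
import importlib
import os
from typing import Any, Dict, List, Optional


def import_class(dotted: str) -> type:
    """Import "package.module.ClassName" and return the class object."""
    module_name, _, class_name = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def default_config_files(section: Dict[str, Any], component_file: str,
                         config_dir: Optional[str] = None) -> List[str]:
    """Default files for one sub configuration, lowest priority first."""
    if config_dir is None:
        # Package defaults live in $DAF_BUTLER_DIR/config
        # (this falls back to a relative "config" if the variable is unset).
        config_dir = os.path.join(os.environ.get("DAF_BUTLER_DIR", ""), "config")
    files = [os.path.join(config_dir, component_file)]
    if "cls" in section:
        # The class named by cls may point at a more specific defaults file,
        # e.g. datastores/posixDatastore.yaml.
        extra = getattr(import_class(section["cls"]), "defaultConfigFile", None)
        if extra:
            files.append(os.path.join(config_dir, extra))
    return files
```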

There are additional search paths that can be included when a config object is constructed:

  1. Explicit list of directory paths to search passed into the constructor.
  2. Paths defined using the environment path variable DAF_BUTLER_CONFIG_PATH.
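The sketch below assembles the search path from these sources, highest priority first, and appends $DAF_BUTLER_DIR/config last as the package defaults. Two things are assumptions: that DAF_BUTLER_CONFIG_PATH is an os.pathsep-separated list (like PATH), and the name build_search_path, which is hypothetical.

```python
import os
from typing import List, Mapping, Optional, Sequence


def build_search_path(explicit: Sequence[str] = (),
                      env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Directories to search for config files, highest priority first."""
    env = os.environ if env is None else env
    # 1. Paths passed explicitly to the constructor.
    path = list(explicit)
    # 2. DAF_BUTLER_CONFIG_PATH, possibly holding several directories.
    path += [p for p in env.get("DAF_BUTLER_CONFIG_PATH", "").split(os.pathsep) if p]
    # Package defaults come last, i.e. lowest priority.
    if "DAF_BUTLER_DIR" in env:
        path.append(os.path.join(env["DAF_BUTLER_DIR"], "config"))
    return path
```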

To construct a Butler config object from a file, the following happens:

  1. The supplied config is read in.
  2. Each sub configuration class is constructed by supplying the relevant subset of the global config to the component Config constructor.
  3. A search path is constructed by concatenating the supplied search path, the environment variable path, and the daf_butler config directory.
  4. Defaults are first read from the config class default file name (e.g., registry.yaml) and merged in the priority order given by the search path.
  5. Then any cls-specific config files are read, overriding the current defaults.
  6. Finally, any child configurations are read as specified in cls.containerKey (assumed to be a list of configurations compatible with the current config class). This allows, for example, a ChainedDatastore to expand its child posixDatastore configurations using the same rules.
  7. Values specified in butler.yaml always take priority, since values explicitly defined in the butler YAML file are assumed to be more important than values read from defaults.
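The merge order in steps 4, 5 and 7 can be sketched with dicts. In this sketch, read is a hypothetical callable that returns the parsed contents of a file path, or None when the file does not exist. Step 6 (child configurations) is omitted here; it would apply expand_section to each entry of the container list.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

Config = Dict[str, Any]
Reader = Callable[[str], Optional[Config]]


def merge(base: Config, override: Config) -> Config:
    """Recursively merge override into a copy of base; override wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def load_defaults(name: str, search_path: List[str], read: Reader) -> Config:
    """Merge every copy of name found on the search path."""
    merged: Config = {}
    # The path is highest priority first, so walk it backwards.
    for directory in reversed(search_path):
        found = read(f"{directory}/{name}")
        if found:
            merged = merge(merged, found)
    return merged


def expand_section(supplied: Config, component_file: str, search_path: List[str],
                   read: Reader, cls_file: Optional[str] = None) -> Config:
    """Step 4 (component defaults), step 5 (cls defaults), step 7 (butler.yaml)."""
    config = load_defaults(component_file, search_path, read)
    if cls_file:
        config = merge(config, load_defaults(cls_file, search_path, read))
    # Values written explicitly in butler.yaml always win.
    return merge(config, supplied)
```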


We also have a YAML parser extension, !include, that can be used to pull in other YAML files before the butler-specific config parsing happens. This is very useful for reusing YAML snippets, but be aware that the path specified is relative to the file that contains the directive.
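The path rule for !include can be sketched on its own, separately from the YAML loader. Here resolve_include is a hypothetical helper; wiring it into a loader is not shown.

```python
import os


def resolve_include(including_file: str, target: str) -> str:
    """Return the file that an "!include target" directive refers to.

    Relative targets are resolved against the directory of the file that
    contains the directive, not against the current working directory.
    """
    if os.path.isabs(target):
        return target
    return os.path.normpath(os.path.join(os.path.dirname(including_file), target))
```

For example, an !include registry.yaml inside /repo/conf/butler.yaml refers to /repo/conf/registry.yaml, wherever the process happens to be running.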


Michelle Gower's use case changes things a little.

In this case there is a butler.yaml file, but it cannot be edited. Furthermore, to access the registry, some user-specific overrides must be given in the registry.db part of the configuration, overriding the value specified in butler.yaml. The current system offers no way to handle this short of copying the butler.yaml file, which would also leave any butlerRoot directives incorrect, so they would have to be replaced.

One way to fix this is to change the behavior of search paths so that explicit search paths given to the constructor, and paths from the environment variable, can override the supplied configuration. These would be distinct from the $DAF_BUTLER_DIR/config defaults, which cannot override it.
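Here is a sketch of the layering this proposal implies, lowest priority first. Package defaults come first, then the supplied butler.yaml, then any override files found through explicit or DAF_BUTLER_CONFIG_PATH search paths. The function names are hypothetical, and the inputs are already-parsed dicts.

```python
import copy
from typing import Any, Dict, List

Config = Dict[str, Any]


def merge(base: Config, override: Config) -> Config:
    """Recursively merge override into a copy of base; override wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def proposed_config(package_defaults: Config, supplied: Config,
                    user_overrides: List[Config]) -> Config:
    """Proposed priority: overrides > butler.yaml > package defaults.

    user_overrides is ordered highest priority first, like a search path.
    """
    # Package defaults can never override the supplied butler.yaml.
    config = merge(package_defaults, supplied)
    # Files found via search paths can, so apply them last (lowest first).
    for override in reversed(user_overrides):
        config = merge(config, override)
    return config
```

With this ordering, a user can change registry.db without copying butler.yaml, so the butlerRoot handling in the shared file is untouched.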





6 Comments

  1. The way it's meant to work is that you specify a search path (somehow) and put in one of those directories in the path a registry.$

    This means that:


    datastore: !include datastore.yaml
    registry: !include registry.yaml


    works fine but:


    !include butler.yaml
    registry:
      db: whatever


    does not work. The parser doesn't like it. Maybe that's a problem with my implementation of the include directive but it means tha$


    butlerConfigs:
    - butler1.yaml
    - butler2.yaml
    registry:
      db: mysql://...


    I would then read the included YAML files, squash them together in priority order (explicit, butler2, butler1), and then do all the me$


    Do you have any opinion on how search paths will influence things?

    Should I define the search paths such that an include for butler1.yaml is looked up in the search paths if it is not an absolute path?

    First match wins? (current directory, then path).


  2. I had a brief chat with Nate Lust today on butler configuration, spurred by DM-19583 (which I have not followed closely, and which neither of us wants to hold up at all). Anyhow, I think we both came to the conclusion that long-term we should probably try to restructure the configuration hierarchy such that the top-level breakdown is based on who owns the configuration and when it can be changed: at least software (i.e. owned by the daf_butler git repo), data repository (stuff fixed after the repo is created), client (fixed when you create a Butler or BPS class instance in Python), and user. Software components like Registry and Datastores could get their own subsections within each of those as needed, but not necessarily with the same options. For example, the DB to connect to would always live in repo-level configuration, credentials would only live in user configuration, and Formatters to write new files with would live in client configuration. That's a long-term project and I don't think we necessarily want to do it now, but I think what we fundamentally need to do is restructure our concrete configuration option hierarchy to reduce our reliance on indirection tooling in configuration files.

  3. Does "always live in X-level configuration" mean that users can't override the value?

  4. I think he is saying that some parts of registry configuration cannot be overridden but other parts can, and currently we have no boundary between the two. You can completely mess things up if you like by convincing a butler to talk to an entirely different registry.

  5. There needs to be clarification on the various DB settings that we should or should not allow a user to change. For example, I'm using a different service than others to get the trace information for the admins. In other work, I've had to try to connect to a specific machine in a DB cluster to debug an issue (i.e., change the IP address). I understand that most users will not need the ability to override most values (and can easily create problems for themselves and possibly others), but we can't make it impossible or prohibitively time-consuming for core staff to do so.

  6. Putting together and agreeing on how to categorize all of the concrete settings is the reason this will be hard.

    (For that particular case, I think it's probably more appropriate to define a derived/copied version of the repo config for administrators than to allow the DB connection string to be overridden in user configs, but that's the sort of thing we'll have to discuss.)
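The butlerConfigs idea from the first comment could be sketched as follows. The sketch answers that comment's search-path question one possible way: a relative name is looked up first in the current directory and then in each search-path directory, and the first match wins. Both the butlerConfigs key and this lookup rule are proposals from that comment, not existing behavior. The read callable, which returns a parsed dict for a path, is a hypothetical stand-in for a YAML loader.

```python
import copy
import os
from typing import Any, Callable, Dict, Optional, Sequence

Config = Dict[str, Any]


def merge(base: Config, override: Config) -> Config:
    """Recursively merge override into a copy of base; override wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def find_config(name: str, search_path: Sequence[str],
                current_dir: str = ".") -> Optional[str]:
    """Absolute names as-is; otherwise current directory, then search path."""
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    for directory in [current_dir, *search_path]:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate  # first match wins
    return None


def squash(explicit: Config, search_path: Sequence[str],
           read: Callable[[str], Config], current_dir: str = ".") -> Config:
    """Priority: explicit keys > last listed file > ... > first listed file."""
    result: Config = {}
    for name in explicit.get("butlerConfigs", []):
        path = find_config(name, search_path, current_dir)
        if path is None:
            raise FileNotFoundError(name)
        result = merge(result, read(path))  # later files override earlier ones
    # Keys written directly in the explicit file win over all listed files.
    local = {k: v for k, v in explicit.items() if k != "butlerConfigs"}
    return merge(result, local)
```

Because the merge happens on parsed dicts, the top-level mapping never needs a bare !include node. That is the construct the parser rejected in the first comment.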