This page is a proposal for the acquisition and assignment of computers (some of them virtual machines, or VMs) to address OPSIM-547 - Assess how to run showMaf.py as a tool for distributing OpSim results.  This issue stems from the need for a central repository where analysis output for ALL simulation runs is collected for OpSim Team use, as well as a central repository where analysis output for a subset of these runs can be made available to the science community.  Currently, Mantis Run Log performs both functions for SSTAR output.

I am finding that the virtual machine "opsimcvs" is not well suited for distributing MAF output for the following reasons:

opsimcvs (now a Virtual Machine - VM - on ops2)

  • Simulator code depends on opsimcvs, because it writes to the trackingDB
  • Mantis Run Log - current distribution / collaboration for SSTAR output
  • holds archives of CVS for Simulator and SSTAR
  • runs on CentOS which is not ideal for MAF installation
  • current plan is to phase out this machine entirely along with SSTAR and CVS
  • ideally, a single machine would perform this central repository function.

Proposal for OpSim by function

A suite of five machines could perform the current OpSim functions: two existing machines (ops1 & ops2), one new machine to be acquired (ops3), and two virtual machines to be set up on existing machines.

  1. development for Simulations (TBD)

  2. production for Simulations (ops1 & ops2)
    • would not need MAF installed or maintained as only Simulator codebase would be maintained
    • this solves the problem of different python requirements for the Simulator and MAF
    • only run simulations on these machines (as usual, run from /lsst/opsim with all output in this directory, not /home/*)

  3. development for MAF (VM my laptop or VM on ops3, a new machine)
    • 8 GB VM on my laptop
    • can use VPN to access ops1 & ops2
    • can run MAF on ops1/ops2 DBs over VPN or local sqlite files to test code

  4. production for MAF (ops3, a new machine)
    • run production MAF on ops3: update new code, then run or rerun MAF against ops1 or ops2 databases to create output files
    • move output files to ops4 (a VM)

  5. distribution (ops4 or opsdist - VM on DMZ machine)
    • the LSST/NOAO DMZ machine is to come online imminently
    • copy output files from within NOAO (MAF production machine ops3)
    • run showMAF.py on an accessible port (8080?)
    • specify runs for science community in one trackingDB
    • specify runs for OpSim use (most of them) in another trackingDB
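
As a minimal sketch of the two-trackingDB idea above, each showMaf instance would point at its own sqlite file: one listing only the vetted runs for the science community, the other listing everything. The table layout below is hypothetical (it is not MAF's actual trackingDb schema) and the run names are made up, purely to illustrate the split:

```python
import sqlite3

# Hypothetical trackingDB layout -- NOT the real MAF schema; illustration only.
SCHEMA = "CREATE TABLE runs (run_name TEXT, opsim_comment TEXT, maf_dir TEXT)"

def make_tracking_db(path, runs):
    """Create a small sqlite 'trackingDB' listing the given runs."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    con.executemany("INSERT INTO runs VALUES (?, ?, ?)", runs)
    con.commit()
    return con

all_runs = [
    ("opsim_1000", "baseline cadence", "/data/maf/opsim_1000"),
    ("opsim_1001", "rolling cadence test", "/data/maf/opsim_1001"),
    ("opsim_1002", "internal debug run", "/data/maf/opsim_1002"),
]
public_runs = all_runs[:2]  # only the vetted subset goes to the science community

internal = make_tracking_db(":memory:", all_runs)    # served by one showMaf instance
public = make_tracking_db(":memory:", public_runs)   # served by another

print(len(internal.execute("SELECT * FROM runs").fetchall()))  # 3
print(len(public.execute("SELECT * FROM runs").fetchall()))    # 2
```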

Please feel free to add comments or questions and we can discuss at this week's (Oct 1) OpSim meeting.


8 Comments

  1. A couple of questions:

    • runs on CentOS which is not ideal for MAF installation

    I'm curious why you say CentOS is not ideal for MAF.  It is actually the distribution we use for continuous integration builds and testing.

    • this solves the problem of different python requirements for the Simulator and MAF

    I've never understood this.  What are the different python requirements?  How are the python requirements for the Simulator managed?  It's probably not an issue, but I'm curious whether it is systemic or if it is more of a convenience thing.

     

  2. Can you provide an assessment of the CPU requirements for running MAF and OpSim over the next 12 months, to go along with the breakdown by functionality? How many runs do we expect, how long will they take, and how many GB of output will they generate? I'm trying to understand your assessment of how many machines we need and what flavor. For example, are you thinking of ops3 as a web server with a lot of disk space, a CPU machine, or something else?

    • runs on CentOS which is not ideal for MAF installation

    I misspoke here - I thought MAF ran better on Ubuntu or Fedora.  The other concern is that we are already running Apache on opsimcvs as a webserver on port 80.  Wouldn't it be better to have a dedicated machine, so that once the new machine (it would also be a virtual machine) is up and running, we can dismantle opsimcvs when MAF is fully integrated?

    • this solves the problem of different python requirements for the Simulator and MAF

    I ran into this when I installed MAF on ops2.  The Simulator is not stack-compatible - my understanding is that it was meant to be installable on any platform, using native or otherwise installed tools.  Maybe Lynne has more information on this, but right now, I have to change PYTHONPATH to run MAF on ops2.  I would have to ask Francisco why the Simulator won't run on Anaconda python.  Can MAF run on the python already installed on ops2?

    • assessment of the CPU requirements for running MAF and opsim

    I will talk to Kem about estimating the number of simulations - that actually depends on a lot of factors.  

    Lynne will know more about how much CPU and disk space a typical standard set of MAF output will take.

    I was imagining that ops4 - the repository/distribution machine - would be a web server with lots of disk space.  It would not be running MAF or simulations, so it would not need many CPUs, and it would have to sit outside the NOAO firewall, hence the idea of creating a virtual machine in the DMZ (which is supposed to be realized soon).

    Since upgrades/improvements/additions to MAF will likely demand reprocessing a standard set of possibly dozens of runs, and to avoid competing with the Simulator for CPUs (ops1 and ops2), a "MAF production" machine (ops3) would have to be its own machine with large disk space.  It would generate output for copying over to the distribution machine, after which the output could be deleted from ops3 - or we might want to keep a large number of output sets for comparison and analysis against newer simulations.

    Lastly, there has historically been a preference for separating production machines from development machines, so MAF development (which I am planning to do) should be separated from production codebases and resources.  Since development is neither as disk- nor as CPU-intensive as production or distribution, I have suggested a VM either on my personal laptop, or a VM on ops3, to keep it separate.

    I purposely did not put a lot of text in the above proposal to keep it readable, so please let me know if you want more elaboration.

    1. Re CentOS: I don't know how the transition is going to happen, so I trust your intuition on that.

      Re Python versions: You should be able to install MAF using any Python v2.7 as long as you have the few prerequisites.  Maybe it's the prereqs, but they should all be pip installable.

      1. Simon, yes - you should be able to run MAF using any python v2.7 and then manage the prereqs. I think you're right that all of the requirements are pip installable. It was just much easier to tell Cathy to use the stack anaconda. 

        The reason that was then a problem is that the OpSim team is using a python 2.6 slalib installed in the python 2.6 site-packages, and it was hard-coded into their PYTHONPATH for some reason (rather than letting python pick up the appropriate packages). This slalib package conflicted with the Anaconda python for some reason I don't remember off-hand. 
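
        A quick illustration of how a hard-coded PYTHONPATH entry shadows the running interpreter's own packages, regardless of which python (e.g. Anaconda) is invoked. The directory and module name here are made up for the demo, and python3 is used just so the snippet runs on a current system:

```shell
# Create a stale copy of a package in an 'old' site directory (hypothetical name).
mkdir -p /tmp/old_site/slalib_demo
echo 'VERSION = "old-2.6-build"' > /tmp/old_site/slalib_demo/__init__.py

# Prepending the old path forces python to resolve the stale copy first,
# shadowing any same-named package from the interpreter's site-packages.
PYTHONPATH=/tmp/old_site python3 -c 'import slalib_demo; print(slalib_demo.VERSION)'
```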

        That said, I do think having a separate machine to host the publicly distributed runs and function as a repository (and have an outside the firewall connection) is a good idea in general.

  3. Outputs of the current SSTAR driver take about 750 MB per run. The sqlite db tables are about 4.3 GB uncompressed; gzipped, they're about half that size (which is what I think we should count as the storage/distribution requirement). The current 'cadence' driver outputs are about another 500 MB. Altogether, it looks like about ~3.5 GB per run.

    On my mac laptop (2.3 Ghz i7) with SSD, it takes about 20 minutes to run the current SSTAR driver per run. 

    On my (much older) work desktop (2.66 Ghz Intel Core2) with normal sort of hard drives, it takes 1 hour per run. 

    Note that MAF is plain python and, as such, runs on only one core.
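
    A back-of-envelope check of the per-run footprint from the figures above, scaled to a hypothetical standard set of runs (the run count is illustrative, not a commitment):

```python
# Per-run storage from the figures quoted above (GB).
sstar_output = 0.75       # current SSTAR driver output
sqlite_gzipped = 4.3 / 2  # ~4.3 GB uncompressed, roughly halved by gzip
cadence_output = 0.5      # current 'cadence' driver output

per_run_gb = sstar_output + sqlite_gzipped + cadence_output
print(round(per_run_gb, 2))  # ~3.4, consistent with the "~3.5 GB per run" above

# Scaling to a hypothetical standard set of a few dozen runs:
n_runs = 36  # illustrative only
print(round(per_run_gb * n_runs))  # rough disk budget in GB for one reprocessing
```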

  4. Note that running two versions of showMaf with two different trackingDBs (desirable because of the different internal vs. external runs) on the same machine means that two ports will need to be used. 

    If port 80 isn't needed for some other service, I'd recommend putting the officially released runs there, so that we don't run into problems with the few people (like Dave Monet) who can't ever access 8080 or 8888 (even though these are fairly standard). Then put the internal runs on 8080 or 8888. 

  5. I spoke to Iain about setting this up on a new machine, and he seems to think we can run both dbs off the same port.  Maybe we should set up a time to talk about this with him.