Multyvac for Python Primer¶
This document will show you how to get started with the multyvac Python library.
Creating a Job¶
To use Multyvac in Python, you designate a function that you want to run on Multyvac instead of on your own machine. Here we’ll walk through an example of offloading a simple function to Multyvac.
Open Python interactively and define the add function.
def add(x, y):
    return x + y
Normally, you would just run the function locally by calling it:
>>> add(1, 2)
3
If you want to run it on Multyvac, submit it using multyvac.submit():
>>> import multyvac
>>> jid = multyvac.submit(add, 1, 2)
That’s it! You pass arguments to add() by passing them, in the same order, to submit(). Keyword arguments are just as easy: multyvac.submit(add, x=1, y=2).
submit() is non-blocking; it returns immediately without waiting for add to actually run. To verify, try this:
>>> import time
>>> time.sleep(10)                   # sleeps for 10 seconds
>>> multyvac.submit(time.sleep, 10)  # returns immediately
Because it returns immediately, multyvac.submit() can’t give you the result of your function. What it returns instead is an integer jid (job identifier).
>>> print jid
1
For the remainder of the Primer, let’s assume the jid is 1. We’ll show you what you can do with a jid.
Using the Job Id¶
Job identifiers are unique to your account. Your first job has jid 1, and it is incremented sequentially with each new job. All of Multyvac’s job-related facilities use jids. We’ll explore a few of them now.
Querying a Job’s Status¶
Below is a diagram of the possible statuses a job can have once it is created.
A job spends a variable amount of time in various steps before it is finished (shown as squares), at which point its status becomes permanent. Only then will its result, or reason for failure, be available. The path of gray elements, queued -> processing -> done, is the most common. The full definition of statuses follows:
waiting       Job is waiting until its dependencies are satisfied.
queued        Job is in the queue waiting for a free core.
processing    Job is running.
done          Job completed successfully.
error         Job errored (typically due to an uncaught exception).
killed        Job was aborted by the user.
stalled       Job will not run due to a dependency erroring.
To query a job’s status in Python:
>>> job = multyvac.get(1)  # gets job info
>>> job.status             # job is still running
'processing'
>>> job.update()           # grab latest job info
>>> job.status             # job has finished
'done'
Querying a Job’s Result¶
To get the result of the function we ran earlier, use job.get_result():
>>> job.get_result()
3
get_result() blocks until the job has finished and its result is therefore ready.
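Since get_result() blocks until its job is done, collecting the results of several jobs is just a loop over the job objects. A minimal sketch (the helper name is ours, not part of the multyvac API):

```python
def map_results(jobs):
    """Collect the results of already-submitted jobs, in order.

    Each get_result() call blocks until that job finishes, so the
    returned list is complete when this function returns.
    """
    return [job.get_result() for job in jobs]
```

You would call it as, for example, map_results([multyvac.get(jid) for jid in jids]).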
Waiting for a Job¶
If you just want to block until a job finishes, or until it starts processing, you can poll its status with update() as shown above.
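Such a wait loop can be sketched using only update() and status from above; the helper itself is ours, not part of the multyvac API:

```python
import time

# The permanent statuses, per the table above.
FINISHED = ('done', 'error', 'killed', 'stalled')

def wait_for_finish(job, poll_interval=1.0):
    """Poll a job until its status is permanent, then return that status."""
    job.update()
    while job.status not in FINISHED:
        time.sleep(poll_interval)
        job.update()
    return job.status
```

To wait for a job to start processing instead, change the loop condition to `while job.status in ('waiting', 'queued')`.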
Viewing a Job in the Dashboard¶
On the Job Dashboard, the jobs you create are listed. The leftmost column has the job’s id, and the rightmost column its status. To see a detailed report for a job, just click on its jid.
SSH into a Job¶
You may want to SSH into the system that is running a job for debugging and inspection purposes.
>>> job.open_ssh_console()
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.11.0-12-generic x86_64)
multyvac@c:~$
As you can see, you’re dropped into a shell right from your Python terminal. Note that the SSH session is closed once the job finishes.
The job object has a lot of other attributes you may find useful:
stdout              The standard output of the job. If the job is processing, this gives a live snapshot.
stderr              The standard error of the job. If the job is processing, this gives a live snapshot.
runtime             The number of seconds the job ran for. If the job is processing, this gives a live snapshot.
cmd                 The shell command that was executed.
core                The core that was used.
multicore           The number of cores that were used.
status              The current status of the job.
tags                A dict of all the key-values you’ve assigned the job.
created_at          When the job was created.
started_at          When the job started processing.
finished_at         When the job finished.
queue_delay         The number of seconds the job spent in the queue.
overhead_delay      The number of seconds of overhead that Multyvac introduced.
cputime_user        The number of seconds of user CPU time used. If the job is processing, this gives a live snapshot.
cputime_system      The number of seconds of system CPU time used. If the job is processing, this gives a live snapshot.
memory_failcnt      The number of times memory allocations failed. If the job is processing, this gives a live snapshot.
memory_max_usage    The max amount of memory the job has used so far. If the job is processing, this gives a live snapshot.
ports               The listening sockets a job has opened, mapped to the (address, port) pairs that should be used to connect to them.
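Because stdout is a live snapshot while the job is processing, you can tail a job’s output with a small polling loop. This helper is our sketch built on the attributes above, not part of the multyvac API:

```python
import sys
import time

def tail_stdout(job, poll_interval=2.0):
    """Print new stdout from a job as it appears, until the job finishes."""
    seen = 0
    while True:
        job.update()
        out = job.stdout or ''
        if len(out) > seen:
            # Only print the part of stdout we haven't shown yet.
            sys.stdout.write(out[seen:])
            seen = len(out)
        if job.status != 'processing':
            break
        time.sleep(poll_interval)
```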
Killing a Job¶
If a job is not behaving as expected, or no longer needs to be run for whatever reason, you can kill it. Killing terminates a processing or queued job immediately.
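As a sketch, and assuming the job object exposes a kill() method (check the Job API for the exact call), you can guard the kill on the job’s status so it only fires while the job is queued or processing:

```python
def kill_if_running(job):
    """Kill a job only while killing can still have an effect.

    Assumes job.kill() exists; see the Job API.
    """
    job.update()
    if job.status in ('queued', 'processing'):
        job.kill()
        return True
    return False
```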
Giving a Job a Name¶
Jobs can be given names, which can make it easier to query them later on.
>>> multyvac.submit(add, 1, 2, _name='1+2')  # returns the jid
345
>>> multyvac.get_by_name('1+2')
Job(345, name=u'1+2')
If multiple jobs share the same name, only the most recently created is returned.
Layers: Installing Software¶
There are certain functions, such as those that require compilation (C extensions), that cannot be automatically transferred from your machine to Multyvac. For example, numpy:
>>> import numpy
>>> jid = multyvac.submit(numpy.add, 1, 2)
>>> job = multyvac.get(jid)
>>> job.get_result()
JobError: Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/multyvacinit/pybootstrap.py", line 8, in <module>
    f, args, kwargs = pickle.loads(stdin)
  File "/usr/local/lib/python2.7/dist-packages/multyvac/util/cloudpickle.py", line 961, in _getobject
    mod = __import__(modname)
ImportError: ('No module named numpy', <function _getobject at 0x26b4de8>, ('numpy', 'add'))
To install numpy on Multyvac, use a Layer.
>>> multyvac.layer.create('numpy')
>>> layer = multyvac.layer.get('numpy')
>>> modify_job = layer.modify()
>>> modify_job.open_ssh_console()
multyvac@c:~$ sudo apt-get install python-numpy
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  binutils cpp cpp-4.6 gcc gcc-4.6 libblas3gf libc-dev-bin libc6-dev
  libgfortran3 libgmp10 libgomp1 liblapack3gf libmpc2 libmpfr4 libquadmath0
  linux-libc-dev manpages manpages-dev python-numpy
0 upgraded, 19 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.8 MB of archives.
After this operation, 75.6 MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
...
multyvac@c:~$ exit
>>> modify_job.snapshot()
Now that we’ve snapshotted a new layer, we can run numpy.add as long as we specify the layer:
>>> jid = multyvac.submit(numpy.add, 1, 2, _layer='numpy')
>>> job = multyvac.get(jid)
>>> job.get_result()
3
To see what else you can do, please consult the Layer API.
Volumes: Data Storage¶
If you want your job to be able to access data on the filesystem, you’ll want to use a Volume. Volumes are folders that can be mounted at any path in the Multyvac filesystem.
Here we’ll create a volume called dataset and mount it at /data.
>>> multyvac.volume.create('dataset', '/data')
Now let’s put some data (hello, world) in the volume in a file called msg.
>>> vol = multyvac.volume.get('dataset')
>>> vol.put_contents('hello, world', 'msg')
We can now run a job that can read this file:
>>> def dump_file():
...     return open('/data/msg').read()
>>> jid = multyvac.submit(dump_file)
>>> job = multyvac.get(jid)
>>> job.get_result()
JobError: Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/multyvacinit/pybootstrap.py", line 10, in <module>
    res = f(*args, **kwargs)
  File "<ipython-input-4-c3c19ef9b651>", line 2, in dump_file
IOError: [Errno 2] No such file or directory: '/data/msg'
We got this error because we didn’t specify that the job should use the volume. Let’s try again:
>>> jid = multyvac.submit(dump_file, _vol='dataset')
>>> job = multyvac.get(jid)
>>> job.get_result()
'hello, world'
Working with your Local Filesystem¶
Volumes give you the ability to synchronize a local path with Multyvac. We say “synchronize” because if you synchronize a path multiple times, our client will only send up what has changed.
>>> vol.sync_up('/path/to/big/data', 'stuff')  # takes a while
>>> vol.sync_up('/path/to/big/data', 'stuff')  # takes seconds because nothing changed
Likewise, you can synchronize data from Multyvac to your local filesystem efficiently.
>>> vol.sync_down('stuff', '/path/on/your/machine')
To transfer individual files, use put_file() and get_file():
>>> vol.put_file('local_path', 'remote_path')
>>> vol.get_file('remote_path', 'local_path')
To see what else you can do, please consult the Volume API.