Commit 4449e3a
committed Sep 22, 2024

monitoring examples and screenshot

1 parent b73c5b0

File tree: 3 files changed (+92, -19 lines)

‎mon.py

+18 lines (new file)

@@ -0,0 +1,18 @@
+import parsl
+
+def fresh_config():
+    return parsl.Config(
+        executors=[parsl.HighThroughputExecutor()],
+        monitoring=parsl.MonitoringHub(hub_address = "localhost")
+    )
+
+@parsl.python_app
+def add(x: int, y: int) -> int:
+    return x+y
+
+@parsl.python_app
+def twice(x: int) -> int:
+    return 2*x
+
+with parsl.load(fresh_config()):
+    print(twice(add(5,3)).result())

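Running this script locally should print ``16`` (``add(5,3)`` is 8, which ``twice`` doubles) and create a ``runinfo/`` directory next to it, as described in monitoring.rst below. A rough transcript, assuming a working local Parsl installation:

    $ python mon.py
    16
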
‎monitoring.rst

+74, -19 lines

@@ -1,36 +1,91 @@
-.. index:: SQL, monitoring
+.. index:: SQL, monitoring, SQLite
+   library; sqlite3
 
 Understanding the monitoring database
 #####################################
 
 Parsl can store information about workflow execution into an `SQLite database <https://www.sqlite.org/>`_. Then you can look at the information, in a few different ways.
 
 .. index:: monitoring; configuration
+   MonitoringHub
 
-turning on monitoring
+Turning on monitoring
 =====================
 
-.. todo:: this section should show a simple configuration
+Here's the workflow used in `taskpath`, but with monitoring turned on:
 
-how to look at information
-==========================
+.. code-block:: python
+   :emphasize-lines: 6,14
+
+   import parsl
+
+   def fresh_config():
+       return parsl.Config(
+           executors=[parsl.HighThroughputExecutor()],
+           monitoring=parsl.MonitoringHub(hub_address = "localhost")
+       )
+
+   @parsl.python_app
+   def add(x: int, y: int) -> int:
+       return x+y
+
+   @parsl.python_app
+   def twice(x: int) -> int:
+       return 2*x
+
+   with parsl.load(fresh_config()):
+       print(twice(add(5,3)).result())
+
+Compared to the earlier version, the changes are adding the ``monitoring=`` parameter to the Parsl configuration and adding an additional app, ``twice``, to make the workflow a bit more interesting.
+
+After running this, you should see a new file, ``runinfo/monitoring.db``:
+
+.. code-block::
+
+   $ ls runinfo/
+   000
+   monitoring.db
+
+This new file is an SQLite database shared between all workflow runs that use the same ``runinfo/`` directory.
+
+Using monitoring information
+============================
+
+There are two main approaches to looking at the monitoring database: the prototype ``parsl-visualize`` tool, and Python data analysis.
 
 .. index:: parsl-visualize
    monitoring; parsl-visualize
 
 parsl-visualize web UI
 ----------------------
 
-Parsl comes with a prototype visualizer for the monitoring database.
+Parsl comes with a prototype browser-based visualizer for the monitoring database.
+
+Start it like this, and then point your browser at the given URL.
+
+.. code-block::
+
+   $ parsl-visualize
+    * Serving Flask app 'parsl.monitoring.visualization.app'
+    * Debug mode: off
+   WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+    * Running on http://127.0.0.1:8080
+   Press CTRL+C to quit
+
+Here's a screenshot, showing the above two-task workflow spending most of its 5-second run with the ``add`` task in the ``launched`` state (waiting for a worker to be ready to run it) and the ``twice`` task in the ``pending`` state (waiting for the ``add`` task to complete).
+
+.. image:: monitoring_wf.png
+   :width: 400
+   :alt: browser screenshot with some workflow statistics and two coloured bars for task progress
 
-Here's a screenshot:
+I'm not going to go further into ``parsl-visualize``, but you can run your own workflows and click around to explore.
 
-.. todo:: this should be a couple of screenshot and not much else
+.. index:: pandas
+   monitoring; pandas
+   library; pandas
 
-programmatic access
--------------------
+Using data frames
+-----------------
 
-I usually use SQL, but Parsl users are usually more familiar with data processing in Python: you can load the database tables into Pandas data frames and do data frame stuff there.
+A different approach, preferred by many data-literate Parsl users, is to treat the monitoring data like any other Python data, using Pandas.
 
 .. todo:: one example of non-plot (count tasks?)

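The todo just above asks for a non-plot example, such as counting tasks. Here is a minimal sketch of that data-frame approach, assuming the workflow above has been run at least once so that ``runinfo/monitoring.db`` exists; the ``task`` table column names used here (``run_id``, ``task_func_name``) are assumptions taken from the schema linked in the next section and may differ between Parsl versions:

    import sqlite3

    import pandas as pd

    # Open the shared monitoring database that Parsl writes under runinfo/.
    connection = sqlite3.connect("runinfo/monitoring.db")

    # Load the whole task table into a data frame: one row per app invocation.
    tasks = pd.read_sql_query("SELECT * FROM task", connection)

    # Count how many times each app ran, per workflow run.
    # run_id and task_func_name are assumed column names -- check db_manager.py.
    print(tasks.groupby(["run_id", "task_func_name"]).size())

    connection.close()

For the example workflow above, this should report one ``add`` task and one ``twice`` task for each run.
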
@@ -47,23 +102,23 @@ The monitoring database SQL schema is defined using SQLAlchemy's ORM model at:
 
 https://github.com/Parsl/parsl/blob/3f2bf1865eea16cc44d6b7f8938a1ae1781c61fd/parsl/monitoring/db_manager.py#L132
 
-.. warning:: and the schema is defined again at https://github.com/Parsl/parsl/blob/3f2bf1865eea16cc44d6b7f8938a1ae1781c61fd/parsl/monitoring/visualization/models.py#L12 -- see issue https://github.com/Parsl/parsl/issues/2266
+.. warning:: The schema is defined a second time in `parsl/monitoring/visualization/models.py line 12 onwards <https://github.com/Parsl/parsl/blob/3f2bf1865eea16cc44d6b7f8938a1ae1781c61fd/parsl/monitoring/visualization/models.py#L12>`_. See `issue #2266 <https://github.com/Parsl/parsl/issues/2266>`_ for more discussion.
 
 These tables are defined:
 
 .. todo:: the core task-related tables can get a hierarchical diagram workflow/task/try+state/resource
 
-* workflow - each workflow run gets a row in this table. A workflow run is one call to ``parsl.load()`` with monitoring enabled, and everything that happens inside that initialized Parsl instance.
+* ``workflow`` - each workflow run gets a row in this table. A workflow run is one call to ``parsl.load()`` with monitoring enabled, and everything that happens inside that initialized Parsl instance.
 
-* task - each task (so each invocation of a decorated app) gets a row in this table
+* ``task`` - each task (so each invocation of a decorated app) gets a row in this table.
 
-* try - if/when Parsl tries to execute a task, the try will get a row in this table. As mentioned in `elaborating`, there might not be any tries, or there might be many tries.
+* ``try`` - if/when Parsl tries to execute a task, the try will get a row in this table. As mentioned in `elaborating`, there might not be any tries, or there might be many tries.
 
-* status - this records the changes of task status, which include changes known on the submit side (in ``TaskRecord``) and changes which are not otherwise known to the submit side: when a task starts and ends running on a worker. You'll see ``running`` and ``running_ended`` states in this table which will never appear in the ``TaskRecord``. One ``task`` row may have many ``status`` rows.
+* ``status`` - this records the changes of task status, which include changes known on the submit side (in ``TaskRecord``) and changes which are not otherwise known to the submit side: when a task starts and ends running on a worker. You'll see ``running`` and ``running_ended`` states in this table which will never appear in the ``TaskRecord``. One ``task`` row may have many ``status`` rows.
 
-* resource - if Parsl resource monitoring is turned on (TODO: how?), a sub-mode of Parsl monitoring in general, then a resource monitor process will be placed alongside the task (see `elaborating`) which will report things like CPU time and memory usage periodically. Those reports will be stored in the resource table. So a try of a task may have many resource table rows.
+* ``resource`` - if Parsl resource monitoring is turned on (TODO: how?), a sub-mode of Parsl monitoring in general, then a resource monitor process will be placed alongside the task (see `elaborating`) which will report things like CPU time and memory usage periodically. Those reports are stored in the resource table, so a try of a task may have many resource table rows.
 
-* block - when the scaling code starts or ends a block, or asks for status of a block, it stores any changes into this table. If enough monitoring is turned on, the block where a try runs will be stored in the relevant ``try`` table row.
+* ``block`` - when the scaling code starts or ends a block, or asks for the status of a block, it stores any changes into this table. If enough monitoring is turned on, the block where a try runs will be stored in the relevant ``try`` table row.
 
-* node - this one is populated with information about connected worker pools with htex (and not at all with other executors), populated by the interchange when a pool registers or when it changes status (disconnects, is set to holding, etc)
+* ``node`` - this one holds information about connected worker pools with htex (and not at all with other executors); rows are added by the interchange when a pool registers or when it changes status (disconnects, is set to holding, etc.)

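To make the ``task``/``status`` relationship concrete, here is a rough sketch that prints the recorded state transitions for every task, joining the two tables described above; the column names (``task_status_name``, ``timestamp``) are again assumptions based on the ``db_manager.py`` schema linked earlier and may need adjusting:

    import sqlite3

    connection = sqlite3.connect("runinfo/monitoring.db")

    # One row per recorded state change, ordered per task over time.
    # Table and column names are assumed from parsl/monitoring/db_manager.py.
    query = """
        SELECT task.task_id, task.task_func_name,
               status.task_status_name, status.timestamp
        FROM task
        JOIN status ON status.run_id = task.run_id
                   AND status.task_id = task.task_id
        ORDER BY task.run_id, task.task_id, status.timestamp
    """

    for task_id, func_name, state, timestamp in connection.execute(query):
        print(task_id, func_name, state, timestamp)

    connection.close()

For the two-task example above, this should show the ``add`` task passing through states such as ``pending``, ``launched``, ``running`` and ``running_ended``, while the ``twice`` task stays in ``pending`` until ``add`` completes.
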
‎monitoring_wf.png

new file, 97.5 KB (binary image; the screenshot referenced from monitoring.rst)