-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathmpepool.py
executable file
·2963 lines (2698 loc) · 131 KB
/
mpepool.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
:Description: Multi-Process Execution Pool to schedule Jobs execution with per-job timeout,
optionally grouping them into Tasks and specifying optional execution parameters
considering NUMA architecture:
- automatic rescheduling and *load balancing* (reduction) of the worker processes
and on low memory condition for the *in-RAM computations* (requires
[psutil](https://pypi.python.org/pypi/psutil), can be disabled)
- *chained termination* of the related worker processes (started jobs) and
non-started jobs rescheduling to satisfy *timeout* and *memory limit* constraints
- automatic CPU affinity management and maximization of the dedicated CPU cache
vs parallelization for a worker process
- *timeout per each Job* (it was the main initial motivation to implement this
module, because this feature is not provided by any Python implementation out of the box)
- onstart/ondone *callbacks*, ondone is called only on successful completion
(not termination) for both Jobs and Tasks (group of jobs)
- stdout/err output, which can be redirected to any custom file or PIPE
- custom parameters for each Job and respective owner Task besides the name/id
Flexible API provides optional automatic restart of jobs on timeout, access to job's process,
parent task, start and stop execution time and much more...
Core parameters specified as global variables:
_LIMIT_WORKERS_RAM - limit the amount of memory consumption (<= RAM) by worker processes,
requires psutil import
_CHAINED_CONSTRAINTS - terminate related jobs on terminating any job by the execution
constraints (timeout or RAM limit)
The load balancing is enabled when global variables _LIMIT_WORKERS_RAM and _CHAINED_CONSTRAINTS
are set, jobs categories and relative size (if known) specified. The balancing is performed
to use as much RAM and CPU resources as possible performing in-RAM computations and meeting
timeout, memory limit and CPU cache (processes affinity) constraints.
Large executing jobs are rescheduled for the later execution with less number of worker
processes after the completion of smaller jobs. The number of workers is reduced automatically
(balanced) on the jobs queue processing. It is recommended to add jobs in the order of the
increasing memory/time complexity if possible to reduce the number of worker process
terminations for the jobs execution postponing on rescheduling.
:Authors: (c) Artem Lutov <[email protected]>
:Organizations: eXascale Infolab <http://exascale.info/>, Lumais <http://www.lumais.com/>,
ScienceWise <http://sciencewise.info/>
:Date: 2015-07 v1, 2017-06 v2
"""
# Possible naming: pyexpool / mpepool
from __future__ import print_function, division # Required for stderr output, must be the first import
import sys
import os
import time
# import ctypes # Required for the multiprocessing Value definition
import types # Required for instance methods definition
import traceback # stacktrace
# To print a stacktrace fragment:
# traceback.print_stack(limit=5, file=sys.stderr) or
# print(traceback.format_exc(5), file=sys.stderr)
import subprocess
import errno
# # Async Tasks management
# import threading # Used only for the concurrent Tasks termination by timeout
# import signal # Required for the correct handling of KeyboardInterrupt: https://docs.python.org/2/library/thread.html
import itertools # chain
from multiprocessing import cpu_count, Lock #, Queue #, active_children, Value, Process
from collections import deque
from math import sqrt
# Consider time interface compatibility with Python before v3.3
if not hasattr(time, 'perf_counter'):
time.perf_counter = time.time
# Required to efficiently traverse items of dictionaries in both Python 2 and 3
try:
from future.utils import viewvalues, viewitems #, viewkeys, viewvalues # External package: pip install future
from future.builtins import range #, list
except ImportError:
def viewMethod(obj, method):
"""Fetch view method of the object
obj - the object to be processed
method - name of the target method, str
return target method or AttributeError
>>> callable(viewMethod(dict(), 'items'))
True
"""
viewmeth = 'view' + method
ometh = getattr(obj, viewmeth, None)
if not ometh:
ometh = getattr(obj, method)
return ometh
viewitems = lambda dct: viewMethod(dct, 'items')()
#viewkeys = lambda dct: viewMethod(dct, 'keys')()
viewvalues = lambda dct: viewMethod(dct, 'values')()
# Replace range() implementation for Python2
try:
range = xrange
except NameError:
pass # xrange is not defined in Python3, which is fine
# Optional Web User Interface
_WEBUI = True
__imperr = None # Import error
if _WEBUI:
try:
# ATTENTION: Python3 newer treats imports as relative and results in error here if mpewui is a local module
from mpewui import WebUiApp, UiCmdId, UiResOpt, UiResCol, SummaryBrief
except ImportError as wuerr:
try:
# Note: this case should be the second because explicit relative imports cause various errors
# under Python2 and Python3, which complicates their handling
from .mpewui import WebUiApp, UiCmdId, UiResOpt, UiResCol, SummaryBrief
except ImportError as wuerr:
__imperr = wuerr # Note: exceptions are local in Python 3
_WEBUI = False
# Limit the amount of memory consumption by worker processes.
# NOTE:
# - requires import of psutils
# - automatically reduced to the RAM size if the specified limit is larger
_LIMIT_WORKERS_RAM = True
if _LIMIT_WORKERS_RAM:
try:
import psutil
except ImportError as lwerr:
__imperr = lwerr # Note: exceptions are local in Python 3
_LIMIT_WORKERS_RAM = False
def timeheader(timestamp=time.gmtime()):
"""Timestamp header string
timestamp - timestamp
return - timestamp string for the file header
"""
assert isinstance(timestamp, time.struct_time), 'Unexpected type of timestamp'
# ATTENTION: MPE pool timestamp [prefix] intentionally differs a bit from the
# benchmark timestamp to easily find/filter each of them
return time.strftime('# ----- %Y-%m-%d %H:%M:%S ' + '-'*30, timestamp)
# Note: import occurs before the execution of the main application, so show
# the timestamp to outline when the error occurred and separate re-executions
if not (_WEBUI and _LIMIT_WORKERS_RAM):
print(timeheader(), file=sys.stderr)
if not _WEBUI:
print('WARNING, Web UI is disabled because the "bottle" module import failed: '
, __imperr, file=sys.stderr)
if not _LIMIT_WORKERS_RAM:
print('WARNING, RAM constraints are disabled because the "psutil" module import failed: '
, __imperr, file=sys.stderr)
# Use chained constraints (timeout and memory limitation) in jobs to terminate
# also related worker processes and/or reschedule jobs, which have the same
# category and heavier than the origin violating the constraints
_CHAINED_CONSTRAINTS = True
_RAM_SIZE = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 1024.**3 # RAM (physical memory) size in GB
# Dedicate at least 256 MB for the OS consuming not more than 98% of RAM
_RAM_LIMIT = _RAM_SIZE * 0.98 - 0.25 # Maximal consumption of RAM in GB (< _RAM_SIZE to avoid/reduce swapping)
# System app to set CPU affinity if required, should be preliminary installed
# (taskset is present by default on NIX systems)
_AFFINITYBIN = 'taskset'
_DEBUG_TRACE = False # Trace start / stop and other events to stderr; 1 - brief, 2 - detailed, 3 - in-cycles
def secondsToHms(seconds):
"""Convert seconds to hours, mins, secs
seconds - seconds to be converted, >= 0
return hours, mins, secs
"""
assert seconds >= 0, 'seconds validation failed'
hours = int(seconds // 3600)
mins = int((seconds - hours * 3600) // 60)
secs = seconds - hours * 3600 - mins * 60
return hours, mins, secs
def inGigabytes(nbytes):
"""Convert bytes to gigabytes"""
return nbytes / (1024. ** 3)
def inBytes(gb):
"""Convert bytes to gigabytes"""
return gb * 1024. ** 3
def tblfmt(v, strpad=0):
"""Table-like formatting of the value
strpad: int - string padding
"""
if isinstance(v, float):
return '{:.3f}'.format(v)
elif isinstance(v, int):
return str(strpad).join(('{:', '}')).format(v)
if v is None:
v = '-'
elif not isinstance(v, str):
v = str(v)
return v.rjust(strpad)
def applyCallback(callback, owner):
"""Process the callback call
Args:
callback: function - callback (self.onXXX)
owner: str - owner name of the callback (self.name), required only for tracing
"""
#assert callable(callback) and isinstance(owner, str), 'A valid callback and owner name are expected'
try:
callback()
except Exception as err: #pylint: disable=W0703
print('ERROR in {}() callback of the "{}": {}, discarded. {}'
.format(callback.__name__, owner, err, traceback.format_exc(5)), file=sys.stderr)
# NOTE: additional parameter(s) can be added to output additional virtual properties like duration,
# which can be implemented as bool parameter to emulate properties specified by propflt
def infodata(obj, propflt=None, objflt=None):
"""Convert the object to the tuple filtering specified properties and itself
Args:
obj: XobjInfo - the object to be filtered and converted to the tuple, supposed
to be decorated with `propslist`
propflt: tuple(prop: str) - property filter to include only the specified properties
objflt: dict(prop: str, val: UiResFilterVal) - include the item
only if the specified properties belong to the specified range,
member items are processed irrespectively of the item inclusion
NOTE: property names should exactly match the obj properties (including __slots__)
Returns:
tuple - filtered properties of the task or None
Raises:
AttributeError - propflt item does not belong to the JobInfo slots
"""
assert hasattr(obj, '__slots__'), 'The object should contain slots'
# Pass the objflt or return None
if objflt:
for prop, pcon in viewitems(objflt): # Property name and constraint
# print('>>> prop: {}, cpon: {}; objname: {}, objpv: {}'.format(prop, pcon, obj.name, obj.__getattribute__(prop)))
#assert isinstance(prop, str) and isinstance(pcon, UiResFilterVal
# ), 'Invalid type of arguments: prop: {}, pflt: {}'.format(
# type(prop).__name__, type(pcon).__name__)
pval = None if prop not in obj else obj.__getattribute__(prop)
# Use task name for the task property
# Note: task resolution is required for the proper filtering
if isinstance(pval, Task):
pval = pval.name
if _DEBUG_TRACE and pval is None:
print(' WARNING, objflt item does not belong to the {}: {}'.format(
type(obj).__name__, prop), file=sys.stderr)
# Note: pcon is None requires non-None pval
if (pcon is None and pval is None) or (pcon is not None and (
(not pcon.opt and prop not in obj) or (pcon.end is None and pval != pcon.beg)
or pcon.end is not None and (pval < pcon.beg or pval >= pcon.end))):
# print('>>> ret None, 1:', prop not in obj, '2:', pcon.end is None and pval != pcon.beg
# , '3:', pcon.end is not None, '4:', pval < pcon.beg or pval >= pcon.end)
return None
return tuple(tblfmt(obj.__getattribute__(prop)) for prop in (propflt if propflt else obj.iterprop())) #pylint: disable=C0325
def infoheader(objprops, propflt):
"""Form filtered header of the properties of the ObjInfo
Args:
objprops: iterable(str) - object properties
propflt: list(str) - properties filter
Returns:
tuple(str) - filtered property names
"""
if propflt:
opr = set(objprops)
return tuple(h for h in propflt if h in opr)
return tuple(h for h in objprops)
# return tuple(h for h in objprops if not propflt or h in propflt)
def propslist(cls):
"""Extends the class with properties listing capabilities
ATTENTION: slots are listed in the order of declaration (to control the output order),
computed properties are listed afterwards in the alphabetical order.
Extensions:
- _props: set - all public attributes of the class
and all __slots__ members even if the latter are underscored
- `in` operator support added to test membership in the _props
- json - dict representation for the JSON serialization
- iterprop() method to iterate over the properties present in _props starting
from __slots__ in the order of declaration and then over the computed properties
in the alphabetical order
"""
def contains(self, prop):
"""Whether the specified property is present"""
assert len(self._props) >= 2, 'At least 2 properties are expected'
return self._props.endswith(' ' + prop) or self._props.find(prop + ' ') != -1 #pylint: disable=W0212
def json(self):
"""Serialize self to the JSON representation"""
return {p: self.__getattribute__(p) if p != 'task' else self.__getattribute__(p).name for p in self.iterprop()}
def iterprop(cls):
"""Properties generator/iterator"""
ib = 0
ie = cls._props.find(' ') #pylint: disable=W0212
while ie != -1:
yield cls._props[ib:ie] #pylint: disable=W0212
ib = ie + 1
ie = cls._props.find(' ', ib) #pylint: disable=W0212
yield cls._props[ib:] #pylint: disable=W0212
# List all public properties in the _props:str attribute retaining the order of __slots__
# and then defined computed properties.
# Note: double underscored attribute can't be defined externally
# since internally it is interpreted as _<ClsName>_<attribute>.
#assert hasattr(cls, '__slots__'), 'The class should have slots: ' + cls.__name__
cslots = set(cls.__slots__)
cls._props = ' '.join(itertools.chain(cls.__slots__, #pylint: disable=W0212
[m for m in dir(cls) if not m.startswith('_') and m not in cslots])) # Note: dir() list also slots
# Define required methods
# ATTENTION: the methods are bound automatically to self (but not to the cls in Python2)
# since they are defined before the class is created.
cls.__contains__ = contains
cls.json = json
cls.iterprop = types.MethodType(iterprop, cls) # Note: required only in Python2 for the static methods
return cls
@propslist
class JobInfo(object):
"""Job information to be reported by the request
ATTENTION: the class should not contain any public methods except the properties
otherwise _props should be computed differently
# Check `_props` definition
>>> JobInfo(Job('tjob'))._props
'name pid code tstart tstop memsize memkind task category duration'
# Check `contains` binding
>>> 'pid' in JobInfo(Job('tjob'))
True
>>> 'pi' in JobInfo(Job('tjob'))
False
# Check `iterprop` binding
>>> JobInfo(Job('tjob')).iterprop().next() == 'name'
True
# Check `iterprop` execution (generated iterator)
>>> ' '.join(p for p in JobInfo(Job('tjob')).iterprop()) == JobInfo(Job('tjob'))._props
True
"""
__slots__ = ('name', 'pid', 'code', 'tstart', 'tstop', 'memsize', 'memkind', 'task', 'category')
def __init__(self, job, tstop=None):
"""JobInfo initialization
Args:
job: Job - a job from which the info is fetched
tstop: float - job termination time, actual for the terminating deferred jobs
"""
assert isinstance(job, Job), 'Unexpected type of the job: ' + type(job).__name__
self.name = job.name
self.pid = None if not job.proc else job.proc.pid
self.code = None if not job.proc else job.proc.returncode
self.tstart = job.tstart
self.tstop = job.tstop if job.tstop is not None else tstop
# ATTENTION; JobInfo definitions should be synchronized with Job
# Note: non-initialized slots are still listed among the attributes but yield `AttributeError` on access,
# so they always should be initialized at least with None to sync headers with the content
if _LIMIT_WORKERS_RAM:
self.memsize = job.mem
self.memkind = job.memkind
else:
self.memsize = None
self.memkind = None
self.task = job.task
self.category = None if not _CHAINED_CONSTRAINTS else job.category
# rsrtonto # Restart on timeout
# if _LIMIT_WORKERS_RAM:
# wkslim #wksmax
@property
def duration(self):
"""Execution duration"""
if self.tstop is not None:
tlast = self.tstop
else:
tlast = time.perf_counter()
return None if self.tstart is None else tlast - self.tstart
@propslist
class TaskInfo(object):
"""Task information to be reported by the request
ATTENTION: the class should not contain any public methods except the properties
otherwise _props should be computed differently
"""
__slots__ = ('name', 'tstart', 'tstop', 'numadded', 'numdone', 'numterm', 'task')
def __init__(self, task):
"""TaskInfo initialization
Args:
task: Task - a task from which the info is fetched
"""
assert isinstance(task, Task), 'Unexpected type of the task: ' + type(task).__name__
self.name = task.name
self.tstart = task.tstart
self.tstop = task.tstop
self.numadded = task.numadded
self.numdone = task.numdone
self.numterm = task.numterm
self.task = task.task # Owner (super) task
@property
def duration(self):
"""Execution duration"""
return None if self.tstart is None else (self.tstop if self.tstop is not None
else time.perf_counter()) - self.tstart
class TaskInfoExt(object):
"""Task information extended with member items (subtasks/jobs)"""
__slots__ = ('props', 'jobs', 'subtasks')
def __init__(self, props, jobs=None, subtasks=None):
"""Initialization of the extended task information
Args:
props: tuple(header: iterable, values: iterable) - task properties
jobs: list(header: iterable, values1: iterable, values: iterable...) - member jobs properties
subtasks: list(TaskInfoExt) - member tasks (subtasks) info
"""
self.props = props
self.jobs = jobs
self.subtasks = subtasks
def tasksInfoExt(tinfe0, propflt=None, objflt=None):
"""Form hierarchy of the extended information about the tasks
Args:
tinfe0: dict(Task, TaskInfoExt) - bottom level of the hierarchy (tasks having jobs)
propflt: list(str) - properties filter
objflt: dict(str, UiResFilterVal) - objects (tasks/jobs) filter
Returns:
dict(Task, TaskInfoExt) - resulting hierarchy of TaskInfoExt
"""
ties = dict() # dict(Task, TaskInfoExt) - Tasks Info hierarchy
for task, tie in viewitems(tinfe0):
# print('> Preparing for the output task {} (super-task: {}), tie {} jobs and {} subtasks'.format(
# task.name, '-' if not task.task else task.task.name, 0 if not tie.jobs else len(tie.jobs)
# , 0 if not tie.subtasks else len(tie.subtasks))
# , file=sys.stderr if _DEBUG_TRACE else sys.stdout)
if task.task is None:
# Add or extend with jobs a root task
# Note: the task can be already initialized if it has subtasks
tinfe = ties.get(task)
if tinfe is None:
ties[task] = tie
else:
tinfe.jobs = tie.jobs
assert tinfe.props[1] == tie.props[1], task.name + ' task properties desynchronized'
else:
# print('>> task {} (super-task: {})'.format(task.name, task.task.name), file=sys.stderr)
while task.task is not None:
task = task.task
newtie = ties.get(task)
if newtie is None:
# Add new super-task to the hierarchy
# Note: infodata() should not yield None here
newtie = tinfe0.get(task)
# It is possible that the super-task has no any [failed] jobs
# and, hence, is not present in tinfe0
if newtie:
assert newtie.subtasks is None, (
'New super-task {} should not have any subtasks yet'.format(task.name))
newtie.subtasks = [tie]
else:
tdata = infodata(TaskInfo(task), propflt, objflt)
newtie = TaskInfoExt(props=None if not tdata else
(infoheader(TaskInfo.iterprop(), propflt), tdata) #pylint: disable=E1101
, subtasks=[tie])
ties[task] = newtie
# print('>> New subtask {} added to {}'.format(tie.props[1][0], task.name)
# , file=sys.stderr if _DEBUG_TRACE else sys.stdout)
else:
# Note: subtasks can be None if this task contains jobs
if newtie.subtasks is None:
newtie.subtasks = [tie]
else:
newtie.subtasks.append(tie) # Omit the header
# print('>> Subtask {} added to {}: {}'.format(tie.props[1][0], task.name, tie.subtasks)
# , file=sys.stderr if _DEBUG_TRACE else sys.stdout)
tie = newtie
return ties
def printDepthFirst(tinfext, cindent='', indstep=' ', colsep=' '):
"""Print TaskInfoExt hierarchy using the depth first traversing
Args:
tinfext: TaskInfoExt - extended task info to be unfolded and printed
cindent: str - current indent for the output hierarchy formatting
indstep: str - indent step for each subsequent level of the hierarchy
colsep: str - column separator for the printing variables (columns)
"""
strpad = 9 # Padding of the string cells
# Print task properties (properties header and values)
for props in tinfext.props:
print(cindent, colsep.join(tblfmt(v, strpad) for v in props), sep=''
, file=sys.stderr if _DEBUG_TRACE else sys.stdout)
# assert isinstance(tinfext, TaskInfoExt), 'Unexpected type of tinfext: ' + type(tinfext).__name__
# Print task jobs and subtasks
cindent += indstep
if tinfext.jobs: # Consider None
for tie in tinfext.jobs:
print(cindent, colsep.join(tblfmt(v, strpad) for v in tie), sep=''
, file=sys.stderr if _DEBUG_TRACE else sys.stdout)
# print('>> Outputting task {} with {} subtasks'.format(tinfext.props[1][0]
# , 0 if not tinfext.subtasks else len(tinfext.subtasks)), file=sys.stderr)
if tinfext.subtasks: # Consider None
for tie in tinfext.subtasks:
printDepthFirst(tie, cindent=cindent, indstep=indstep, colsep=colsep)
class TaskInfoPrefmt(object):
"""Preformatted Task info"""
__slots__ = ('compound', 'ident', 'data')
def __init__(self, data, ident=0, compound=None):
"""Initialization of the pre-formated task info:
data: list - data to be displayed (property name/value)
ident: uint - current indentation
compound: bool - whether the item is a header of the compound item (task),
None means that the item is not a header at all
"""
# header: bool - whether the vals represent a header or a payload data
self.compound = compound
self.ident = ident
self.data = data
def json(self):
"""Serialize self to the JSON representation"""
return {p: self.__getattribute__(p) for p in self.__slots__}
def unfoldDepthFirst(tinfext, indent=0):
"""Print TaskInfoExt hierarchy using the depth first traversing
Args:
tinfext: TaskInfoExt - extended task info to be unfolded and printed
indent: int - current indent for the output hierarchy formatting
return - list(hdr, indent, list(vals)), maxindent:
hdr: bool - whether the row is a header
indent: uint - current indentation
vals - outputting values
wide: uint - [max] output wide in items/cols considering the indentation
"""
wide = indent
res = []
# Print task properties (properties header and values)
# Format task's job properties header
if tinfext.props:
assert len(tinfext.props) == 2, (
'Task properties should contain the header and value rows: ' + str(len(tinfext.props)))
res.append(TaskInfoPrefmt(compound=True, ident=indent, data=tinfext.props[0]))
wide = indent + len(res[-1].data)
res.append(TaskInfoPrefmt(compound=None, ident=indent, data=[tblfmt(v) for v in tinfext.props[1]]))
# assert isinstance(tinfext, TaskInfoExt), 'Unexpected type of tinfext: ' + type(tinfext).__name__
# Format task's jobs' properties
indent += 1
if tinfext.jobs: # Consider None
res.append(TaskInfoPrefmt(compound=False, ident=indent, data=[tblfmt(v) for v in tinfext.jobs[0]]))
wide = indent + len(res[-1].data)
for tie in tinfext.jobs[1:]:
res.append(TaskInfoPrefmt(compound=None, ident=indent, data=[tblfmt(v) for v in tie]))
# Unfold subtasks
# print('>> Outputting task {} with {} subtasks'.format(tinfext.props[1][0]
# , 0 if not tinfext.subtasks else len(tinfext.subtasks)), file=sys.stderr)
if tinfext.subtasks: # Consider None
for tie in tinfext.subtasks:
sres, wide = unfoldDepthFirst(tie, indent=indent)
res.extend(sres)
return res, wide
class Task(object):
"""Task is a managing container for subtasks and Jobs"""
# _tasks = []
# _taskManager = None
# _taskManagerLock = threading.Lock()
#
# @staticmethod
# def _taskTerminator(*args, **kwargs):
# tasks = kwargs['tasks']
# latency = kwargs['latency']
# lock = kwargs['lock']
# ctime = time.perf_counter()
# with lock:
# for task in tasks:
# if task.timeout and ctime - task.tstart >= task.timeout:
# task.terminate()
def __init__(self, name, timeout=0, onstart=None, ondone=None, onfinish=None, params=None
, task=None, latency=1.5, stdout=sys.stdout, stderr=sys.stderr):
"""Initialize task, which is a group of subtasks including jobs to be executed
Note: the task is considered to be failed if at least one subtask / job is failed
(terminated or completed with non-zero return code).
name: str - task name
timeout - execution timeout in seconds. Default: 0, means infinity. ATTENTION: not implemented
onstart - a callback, which is executed on the task start (before the subtasks/jobs execution
started) in the CONTEXT OF THE CALLER (main process) with the single argument,
the task. Default: None
ATTENTION: must be lightweight
ondone - a callback, which is executed on the SUCCESSFUL completion of the task in the
CONTEXT OF THE CALLER (main process) with the single argument, the task. Default: None
ATTENTION: must be lightweight
onfinish - a callback, which is executed on either completion or termination of the task in the
CONTEXT OF THE CALLER (main process) with the single argument, the task. Default: None
ATTENTION: must be lightweight
params - additional parameters to be used in callbacks
task: Task - optional owner super-task
latency: float - lock timeout in seconds: None means infinite,
<= 0 means non-bocking, > 0 is the actual timeout
stdout - None or file name or PIPE for the buffered output to be APPENDED
stderr - None or file name or PIPE or STDOUT for the unbuffered error output to be APPENDED
ATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited
Automatically initialized and updated properties:
tstart - start time is filled automatically on the execution start (before onstart). Default: None
tstop - termination / completion time after ondone.
numadded: uint - the number of direct added subtasks
numdone: uint - the number of completed DIRECT subtasks
(each subtask may contain multiple jobs or sub-sub-tasks)
numterm: uint - the number of terminated direct subtasks (including jobs) that are not restarting
numdone + numterm <= numadded
"""
assert isinstance(name, str) and (latency is None or latency >= 0) and (
task is None or (isinstance(task, Task) and task != self)), (
'Task arguments are invalid, name: {}, latency: {}, task type: {} (valid: {})'
.format(name, latency, type(task).__name__, task != self))
self._lock = Lock() # Lock for the included jobs
# dict(subtask: Task | Job, accterms: uint)
# # Dictionary of non-completed (but can be terminated) subtasks with the direct termination counter
# self._items = dict()
# Set of non-finished (and possibly restarting) subtasks
self._items = set()
self.name = name
# Add member handlers if required
# types.MethodType binds the callback to the object
self.onstart = None if not callable(onstart) else types.MethodType(onstart, self)
self.ondone = None if not callable(ondone) else types.MethodType(ondone, self)
self.onfinish = None if not callable(onfinish) else types.MethodType(onfinish, self)
# self.timeout = timeout
self.params = params
self._latency = latency
self.task = task
self.stdout = stdout
self.stderr = stderr
# Automatically initialized attributes
self.tstart = None
self.tstop = None # Termination / completion time after ondone
self.numadded = 0 # The number of added direct subtasks, the same subtask/job can be re-added several times
self.numdone = 0 # The number of completed direct subtasks
self.numterm = 0 # Total number of terminated direct subtasks that are not restarting
# Update the task if any with this subtask
if self.task:
self.task.add(self)
# Consider subtasks termination by timeout
# if self.timeout:
# if not _tasks:
# _tasks.append(self)
# Task._taskManager = threading.Thread(name="TaskManager", target=Task._taskTerminator
# , kwargs={'tasks': Task._tasks, 'latency': latency})
def __str__(self):
"""A string representation, which is the .name if defined"""
return self.name if self.name is not None else self.__repr__()
def add(self, subtask):
"""Add one more subtask to the task
Args:
subtask: Job|Task - subtask of the current task
Raises:
RuntimeError - lock acquisition failed
"""
assert isinstance(subtask, Job) or isinstance(subtask, Task), 'Unexpected type of the subtask'
if self.tstart is None:
self.tstart = time.perf_counter()
# Consider onstart callback
if self.onstart:
applyCallback(self.onstart, self.name)
# Consider super-task
if self.task:
self.task.add(self)
elif subtask in self._items: # Omit calls from the non-first subsubtask of the subtask
return
if self._lock.acquire(timeout=self._latency):
self._items.add(subtask)
self.numadded += 1
self._lock.release()
else:
raise RuntimeError('Lock acquisition failed on add() in "{}"'.format(self.name))
def finished(self, subtask, succeed):
"""Complete subtask
Args:
subtask: Job | Task - finished subtask
succeed: bool - graceful successful completion or termination
Raises:
RuntimeError - lock acquisition failed
"""
if not self._lock.acquire(timeout=self._latency):
raise RuntimeError('Lock acquisition failed in the task "{}" finished'.format(self.name))
try:
self._items.remove(subtask)
if succeed:
self.numdone += 1
else:
self.numterm += 1
except KeyError as err:
print('ERROR in "{}" succeed: {}, the finishing "{}" should be among the active subtasks: {}. {}'
.format(self.name, succeed, subtask, err, traceback.format_exc(5), file=sys.stderr))
finally:
self._lock.release()
# Consider onfinish callback
if self.numdone + self.numterm == self.numadded:
assert not self._items, 'All subtasks should be already finished; remained {} items: {}'.format(
len(self._items), ', '.join(st.name for st in self._items))
self.tstop = time.perf_counter()
if self.numdone == self.numadded and self.ondone:
applyCallback(self.ondone, self.name)
if self.onfinish:
applyCallback(self.onfinish, self.name)
# Consider super-task
if self.task:
self.task.finished(self, self.numdone == self.numadded)
def uncompleted(self, recursive=False, header=False, pid=False, tstart=False
, tstop=False, duration=False, memory=False):
"""Fetch names of the uncompleted tasks
Args:
recursive (bool, optional): Defaults to False. Fetch uncompleted subtasks recursively.
header (bool, optional): Defaults to False. Include header for the displaying attributes.
pid (bool, optional): Defaults to False. Show process id of for the job, None for the task.
tstart (bool, optional): Defaults to False. Show tstart of the execution.
tstop (bool, optional): Defaults to False. Show tstop of the execution.
duration (bool, optional): Defaults to False. Show duration of the execution.
memory (bool, optional): Defaults to False. Show memory consumption the job, None for the task.
Note: the value as specified by the Job.mem, which is not the peak RSS.
Returns:
hierarchical dictionary of the uncompleted task names and other attributes, each items is tuple or str
Raises:
RuntimeError: lock acquisition failed
"""
extinfo = pid or duration or memory
if duration:
ctime = time.perf_counter() # Current time
# Form the Header if required
res = []
if header:
if extinfo:
hdritems = ['Name']
if pid:
hdritems.append('PID')
if tstart:
hdritems.append('Tstart')
if tstop:
hdritems.append('Tstop')
if duration:
hdritems.append('Duration')
if memory:
hdritems.append('Memory')
res.append(hdritems)
else:
res.append('Name')
def subtaskInfo(subtask):
"""Subtask tracing
Args:
subtask (Task | Job): subtask to be traced
Returns:
str | list: subtask information
"""
isjob = isinstance(subtask, Job)
assert isjob or isinstance(subtask, Task), 'Unexpected type of the subtask: ' + type(subtask).__name__
if extinfo:
res = [subtask.name]
if pid:
res.append(None if not isjob or subtask.proc is None else subtask.proc.pid)
if tstart:
res.append(subtask.tstart)
if tstop:
res.append(subtask.tstop)
if duration:
res.append(None if subtask.tstart is None else ctime - subtask.tstart)
if memory:
res.append(None if not isjob else subtask.mem)
return res
return subtask.name
if not self._lock.acquire(timeout=self._latency):
raise RuntimeError('Lock acquisition failed on task uncompleted() in "{}"'.format(self.name))
try:
# List should be generated on place while all the tasks are present
# Note: list extension should prevent lazy evaluation of the list generator
# otherwise the explicit conversion to the list should be performed here (within the lock)
res += (subtaskInfo(subtask) if not recursive or isinstance(subtask, Job)
else subtask.uncompleted(recursive) for subtask in self._items)
finally:
self._lock.release()
return res
class Job(object):
"""Job is executed in a separate process via Popen or Process object and is
managed by the Process Pool Executor
"""
# Note: the same job can be executed as Popen or Process object, but ExecPool
# should use some wrapper in the latter case to manage it
_RTM = 0.85 # Memory retention ratio, used to not drop the memory info fast on temporal releases, E [0, 1)
assert 0 <= _RTM < 1, 'Memory retention ratio should E [0, 1)'
# NOTE: keyword-only arguments are specified after the *, supported only since Python 3
def __init__(self, name, workdir=None, args=(), timeout=0, rsrtonto=False, task=None #,*
, startdelay=0., onstart=None, ondone=None, onfinish=None, params=None, category=None, size=0, slowdown=1.
, omitafn=False, memkind=1, memlim=0., stdout=sys.stdout, stderr=sys.stderr, poutlog=None, perrlog=None):
"""Initialize job to be executed
Main parameters:
name: str - job name
workdir - working directory for the corresponding process, None means the dir of the benchmarking
args - execution arguments including the executable itself for the process
NOTE: can be None to make make a stub process and execute the callbacks
timeout - execution timeout in seconds. Default: 0, means infinity
rsrtonto - restart the job on timeout, Default: False. Can be used for
non-deterministic Jobs like generation of the synthetic networks to regenerate
the network on border cases overcoming getting stuck on specific values of the rand variables.
task: Task - origin task if this job is a part of the task
startdelay - delay after the job process starting to execute it for some time,
executed in the CONTEXT OF THE CALLER (main process).
ATTENTION: should be small (0.1 .. 1 sec)
onstart - a callback, which is executed on the job starting (before the execution
started) in the CONTEXT OF THE CALLER (main process) with the single argument,
the job. Default: None.
If onstart() raises an exception then the job is completed before been started (.proc = None)
returning the error code (can be 0) and tracing the cause to the stderr.
ATTENTION: must be lightweight
NOTE:
- It can be executed several times if the job is restarted on timeout
- Most of the runtime job attributes are not defined yet
ondone - a callback, which is executed on successful completion of the job in the
CONTEXT OF THE CALLER (main process) with the single argument, the job. Default: None
ATTENTION: must be lightweight
onfinish - a callback, which is executed on either completion or termination of the job in the
CONTEXT OF THE CALLER (main process) with the single argument, the job. Default: None
ATTENTION: must be lightweight
params - additional parameters to be used in callbacks
stdout - None, stdout, stderr, file name or PIPE for the buffered output to be APPENDED.
The path is interpreted in the CALLER CONTEXT
stderr - None, stdout, stderr, file name or PIPE for the unbuffered error output to be APPENDED
ATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited.
The path is interpreted in the CALLER CONTEXT
poutlog: str - file name to log non-empty piped stdout pre-pended with the timestamp. Actual only if stdout is PIPE.
perrlog: str - file name to log non-empty piped stderr pre-pended with the timestamp. Actual only if stderr is PIPE.
Scheduling parameters:
omitafn - omit affinity policy of the scheduler, which is actual when the affinity is enabled
and the process has multiple treads
category - classification category, typically semantic context or part of the name,
used to identify related jobs;
requires _CHAINED_CONSTRAINTS
size - expected relative memory complexity of the jobs of the same category,
typically it is size of the processing data, >= 0, 0 means undefined size
and prevents jobs chaining on constraints violation;
used on _LIMIT_WORKERS_RAM or _CHAINED_CONSTRAINTS
slowdown - execution slowdown ratio, >= 0, where (0, 1) - speedup, > 1 - slowdown; 1 by default;
used for the accurate timeout estimation of the jobs having the same .category and .size.
requires _CHAINED_CONSTRAINTS
memkind - kind of memory to be evaluated (average of virtual and resident memory
to not overestimate the instant potential consumption of RAM):
0 - mem for the process itself omitting the spawned sub-processes (if any)
1 - mem for the heaviest process of the process tree spawned by the original process
(including the origin itself)
2 - mem for the whole spawned process tree including the origin process
memlim: float - max amount of memory in GB allowed for the job execution, 0 - unlimited
Execution parameters, initialized automatically on execution:
tstart - start time, filled automatically on the execution start (before onstart). Default: None
tstop - termination / completion time after ondone
NOTE: onstart() and ondone() callbacks execution is included in the job execution time
proc - process of the job, can be used in the ondone() to read its PIPE
pipedout - contains output from the PIPE supplied to stdout if any, None otherwise
NOTE: pipedout is used to avoid a deadlock waiting on the process completion having a piped stdout
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait
pipederr - contains output from the PIPE supplied to stderr if any, None otherwise
NOTE: pipederr is used to avoid a deadlock waiting on the process completion having a piped stderr
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait
mem - consuming memory (smooth max of average of VMS and RSS, not just the current value)
or the least expected value inherited from the jobs of the same category having non-smaller size;
requires _LIMIT_WORKERS_RAM
terminates - accumulated number of the received termination requests caused by the constraints violation
NOTE: > 0 (1 .. ExecPool._KILLDELAY) for the apps terminated by the execution pool
(resource constrains violation or ExecPool exception),
== 0 for the crashed apps
wkslim - worker processes limit (max number) on the job postponing if any,
the job is postponed until at most this number of worker processes operate;
requires _LIMIT_WORKERS_RAM
chtermtime - chained termination: None - disabled, False - by memory, True - by time;
requires _CHAINED_CONSTRAINTS
"""
assert isinstance(name, str) and timeout >= 0 and (task is None or isinstance(task, Task)
) and size >= 0 and slowdown > 0 and memkind in (0, 1, 2) and memlim >= 0 and (
poutlog is None or isinstance(poutlog, str)) and (perrlog is None or isinstance(perrlog, str)
), ('Job arguments are invalid, name: {}, timeout: {}, task type: {}, size: {}'
', slowdown: {}, memkind: {}, memlim: {}, poutlog: {}, perrlog: {}'.format(
name, timeout, type(task).__name__, size, slowdown, memkind, memlim, poutlog, perrlog))
#if not args:
# args = ("false") # Create an empty process to schedule its execution
# Properties specified by the input parameters -------------------------
self.name = name
self.workdir = workdir
self.args = args
self.params = params
self.timeout = timeout
self.rsrtonto = rsrtonto
self.task = task
# Delay in the callers context after starting the job process. Should be small.
self.startdelay = startdelay # 0.2 # Required to sync sequence of started processes
# Callbacks ------------------------------------------------------------
self.onstart = None if not callable(onstart) else types.MethodType(onstart, self)
self.ondone = None if not callable(ondone) else types.MethodType(ondone, self)
self.onfinish = None if not callable(onfinish) else types.MethodType(onfinish, self)
# I/O redirection ------------------------------------------------------
self.stdout = stdout
self.stderr = stderr
self.poutlog = poutlog
self.perrlog = perrlog
# Internal properties --------------------------------------------------
self.tstart = None # start time is filled automatically on the execution start, before onstart. Default: None
self.tstop = None # Termination / completion time after ondone
# Internal attributes
self.proc = None # Process of the job, can be used in the ondone() to read its PIPE
self.pipedout = None
self.pipederr = None
self.terminates = 0 # Accumulated number of the received termination requests caused by the constraints violation
# Process-related unified logging descriptors of file / system output channel / PIPE related system object
self._stdout = None
self._stderr = None
# Omit scheduler affinity policy (actual when some process is computed on all treads, etc.)
self._omitafn = omitafn
# Whether the job is restarting (in process) on timeout or because of the
# GROUP memory limit violation (where the job itself does not violate any constraints);
# required to be aware whether to complete the owner task
self._restarting = False
if _LIMIT_WORKERS_RAM:
# Note: wkslim is used only internally for the cross-category ordering
# of the jobs queue by reducing resource consumption