-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGES
165 lines (122 loc) · 7 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
Changes from previous Cobalt Versions
Changes to 0.99.0pre36
* Fixed bug in resource reservation that would leave a reserved partition in
the 'idle' state
* Updated the code that sets the kernel profile for a partition to set the
profile even if no profile is currently set
Changes to 0.99.0pre35
* Fixed bug where pickling IncrID could fail when deleting db related
attributes from the object's dictionary
Changes to 0.99.0pre34
* Time spent in the hold state is now exported to the utility function
* bgsystem now tracks forked tasks using a (forker, id) tuple instead of just
id in order to prevent id collisions in the active jobs list
* Updated cleanup and resource reservation in bgsystem updated to prevent
multiple partition cleanups for a single job
* Forker classes updated so that logging messages properly identify the
component from which they came
* Added LOGNAME and SHELL to the environment of script jobs
* Resolved a bug in the cdbwriter-api where a job object which had no data
records could endless recurse in _jr_obj.__getattr__ looking for valid fields
* Implemented support for common job ID pool through database ID generation
* PYTHONPATH may now be explicitly set when building the wrapper program
* Fix for cobalt-mpirun where the -env flag was being improperly handled
* Updated Cobalt's database documentation diagrams
Changes to 0.99.0pre31
* Cleaned up reservation logging and database logging
* Reservations can now be tagged by adding a -A flag to setres
* A view of reservations with resid, cycleid and project can be obtained via
the -x flag to showres.
Changes to 0.99.0pre30
* Improved logging of reservations to the cobalt database.
- "ending" event changed to "terminated" indicates that the reservaiton
has been ended and cleaned up.
- added "deferred" event for when reservations are deferred (setres -D)
- added "deactivating" event for when reservation come to a natural end
(i.e. run out of time)
- added "releasing" event for a user-requested release of a reservation
- deferrals of reservations now no longer cause new reservation_data
table entries to be generated. The subsequent cycle does that.
* cluster_system has had a variety of enhancements. Notably:
- setting simulation_mode true in the cobalt config file will allow the
cluster_system component to enter a test mode, where it can be run
on a development machine without a supporting cluster.
- nodes are now recognized as being allocated when a location is selected
and sent back to the scheduler component as a runnable location.
A timeout is set for the partition, and by default this is 300 seconds
(5 minutes). This can be changed reset at execution time from the
cobalt config file. Should resources be allocated, but the job be
terminated prior to a run (prescript failure due to a dead node, or
a hasty user kill, for instance), this will ensure that the nodes are
released.
* Jobs submitted to cluster_system will now have arguments passed to the job
recognized, as they are on BlueGene type systems.
Changes to 0.99.0pre29
* Arguments that get passed to scripts (not script jobs but pre and
postscripts) are now being properly escaped.
* Cobaltlogs are now being written to by cqm in a separate thread, and
should allow scheduling to continue in the event of a filesystem hang.
* qalter has had a bug fixed where it was passing back nodecounts and proccounts
as a string rather than an int.
* jobs in cqm that do not have a timer are appropriately destroyed rather
than entering into an infinite loop of attempting to run the job.
* Fix for the situation where a resid can be duplicated in the case of a
cycling reservation
Changes to 0.99.0pre28
* Modification to the XMLRPC base proxy class that allows you to turn off
automatic retries from the cobalt side on the proxy by requesting no
retries. This should be used for non-idempotent tasks like job-launching.
* Modified cqm's retry behavior to include a runid when running a job, so and
retrying so that bgforker doesn't try to start the same job twice.
* Timezone display corrected. If available, cobalt will try to use the pytz
library (Olsen Timezone Database).
Changes to 0.99.0pre27
* Restored COBALT_JOBID and COBALT_RESID in back-end jobs. This was erroniously
removed in 0.99.0pre26.
Changes to 0.99.0pre26
* Fixed bug where environmental variables passed into a job from the --env flag
of qsub and qalter would leak into the environment of bgforker and from
there into subsequent mpirun process environments. These did not leak
into the backend-job environments. Cluster-system component jobs did not
see issues with this due to an apparent bug in envrionment handling.
* A job's exit status is now added to the Cobalt Database as exit_status
* COBALT_RESID is set in mpirun jobs
Changes to 0.99.0pre25
* Various timezone formatting changes including:
- setres takes the reservation time depending on what TZ is set to
- qstat and showres show times in a unified format, that includes
offset and three-letter timezone designation
- showres has a flag --oldts where it shows timestamps in the original
format for compatibility with old scripts
* Forker was not handling spaces in arguments to scripts correctly
* schedctl no longer gives an erronious error message about requiring a job id
for --start, --stop
* Statemachine state-transition function pointers are now regenerated at
restart as well as job initialization, to allow for transparent state-
machine adjustments
Changes to 0.99.0pre24
* Fixes for arguments to pre and post scripts for jobs
* showres reverted to not include id's
* Fixes for script-mode jobs
changes to 0.99.0pre23
* Script jobs run with their own shells from forker to prevent 32/64 bit execution environment issues
changes to 0.99.0pre22
* Bgforker modified so that helper scripts can be run
* KNOWN ISSUE: job preemption is not working correctly in this version.
* Resid now added to jobs at runtime, if they are run in a reservation.
* Fix for partially overlapping partitions being scheduled at the same time
and properly detecting reservations.
changes to 0.99.0pre21
* Fix to reduce chance of cluster system nodes from being hung-up on job exit.
* Adding in backfill prediction functionality. This is by default disabled.
* Various fixes and updates to the mk_* cobalt backup scripts. These can be
used to restore state should a change that is destructive to statefiles is
made (like the 0.99.0pre16 to 0.99.0pre17 change).
* resid and cycleid can be set via setres
* time.sleep() wrapped to prevent kernel level IOError from creeping out on
ppc64 linux platforms.
changes to 0.99.0pre20
* stderr log messages now contain timestamps
* ensemble jobs that use -nofree without a -wait free are now cleaned up properly
* environment variables in cert, key and ca options in config files are expanded
changes to 0.99.0pre19