Installation and configuration
Python 3.6 or higher. sqlite3 3.7.0 or higher if the database backend is sqlite3.
Harvester can be installed without root privilege.
# setup virtual environment (N.B. if your system supports conda instead of virtualenv see "How to use conda" in the Misc section)
$ virtualenv harvester # or python -m venv harvester (for python 3)
$ cd harvester
$ . bin/activate
# install additional python packages if missing
$ pip install pip --upgrade
$ pip install --upgrade 'setuptools>=39.0.1'
# install Harvester
$ pip install git+https://github.com/HSF/harvester.git
# for ATLAS GRID instance, install with this instead
# (deprecated)
$ pip install git+https://github.com/HSF/harvester#egg=pandaharvester[atlasgrid]
# new syntax
$ pip install pandaharvester[atlasgrid]@git+https://github.com/HSF/harvester
# copy sample setup and config files
$ mv etc/sysconfig/panda_harvester.rpmnew.template etc/sysconfig/panda_harvester
$ mv etc/panda/panda_common.cfg.rpmnew etc/panda/panda_common.cfg
$ mv etc/panda/panda_harvester.cfg.rpmnew.template etc/panda/panda_harvester.cfg
# virtual environment
$ cd harvester
$ . bin/activate
# Update Harvester code
$ pip install --no-deps --force-reinstall git+https://github.com/HSF/harvester.git
# And then update dependent pip packages for ATLAS GRID instance
# (deprecated)
$ pip install --upgrade git+https://github.com/HSF/harvester.git#egg=pandaharvester[atlasgrid]
# new syntax
$ pip install --upgrade pandaharvester[atlasgrid]@git+https://github.com/HSF/harvester
Several parameters need to be adjusted in the setup file (etc/sysconfig/panda_harvester) and in two config files (etc/panda/panda_common.cfg and etc/panda/panda_harvester.cfg). panda_harvester.cfg can also be hosted remotely (see remote configuration files).
The following parameters need to be modified in etc/sysconfig/panda_harvester.
Name | Description |
---|---|
PANDA_HOME | Config files must be under $PANDA_HOME/etc |
PYTHONPATH | Must contain the pandacommon package and site-packages where the pandaharvester package is available |
- Example
export PANDA_HOME=$VIRTUAL_ENV
export PYTHONPATH=$VIRTUAL_ENV/lib/python3.6/site-packages/pandacommon:$VIRTUAL_ENV/lib/python3.6/site-packages
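A quick way to confirm that the PYTHONPATH above takes effect is to ask Python whether it can locate the two packages. The following is only an illustrative check and not part of Harvester; the helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandacommon and pandaharvester are the packages named in the table above.
    missing = missing_packages(["pandacommon", "pandaharvester"])
    print("missing packages:", missing or "none")
```

Run it inside the activated virtual environment after sourcing etc/sysconfig/panda_harvester; an empty result suggests that both packages are visible.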
The logdir needs to be set in etc/panda/panda_common.cfg. It is recommended to use a non-NFS directory to avoid buffering. Here are additional explanations for logging parameters.
Name | Description |
---|---|
logdir | A directory for log files |
- Example
logdir = /var/log/panda
The following list shows the parameters that need to be adjusted in etc/panda/panda_harvester.cfg.
Name | Description |
---|---|
master.uname | User name of the daemon process |
master.gname | Group name of the daemon process |
master.harvester_id | Unique ID of the harvester instance. See link for the details |
db.database_filename | Filename of the local database. Note that sqlite doesn't like NAS |
db.engine | Database engine: sqlite or mariadb |
db.verbose | Set True to dump all SQL queries in the log file |
pandacon.ca_cert | CERN CA certificate file |
pandacon.cert_file | A grid proxy file to access the panda server |
pandacon.key_file | The same as pandacon.cert_file |
pandacon.auth_token | Token for OIDC (add this line only if authenticating to the PanDA server with an OIDC token instead of a grid proxy) |
pandacon.auth_origin | Origin for OIDC (add this line only if authenticating to the PanDA server with an OIDC token instead of a grid proxy) |
qconf.configFile | The queue configuration file. See the next section for details |
qconf.queueList | The list of PandaQueues for which the harvester instance works |
credmanager.moduleName | The module name of the credential manager |
credmanager.className | The class name of the credential manager |
credmanager.inCertFile | A grid proxy without VOMS extension. CredManager plugin generates VOMS proxy using the file |
credmanager.outCertFile | A grid proxy with VOMS extension which is generated by CredManager plugin |
Concerning agent optimization, see the next section.
Most agents define lockInterval and xyzInterval (where 'xyz' is 'check', 'trigger', and so on, depending on agent actions) in panda_harvester.cfg. Each agent runs multiple threads in parallel and each thread processes job and/or worker objects independently. Each thread first retrieves objects from the database, processes them, and finally releases them. lockInterval defines how long the objects are kept for a thread after they are retrieved. During that period other threads cannot touch the objects. Another thread can take those objects once lockInterval has elapsed, which is useful when harvester is restarted after it was killed and the objects were not properly released. Note that lockInterval must be longer than the processing time of each thread. Otherwise, multiple threads would try to process the same objects concurrently.
On the other hand, xyzInterval defines how often the objects are processed by threads, i.e. once the objects are released by a thread, they are processed again after xyzInterval has elapsed. maxJobsXYZ defines how many job objects are retrieved by a thread. Generally, a large maxJobsXYZ doesn't make sense since jobs are processed sequentially by the thread and the processing time of the thread simply becomes longer. A large maxJobsXYZ could also be problematic in terms of memory usage since many job objects are loaded into RAM from the database before being processed.
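The interplay between the lock interval and the processing interval can be sketched as a small eligibility check. This is a simplified model written for this page, not Harvester's actual database logic; TrackedObject, can_take, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedObject:
    """Hypothetical stand-in for a job or worker row in the database."""
    locked_at: Optional[float] = None    # time a thread last took the object
    released_at: Optional[float] = None  # time it was last processed and released


def can_take(obj, now, lock_interval, check_interval):
    """Decide whether a thread may pick up obj at time `now` (all values in seconds)."""
    if obj.locked_at is not None:
        # Still held by a thread: only a stale lock (e.g. after a kill) can be taken over.
        return now - obj.locked_at > lock_interval
    if obj.released_at is not None:
        # Properly released: wait for the processing interval before the next pass.
        return now - obj.released_at >= check_interval
    return True  # never processed before
```

In this model, a lock interval shorter than the time a thread needs would let a second thread take over objects that are still being processed, which is the situation the note above warns about.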
Plug-ins for each PandaQueue are configured in the queue configuration file. The filename is defined in qconf.configFile. It has to be put in the $PANDA_HOME/etc/panda directory and/or at a URL (see remote configuration files). This file might be integrated into the information system JSON in the future, but for now it has to be created manually. Here are examples of the queue configuration file for the grid and for HPC. The content is a JSON dump of
{
"PandaQueueName1": {
"QueueAttributeName1": ValueQ_1,
"QueueAttributeName2": ValueQ_2,
...
"QueueAttributeNameN": ValueQ_N,
"Agent1": {
"AgentAttribute1": ValueA_1,
"AgentAttribute2": ValueA_2,
...
"AgentAttributeM": ValueA_M
},
"Agent2": {
...
},
...
"AgentX": {
...
},
},
"PandaQueueName2": {
...
},
...
"PandaQueueNameY": {
...
},
}
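To make the layout concrete, the sketch below parses a tiny queue configuration and separates plain queue attributes from agent sections. The sample values, the plugin class, and the module path are placeholders invented for illustration; the agent names are the ones listed further down on this page.

```python
import json

# Agent section names as listed on this page.
AGENT_NAMES = {"preparator", "submitter", "workMaker", "messenger",
               "stager", "monitor", "sweeper"}

# Hypothetical content: queue name, values, and plugin names are placeholders.
SAMPLE = """
{
  "PandaQueueName1": {
    "prodSourceLabel": "managed",
    "nQueueLimitWorker": 10,
    "submitter": {"name": "DummySubmitter", "module": "example.dummy_submitter"}
  }
}
"""


def split_queue_config(queue_config):
    """Split one queue entry into queue attributes and agent sections."""
    attributes = {k: v for k, v in queue_config.items() if k not in AGENT_NAMES}
    agents = {k: v for k, v in queue_config.items() if k in AGENT_NAMES}
    return attributes, agents


for queue_name, queue_config in json.loads(SAMPLE).items():
    attributes, agents = split_queue_config(queue_config)
    print(queue_name, sorted(attributes), sorted(agents))
```

Note that a real file must be valid JSON, so the "..." placeholders and trailing commas of the schematic above cannot appear in it.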
Here is the list of queue attributes.
Name | Description |
---|---|
prodSourceLabel | Source label of the queue. managed for production |
prodSourceLabelRandomWeightsPermille | The probability distribution (in permille) used to randomize the source label of the jobs that job_fetcher fetches. E.g. "prodSourceLabelRandomWeightsPermille": {"rc_test":150, "rc_test2":200, "rc_alrb":250} makes job_fetcher fetch rc_test jobs with 15% probability, rc_test2 with 20%, rc_alrb with 25%, and jobs of prodSourceLabel (defined above) with the remaining 40% |
nQueueLimitJob | The max number of jobs pre-fetched and queued, i.e. jobs in starting state. This attribute is ignored if nQueueLimitJobRatio is used. See this page for the details |
nQueueLimitJobRatio | The target ratio of the number of starting jobs to the number of running jobs. See this page for the details |
nQueueLimitJobMax | Supplemental attribute for nQueueLimitJobRatio to define the upper limit on the number of starting jobs. See this page for the details |
nQueueLimitJobMin | Supplemental attribute for nQueueLimitJobRatio to define the lower limit on the number of starting jobs. See this page for the details |
nQueueLimitWorker | The max number of workers queued in the batch system, i.e. workers in submitted, pending, or idle state |
maxWorkers | The max number of workers. maxWorkers-nQueueLimitWorker is the number of running workers |
nQueueLimitWorkerRatio | The limit on the ratio of queued workers to running workers. (the number of queued workers)/(the number of running workers) must be less than nQueueLimitWorkerRatio/100 |
nQueueLimitWorkerMax | The max number of queued workers |
nQueueLimitWorkerMin | The min number of queued workers |
maxNewWorkersPerCycle | The max number of workers which can be submitted in a single submission cycle. 0 by default to be unlimited |
truePilot | To suppress heartbeats for jobs in running, transferring, finished, failed state |
runMode | self (by default) to submit workers based on nQueueLimit* and maxWorkers. slave to be centrally controlled by panda |
allowJobMixture | Jobs from different tasks can be given to a single worker if true |
mapType | Mapping between jobs and workers. NoJob = (workers themselves get jobs directly from Panda after they are submitted). OneToOne = (1 job x 1 worker). OneToMany = (1xN, aka the multiple consumer mode). ManyToOne = (Nx1, aka the multi-job pilot mode). Harvester prefetches jobs except NoJob. |
useJobLateBinding | true if the queue uses job-level late-binding. Note that for job-level late-binding harvester prefetches jobs to pass them to workers when those workers get CPUs, so mapType must not be NoJob. If this flag is false or omitted jobs are submitted together with workers. |
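The arithmetic behind prodSourceLabelRandomWeightsPermille can be illustrated as follows. This is a sketch of the bookkeeping described in the table, not the code job_fetcher actually runs; label_shares_permille and pick_label are hypothetical helpers.

```python
import random


def label_shares_permille(default_label, random_weights):
    """Return the share (in permille) of every source label, default label included."""
    used = sum(random_weights.values())
    if used > 1000:
        raise ValueError("weights exceed 1000 permille")
    shares = dict(random_weights)
    # Whatever is not assigned to the random labels goes to the default label.
    shares[default_label] = shares.get(default_label, 0) + (1000 - used)
    return shares


def pick_label(shares, rng=random):
    """Draw one source label according to the permille shares."""
    labels = list(shares)
    return rng.choices(labels, weights=[shares[label] for label in labels])[0]


shares = label_shares_permille(
    "managed", {"rc_test": 150, "rc_test2": 200, "rc_alrb": 250})
print(shares["managed"])  # 400 permille, i.e. the remaining 40%
```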
Agent is one of preparator, submitter, workMaker, messenger, stager, monitor, and sweeper. Two agent parameters, name and module, are mandatory; they define the class name and the module name of the agent. Roughly speaking,
from agentModule import agentName
agent = agentName()
is internally invoked. Other agent attributes are set on the agent instance as instance variables. Parameters for plugins are described in this page.
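The mechanism described above can be approximated with importlib. This is a minimal sketch under the assumption that plugin classes take no constructor arguments; make_agent is a hypothetical helper, and a standard-library class stands in for a real Harvester plugin so that the example runs anywhere.

```python
import importlib


def make_agent(agent_config):
    """Instantiate the class given by 'module' and 'name'; set the rest as attributes."""
    params = dict(agent_config)
    module = importlib.import_module(params.pop("module"))
    cls = getattr(module, params.pop("name"))
    agent = cls()
    for key, value in params.items():
        setattr(agent, key, value)  # remaining entries become instance variables
    return agent


# Stand-in for a real plugin: types.SimpleNamespace instead of a Harvester class.
# nProcesses is an arbitrary attribute name chosen for this example.
agent = make_agent({"module": "types", "name": "SimpleNamespace", "nProcesses": 4})
print(type(agent).__name__, agent.nProcesses)
```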
An example of the init.d script is available at etc/rc.d/init.d/panda_harvester.rpmnew.template.
You need to change VIRTUAL_ENV in the script and rename it to panda_harvester. Change log and lock files if necessary.
Then, to start or stop harvester:
$ etc/rc.d/init.d/panda_harvester start
$ etc/rc.d/init.d/panda_harvester stop
- For NERSC
$ module load python
$ module load virtualenv
- For others
$ pip install virtualenv --user
For more details, see https://virtualenv.pypa.io/en/stable/installation/
$ module load python
$ cd harvester
$ . bin/activate
$ pip install --index-url=http://pypi.python.org/simple/ --trusted-host pypi.python.org python-daemon
$ cd harvester
$ . bin/activate
$ pip install rucio-clients
$ cat etc/rucio.cfg.atlas.client.template | grep -v ca_cert > etc/rucio.cfg
$ echo "ca_cert = /etc/pki/tls/certs/CERN-bundle.pem" >> etc/rucio.cfg
$ echo "auth_type = x509_proxy" >> etc/rucio.cfg
$
$ # For tests
$ export X509_USER_PROXY=...
$ export RUCIO_ACCOUNT=...
$ rucio ping
$ cd panda-harvester
$ rm -rf dist; pip install . --upgrade --force-reinstall --no-deps; pip install . --upgrade
If your system supports conda instead of virtualenv, set up conda before using pip.
- For Cori@NERSC
$ module load python
$ mkdir harvester
$ conda create -p ~/harvester python
$ source activate ~/harvester
It is possible to automatically restart harvester when it dies by using supervisord, which can be installed via pip.
$ pip install supervisor
An example of supervisord configuration file is available at etc/panda/panda_supervisord.cfg.
You need to rename it to panda_supervisord.cfg and change logfile, pidfile, and command parameters accordingly.
The command parameter uses the init.d script. PROGNAME in the init.d script needs to be changed to
PROGNAME='python -u '${SITE_PACKAGES_PATH}'/pandaharvester/harvesterbody/master.py --foreground'
because applications to be run under supervisord must be executed in the foreground, i.e., not be daemonized. To start supervisord
$ supervisord -c etc/panda/panda_supervisord.cfg
then harvester is automatically started. To stop/start harvester,
$ supervisorctl stop panda-harvester
$ supervisorctl start panda-harvester
To stop supervisord
$ supervisorctl shutdown
Harvester is automatically stopped when supervisord is stopped.
It is possible to configure harvester instances with a more powerful database backend (MariaDB) and multi-processing based on Apache+WSGI (or uWSGI). Note that Apache is used to launch multiple harvester processes, so you don't have to use apache messengers for communication between harvester and workers unless that is needed.
First you need to make the HARVESTER database and the harvester account on MariaDB. E.g.
$ mysql -u root
MariaDB > CREATE DATABASE HARVESTER;
MariaDB > CREATE USER 'harvester'@'localhost' IDENTIFIED BY 'password';
MariaDB > GRANT ALL PRIVILEGES ON HARVESTER.* TO 'harvester'@'localhost';
Note that harvester tables are automatically created when the harvester instance starts, so you don't have to create them yourself. Make sure that STRICT_TRANS_TABLES is not set in the SQL mode.
MariaDB [(none)]> SELECT REPLACE(@@SQL_MODE, ',', '\n');
+--------------------------------+
| REPLACE(@@SQL_MODE, ',', '\n') |
+--------------------------------+
| |
+--------------------------------+
1 row in set (0.01 sec)
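If the SQL mode is not empty, the check comes down to looking for one entry in a comma-separated list. The helper below is a hypothetical convenience for scripting that check on a value already read from the server; it does not connect to MariaDB.

```python
def has_strict_trans_tables(sql_mode):
    """Return True if STRICT_TRANS_TABLES appears in a comma-separated SQL_MODE value."""
    modes = {mode.strip().upper() for mode in sql_mode.split(",") if mode.strip()}
    return "STRICT_TRANS_TABLES" in modes


# An empty SQL mode, as in the output above, passes the check.
print(has_strict_trans_tables(""))
```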
Then edit /etc/my.cnf if you need to tune the database yourself, e.g.,
[mysqld]
max_allowed_packet=1024M
Harvester uses mysql-connector by default to access MariaDB.
$ pip install 'mysql-connector-python<=8.0.11'
(Warning: mysql-connector-python 8.0.12 was tested and does not work)
The following changes are required in panda_harvester.cfg:
[db]
# engine sqlite or mariadb
engine = mariadb
# user name
user = harvester
# password
password = FIXME
# schema
schema = HARVESTER
where engine should be set to mariadb and password should be changed accordingly.
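A small sanity check of the [db] section can catch a forgotten placeholder before the instance is started. The sketch assumes the file can be read with Python's standard configparser; db_config_problems is a hypothetical helper and not a Harvester tool.

```python
import configparser

# Same content as the fragment above, with the placeholder password left in.
SAMPLE_CFG = """
[db]
engine = mariadb
user = harvester
password = FIXME
schema = HARVESTER
"""


def db_config_problems(cfg_text):
    """Return a list of human-readable problems found in the [db] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    db = parser["db"]
    problems = []
    if db.get("engine") not in ("sqlite", "mariadb"):
        problems.append("engine must be sqlite or mariadb")
    if db.get("engine") == "mariadb":
        for key in ("user", "password", "schema"):
            if not db.get(key):
                problems.append(f"{key} is not set")
        if db.get("password") == "FIXME":
            problems.append("password is still the FIXME placeholder")
    return problems


print(db_config_problems(SAMPLE_CFG))
```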
If you want to use mysqlclient (whose python module is called MySQLdb) to access MariaDB instead,
$ pip install mysqlclient
(Note: since mysqlclient must be compiled against the MySQL library, one may need to install an additional package in advance: yum install mysql-devel or yum install MariaDB-devel MariaDB-shared)
In addition, you need to enable useMySQLdb under [db] in panda_harvester.cfg:
useMySQLdb = True
First, make sure that httpd and mod_wsgi are installed on your node.
An example of the httpd config file is available at etc/panda/panda_harvester-httpd.conf.rpmnew.template, which needs to be renamed to panda_harvester-httpd.conf before being edited. At least User and Group need to be modified. In the httpd.conf there is a line like
WSGIDaemonProcess pandahvst_daemon processes=2 threads=2 home=${VIRTUAL_ENV}
which defines the number of processes and the number of threads in each process. Those numbers may be increased if necessary.
The following changes are required in panda_harvester.cfg:
[frontend]
# type
type = apache
where type should be set to apache. Note that the port number for apache is defined in panda_harvester-httpd.conf.
Use panda_harvester-apachectl to start or stop harvester. An example of apachectl is available at etc/rc.d/init.d/panda_harvester-apachectl.rpmnew.template. You need to change VIRTUAL_ENV in the script and rename it to panda_harvester-apachectl. Then
$ etc/rc.d/init.d/panda_harvester-apachectl start
$ etc/rc.d/init.d/panda_harvester-apachectl stop
To test the apache messenger, run
$ curl http://localhost:26080/entry -H "Content-Type: application/json" -d '{"methodName":"test", "workerID":123, "data":"none"}'
The response should be a message like 'workerID=123 not found in DB'.
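The same test request can be issued from Python with only the standard library. The sketch mirrors the curl command above (same URL, header, and JSON body); build_test_request and send are hypothetical helper names, and the port is the one used in the example rather than a guaranteed default.

```python
import json
import urllib.request


def build_test_request(base_url="http://localhost:26080", worker_id=123):
    """Build the POST request that the curl command above sends."""
    payload = {"methodName": "test", "workerID": worker_id, "data": "none"}
    return urllib.request.Request(
        base_url + "/entry",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def send(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With harvester running, print(send(build_test_request())) should show the server's reply; urlopen raises an exception if the server is unreachable or answers with an HTTP error status.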
Another option for multi-processing is uWSGI.
Install uwsgi in the same python environment of harvester:
$ pip install uwsgi
A template of the service script is available at etc/rc.d/init.d/panda_harvester-uwsgi.rpmnew.template for an easy start. Copy the template to a new file named etc/rc.d/init.d/panda_harvester-uwsgi. In the CONFIGURATION SECTION, at least userName, groupName, VIRTUAL_ENV, and LOG_DIR need to be modified. Other variables can be modified as well, e.g. nProcesses and nThreads, which define the number of processes and the number of threads in each process.
There is also an option to run uWSGI with an independent configuration file for more flexibility: uncomment the uwsgiConfig line in the CONFIGURATION SECTION and set it to the path of the uWSGI ini configuration file (the filename must end with the extension ".ini"). A template of the uWSGI ini configuration file is available at etc/panda/panda_harvester-uwsgi.ini.rpmnew.template; one can copy it to etc/panda/panda_harvester-uwsgi.ini (it should be functional before any modification).
Then, one can use this script to start, stop, or reload harvester:
$ etc/rc.d/init.d/panda_harvester-uwsgi start
$ etc/rc.d/init.d/panda_harvester-uwsgi stop
$ etc/rc.d/init.d/panda_harvester-uwsgi reload
where reload can be used after harvester code or configurations (e.g. harvester.cfg) change.
(Recommended for el9 or above)
As of v0.3.2, after harvester is installed with pip, a new configuration template for environment variables is available at etc/sysconfig/panda_harvester_env.systemd.rpmnew (some fields should already be filled in automatically during installation). Copy the file to etc/sysconfig/panda_harvester_env and edit it if necessary.
A template of the systemd script is available at etc/systemd/system/panda_harvester-uwsgi.service. Copy the template to a new file named /etc/systemd/system/panda_harvester-uwsgi.service, and reload the systemd daemon:
# systemctl daemon-reload
And then one can start, stop, restart, or reload (keep the uWSGI master process and restart harvester sub-process) harvester:
# systemctl start panda_harvester-uwsgi.service
# systemctl stop panda_harvester-uwsgi.service
# systemctl restart panda_harvester-uwsgi.service
# systemctl reload panda_harvester-uwsgi.service
Apache messenger can also work when harvester is running with uWSGI. One can either let uWSGI spawn an http router process, or set up a frontend web/proxy/router service which can speak the uwsgi protocol (e.g. nginx, Apache).
First, the following changes are required in panda_harvester.cfg:
[frontend]
# type
type = apache
where type should be set to apache. uWSGI will load the apache messenger application after harvester is restarted. (Note that the port number here is ineffective in this case.)
Next, if one wants uWSGI itself to provide the http router, the address must be set in httpRouter in etc/rc.d/init.d/panda_harvester-uwsgi. For example:
httpRouter="127.0.0.1:25080"
This opens port 25080 on localhost.
httpRouter=":25080"
This opens port 25080 to everywhere.
Then, stop and start harvester again with this script, and it's done. (Note that using this script to reload does not work here since its own uwsgi configuration changed.)
On the other hand, if one wants the http service to be provided by an additional frontend service, httpRouter must not be set in etc/rc.d/init.d/panda_harvester-uwsgi. Instead, just configure the frontend service to proxy or route to the socket on which uWSGI is listening. For example, say etc/rc.d/init.d/panda_harvester-uwsgi contains
uwsgiSocket="127.0.0.1:3334"
so that uWSGI listens on localhost:3334. If one has already set up the nginx service and wants a reverse proxy for the harvester apache messenger, just add the following directives to the nginx config
uwsgi_pass 127.0.0.1:3334;
include *path_of_uwsgi_params*;
A complete nginx config may look like
server {
listen 8000;
server_name localhost;
charset utf-8;
access_log /var/log/nginx/app.net_access.log;
error_log /var/log/nginx/app.net_error.log;
location /harvester {
uwsgi_pass 127.0.0.1:3334;
include /opt/app/extras/uwsgi_params;
}
}
Then reload nginx service, and it's done.
To test the apache messenger, use the same approach as with Apache: https://github.com/HSF/harvester/wiki/Installation-and-configuration#to-test-apache-messenger
It is possible to load system and/or queue configuration files via http/https. This is typically useful for keeping a centralized pool of configuration files, so that it is easy to see which configuration each harvester instance is running with. There are two environment variables, HARVESTER_INSTANCE_CONFIG_URL and HARVESTER_QUEUE_CONFIG_URL, which define the URLs for the system config and queue config files, respectively. If those variables are set, the harvester instance loads config files from those URLs and then overwrites parameters if they are specified in local config files. Sensitive information like the database password should be stored only in local config files. System config files are read only when the harvester instance is launched, while queue config files are read every 10 min so that the queue configuration can be changed dynamically while the instance is running. Note that the remote queue config file is periodically cached in the database by Cacher, which is started automatically when the harvester instance is launched, so you don't have to do anything manually. However, when you edit the remote queue config file and then want to run some unit tests which don't run Cacher, you have to cache it manually using cacherTest.py.
$ python lib/python*/site-packages/pandaharvester/harvestertest/cacherTest.py
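The precedence between remote and local system config files (remote values loaded first, local values overriding them) can be pictured with the standard configparser. This only illustrates the override order described above, under the assumption that both files use the usual INI-style syntax; merge_configs is a hypothetical helper, not Harvester's loader, and it works on text that has already been fetched.

```python
import configparser


def merge_configs(remote_text, local_text):
    """Load the remote config first, then let local values override it."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(remote_text)  # shared settings, e.g. fetched from the config URL
    parser.read_string(local_text)   # local values win for keys defined in both
    return parser


remote = "[db]\nengine = mariadb\nuser = harvester\n"
local = "[db]\nuser = localuser\npassword = secret\n"
merged = merge_configs(remote, local)
print(dict(merged["db"]))
```

Keeping the password only in the local text, as in this example, matches the recommendation above to store sensitive information locally.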