Testing and running
First, set up the environment variables.
$ source etc/sysconfig/panda_harvester
All log files are available in the logdir which is defined in etc/panda/panda_common.cfg. The filename is panda-ClassName.log where ClassName is a plug-in or agent class name.
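As a small illustration of the naming rule above, the following sketch builds the expected log file path for a given class. The helper name `harvester_log_path`, the directory `/var/log/panda`, and the class name `Submitter` are placeholders chosen for the example; use the logdir value actually set in etc/panda/panda_common.cfg.

```python
import os


def harvester_log_path(logdir, class_name):
    """Return the expected log file path for a plug-in or agent class.

    Hypothetical helper: it only applies the panda-ClassName.log rule
    and does not read panda_common.cfg itself.
    """
    return os.path.join(logdir, "panda-{0}.log".format(class_name))


# Placeholder values; substitute your own logdir and class name.
print(harvester_log_path("/var/log/panda", "Submitter"))
```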
- Testing the local database and connection to PanDA
$ python lib/python*/site-packages/pandaharvester/harvestertest/basicTest.py
- Testing submission and monitoring with the batch system
$ python lib/python*/site-packages/pandaharvester/harvestertest/submitterTest.py [PandaQueueName]
- Testing stage-in
$ python lib/python*/site-packages/pandaharvester/harvestertest/stageInTest.py [PandaQueueName]
- Testing stage-out
$ python lib/python*/site-packages/pandaharvester/harvestertest/stageOutTest.py [PandaQueueName]
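The four checks above can also be chained in a small driver script. This is only a sketch: `build_test_commands`, `run_all`, the `test_dir` value, and the queue name `MY_QUEUE` are hypothetical and not part of Harvester. Only the four script names come from the commands above.

```python
import os
import subprocess
import sys

# (script name, whether it takes a PanDA queue name), in the order listed above
TEST_SCRIPTS = [
    ("basicTest.py", False),
    ("submitterTest.py", True),
    ("stageInTest.py", True),
    ("stageOutTest.py", True),
]


def build_test_commands(test_dir, queue_name, python=sys.executable):
    """Build one argv list per test script."""
    commands = []
    for script, needs_queue in TEST_SCRIPTS:
        cmd = [python, os.path.join(test_dir, script)]
        if needs_queue:
            cmd.append(queue_name)
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


# Placeholder directory and queue name; adjust both to your installation.
commands = build_test_commands("path/to/pandaharvester/harvestertest", "MY_QUEUE")
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_all(commands)` after sourcing etc/sysconfig/panda_harvester would then execute the tests one after another. Note that a test that hangs will block the driver as well.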
Harvester runs multiple threads in parallel, which makes debugging rather complicated. However, functions can be executed step by step by running the master with the --single option:
$ python lib/python*/site-packages/pandaharvester/harvesterbody/master.py --pid $PWD/tmp.pid --single
To start and stop the daemon manually:
$ python lib/python*/site-packages/pandaharvester/harvesterbody/master.py --pid $PWD/tmp.pid
$ kill -USR2 `cat $PWD/tmp.pid`
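The stop command above reads the process ID from the pid file and sends SIGUSR2 to it. A minimal Python equivalent is sketched below, assuming the pid file contains a single integer. The function names `read_pid` and `stop_harvester` and the injectable `send_signal` parameter are made up for this example.

```python
import os
import signal


def read_pid(pid_file):
    """Read the process ID stored in the pid file."""
    with open(pid_file) as f:
        return int(f.read().strip())


def stop_harvester(pid_file, send_signal=os.kill):
    """Send SIGUSR2 to the process named in pid_file, like `kill -USR2`.

    send_signal is injectable so the function can be tried out
    without signalling a real process.
    """
    pid = read_pid(pid_file)
    send_signal(pid, signal.SIGUSR2)
    return pid
```

For example, `stop_harvester(os.path.join(os.getcwd(), "tmp.pid"))` mirrors the shell command when the daemon was started with `--pid $PWD/tmp.pid`.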
If an init.d script is installed:
$ etc/rc.d/init.d/panda_harvester start
$ etc/rc.d/init.d/panda_harvester stop
If a systemd service is set up, one can start, stop, restart, and reload the harvester service:
# systemctl start panda_harvester-uwsgi.service
# systemctl stop panda_harvester-uwsgi.service
# systemctl restart panda_harvester-uwsgi.service
# systemctl reload panda_harvester-uwsgi.service
Sometimes a test script gets stuck and has to be terminated by hand with Ctrl+C. For example:
(harvester) [root@aipanda083 ~]# python /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py ANALY_TAIWAN_TEST
^CTraceback (most recent call last):
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py", line 10, in <module>
queueConfig = queueConfigMapper.get_queue(queueName)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 624, in get_queue
self.load_data()
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 308, in load_data
resolver = self._get_resolver()
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 253, in _get_resolver
resolver = pluginFactory.get_plugin(pluginConf)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/plugin_factory.py", line 49, in get_plugin
impl = cls(**args)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestermisc/info_utils.py", line 20, in __init__
panda_queues_cache = dbInterface.get_cache(cacher_key)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_interface.py", line 20, in get_cache
return self.dbProxy.get_cache(data_name)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_proxy_pool.py", line 31, in __call__
return func(*args, **kwargs)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_proxy.py", line 3081, in get_cache
globalDict.acquire()
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/core_utils.py", line 105, in acquire
self.lock.acquire()
KeyboardInterrupt
The script was blocked while trying to acquire a lock, which makes it hard to debug the plugin.
It is possible that an exception occurred somewhere in between.
To see what actually happened, one can add a timeout to the lock so that it stops blocking:
(harvester) [root@aipanda083 ~]# vim /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/core_utils.py
For example, add a timeout of 3 seconds:
    def acquire(self):
        self.lock.acquire(timeout=3)
Then test the script again to see the real problem. For example:
(harvester) [root@aipanda083 ~]# python /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py ANALY_TAIWAN_TEST
plugin=PilotmoverMTPreparator
testing stagein:
BasePath from preparator configuration: /tmp/harv_pil_test
basePath redifuned for test data: /tmp/harv_pil_test/testdata/
Traceback (most recent call last):
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py", line 36, in <module>
tmpStat, tmpOut = preparatorCore.trigger_preparation(jobSpec)
File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvesterpreparator/pilotmover_mt_preparator_kari.py", line 138, in trigger_preparation
for i in range(0, len(files), n_files_per_thread):
TypeError: 'float' object cannot be interpreted as an integer
After the lock times out, the script dumps the exception raised by the plugin.
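The effect of the timeout can be reproduced in isolation. The sketch below uses a plain `threading.Lock` from the standard library, on the assumption that the lock in core_utils.py behaves the same way. It does not use Harvester code.

```python
import threading

lock = threading.Lock()
lock.acquire()  # simulate a holder that never releases the lock

# A plain lock.acquire() here would block forever.
# With a timeout, acquire() gives up and returns False instead.
acquired = lock.acquire(timeout=0.1)
print("acquired:", acquired)  # prints "acquired: False"
```

Because `acquire(timeout=...)` returns `False` rather than raising, the caller continues without holding the lock. The change is therefore suitable only as a temporary debugging aid and should be reverted afterwards.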