Testing and running

FaHui Lin edited this page Feb 8, 2024 · 7 revisions

Tests

First, set up the environment variables:

$ source etc/sysconfig/panda_harvester

All log files are written to the logdir defined in etc/panda/panda_common.cfg. Each file is named panda-ClassName.log, where ClassName is the name of a plug-in or agent class.
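
As a small illustration of this naming rule, the sketch below builds the expected log path for a given class name. The helper name `harvester_log_path` and the example directory are hypothetical; the real logdir value has to be taken from etc/panda/panda_common.cfg, and the class name is borrowed from the troubleshooting output further down this page purely as an example.

```python
import os


def harvester_log_path(logdir, class_name):
    """Return the expected log file path for a plug-in or agent class.

    Hypothetical helper: `logdir` is whatever is configured in
    etc/panda/panda_common.cfg; it is passed in explicitly here.
    """
    return os.path.join(logdir, "panda-{0}.log".format(class_name))


if __name__ == "__main__":
    # Example values only; substitute your own logdir and class name.
    print(harvester_log_path("/var/log/panda", "PilotmoverMTPreparator"))
```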

Unit tests

  • Testing the local database and connection to PanDA
$ python lib/python*/site-packages/pandaharvester/harvestertest/basicTest.py
  • Testing submission and monitoring with the batch system
$ python lib/python*/site-packages/pandaharvester/harvestertest/submitterTest.py [PandaQueueName]
  • Testing stage-in
$ python lib/python*/site-packages/pandaharvester/harvestertest/stageInTest.py [PandaQueueName]
  • Testing stage-out
$ python lib/python*/site-packages/pandaharvester/harvestertest/stageOutTest.py [PandaQueueName]

Functional tests

Harvester runs multiple threads in parallel, which makes debugging rather complicated. However, functions can be executed step by step by using the --single option:

$ python lib/python*/site-packages/pandaharvester/harvesterbody/master.py --pid $PWD/tmp.pid --single

Run

Manually start and stop the daemon:

$ python lib/python*/site-packages/pandaharvester/harvesterbody/master.py --pid $PWD/tmp.pid
$ kill -USR2 `cat $PWD/tmp.pid`
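
The stop command above reads the process ID from the PID file and sends it the USR2 signal. A minimal Python sketch of the same two steps follows, assuming the PID file contains just the process ID, as the `cat` usage implies. The helper names `read_pid` and `stop_harvester` are hypothetical and not part of Harvester.

```python
import os
import signal


def read_pid(pid_file):
    """Read the daemon's process ID from the file given to --pid."""
    with open(pid_file) as f:
        return int(f.read().strip())


def stop_harvester(pid_file):
    """Send SIGUSR2 to the recorded process, like kill -USR2 `cat pid_file`."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGUSR2)
    return pid


# Usage, assuming master.py was started with --pid $PWD/tmp.pid:
#   stop_harvester(os.path.join(os.getcwd(), "tmp.pid"))
```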

If an init.d script is available:

$ etc/rc.d/init.d/panda_harvester start
$ etc/rc.d/init.d/panda_harvester stop

If a systemd service is set up, one can start, stop, restart, and reload the harvester service:

# systemctl start panda_harvester-uwsgi.service
# systemctl stop panda_harvester-uwsgi.service
# systemctl restart panda_harvester-uwsgi.service
# systemctl reload panda_harvester-uwsgi.service

Troubleshooting

Sometimes a test script gets stuck and has to be terminated by hand with Ctrl+C.

For example:

(harvester) [root@aipanda083 ~]# python /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py ANALY_TAIWAN_TEST
^CTraceback (most recent call last):
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py", line 10, in <module>
    queueConfig = queueConfigMapper.get_queue(queueName)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 624, in get_queue
    self.load_data()
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 308, in load_data
    resolver = self._get_resolver()
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/queue_config_mapper.py", line 253, in _get_resolver
    resolver = pluginFactory.get_plugin(pluginConf)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/plugin_factory.py", line 49, in get_plugin
    impl = cls(**args)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestermisc/info_utils.py", line 20, in __init__
    panda_queues_cache = dbInterface.get_cache(cacher_key)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_interface.py", line 20, in get_cache
    return self.dbProxy.get_cache(data_name)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_proxy_pool.py", line 31, in __call__
    return func(*args, **kwargs)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/db_proxy.py", line 3081, in get_cache
    globalDict.acquire()
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/core_utils.py", line 105, in acquire
    self.lock.acquire()
KeyboardInterrupt

The traceback shows that the script was blocked while trying to acquire a lock, which makes it hard to debug the plugin.

It is quite possible that an exception occurred somewhere in between.

To see clearly what happened, one can add a timeout to the lock so that it stops blocking:

(harvester) [root@aipanda083 ~]# vim /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestercore/core_utils.py

For example, add a timeout of 3 seconds:

    def acquire(self):
        self.lock.acquire(timeout=3)
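
Note that in Python 3 `Lock.acquire(timeout=...)` does not raise when the wait times out; it returns False and the caller simply continues without holding the lock. That is what lets the script move on during debugging, and it is also why this edit is best treated as temporary. The sketch below shows the behaviour in isolation; `TimedLock` and `demo` are hypothetical stand-ins, not Harvester's own code.

```python
import threading


class TimedLock(object):
    """Hypothetical stand-in for a lock wrapper; not Harvester's own class."""

    def __init__(self, timeout=3):
        self.lock = threading.Lock()
        self.timeout = timeout

    def acquire(self):
        # Returns True if the lock was obtained, False if the wait timed out.
        return self.lock.acquire(timeout=self.timeout)

    def release(self):
        self.lock.release()


def demo(timeout=0.1):
    """Acquire twice from one thread; the second attempt times out."""
    guard = TimedLock(timeout=timeout)
    first = guard.acquire()   # lock is free, so this succeeds
    second = guard.acquire()  # lock is already held, so this gives up
    guard.release()
    return first, second


if __name__ == "__main__":
    print(demo())  # (True, False)
```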

Then test the script again to see the real problem. For example:

(harvester) [root@aipanda083 ~]# python /opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py ANALY_TAIWAN_TEST
plugin=PilotmoverMTPreparator
testing stagein:
BasePath from preparator configuration: /tmp/harv_pil_test 
basePath redifuned for test data: /tmp/harv_pil_test/testdata/ 
Traceback (most recent call last):
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvestertest/stageInTest.py", line 36, in <module>
    tmpStat, tmpOut = preparatorCore.trigger_preparation(jobSpec)
  File "/opt/harvester/lib/python3.6/site-packages/pandaharvester/harvesterpreparator/pilotmover_mt_preparator_kari.py", line 138, in trigger_preparation
    for i in range(0, len(files), n_files_per_thread):
TypeError: 'float' object cannot be interpreted as an integer

After the lock times out, the script dumps the exception raised by the plugin.
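
The TypeError in this example is what Python 3 raises when `range()` is given a float step. How the plugin computes `n_files_per_thread` is not visible in the traceback; the sketch below assumes it comes from a `/` division, which always yields a float in Python 3, and shows one possible fix using integer ceiling division. The function `chunk_files` and its `n_threads` parameter are hypothetical.

```python
def chunk_files(files, n_threads):
    """Split `files` into at most `n_threads` consecutive chunks."""
    if not files:
        return []
    # Ceiling division with integers only; len(files) / n_threads would be
    # a float in Python 3 and make range() raise TypeError.
    n_files_per_thread = -(-len(files) // n_threads)
    return [files[i:i + n_files_per_thread]
            for i in range(0, len(files), n_files_per_thread)]


if __name__ == "__main__":
    print(chunk_files(["a", "b", "c", "d", "e"], 2))
```

With five files and two threads, the chunk size is 3, so the files are split into one chunk of three and one of two.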
