Stopping maltcp_ctx is unreliable #3

Open
gbonnefille opened this issue Jul 7, 2016 · 5 comments

Comments

@gbonnefille
Contributor

According to the MALC API, the operation that interrupts mal_ctx_start() is mal_ctx_stop(). However, this call behaves unpredictably: the unit tests provided by malc stop cleanly, while our implementation (two separate processes, one for the provider and one for the consumer) behaves differently depending on the number of CPUs.

Reading the code, we understand that mal_ctx_stop() destroys the zloop but does not stop it. According to czmq's zloop test (cf. https://github.com/zeromq/czmq/blob/master/src/zloop.c#L876), the right way to stop a zloop is to have an event handler return -1.

If our understanding is correct, we think that malc should be improved in the following way:

  • rename mal_ctx_stop() to mal_ctx_destroy()
  • add a new inproc socket (ctrl_socket?) to the maltcp_ctx
  • register a trivial event handler on this socket that returns -1
  • create a new mal_ctx_stop() that sends a dedicated message ($TERM?) on the ctrl_socket

Do you agree with this understanding?
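
The proposal above can be sketched without czmq, as a rough analogue only. In this sketch a POSIX pipe stands in for the inproc ctrl_socket and a poll() loop stands in for the zloop; every demo_* name is hypothetical and none of this is the actual malc code. It only shows the split being proposed: stop sends a message, the control handler returns -1 to end the loop, and destroy releases resources separately.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for maltcp_ctx: only the control channel is modelled. */
typedef struct {
    int ctrl_rd;   /* read end, polled by the loop (plays the role of ctrl_socket) */
    int ctrl_wr;   /* write end, used by demo_ctx_stop() */
} demo_ctx_t;

int demo_ctx_init(demo_ctx_t *ctx) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    ctx->ctrl_rd = fds[0];
    ctx->ctrl_wr = fds[1];
    return 0;
}

/* Trivial control handler: consume the message, then return -1 to end the loop. */
int demo_ctrl_handler(demo_ctx_t *ctx, int fd) {
    char buf[16];
    (void) ctx;
    (void) read(fd, buf, sizeof buf);
    return -1;
}

/* Blocks until the control handler asks to stop; returns -1 in that case
 * and -2 on a poll error or if the control channel is closed. */
int demo_ctx_start(demo_ctx_t *ctx) {
    struct pollfd pfd;
    pfd.fd = ctx->ctrl_rd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: poll again */
            return -2;
        }
        if (pfd.revents & POLLIN) {
            if (demo_ctrl_handler(ctx, pfd.fd) == -1)
                return -1;
        } else if (pfd.revents & (POLLHUP | POLLERR)) {
            return -2;
        }
    }
}

/* Stop only sends the dedicated message; it does not free anything. */
int demo_ctx_stop(demo_ctx_t *ctx) {
    static const char term[] = "$TERM";
    ssize_t len = (ssize_t) (sizeof term - 1);
    return write(ctx->ctrl_wr, term, sizeof term - 1) == len ? 0 : -1;
}

/* Destroy releases the control channel, after the loop has returned. */
void demo_ctx_destroy(demo_ctx_t *ctx) {
    close(ctx->ctrl_rd);
    close(ctx->ctrl_wr);
}
```

In a real program demo_ctx_stop() would typically be called from another thread or a signal-driven path while demo_ctx_start() is blocking. Because the message is queued in the pipe, calling stop before start also makes start return immediately.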

@freyssin
Collaborator

freyssin commented Jul 8, 2016

I agree with your analysis of how to stop a zloop correctly.

However, since not all transports are based on a zloop, the code that sends the dedicated message must be located in the transport's _ctx_stop method (maltcp and malzmq) rather than in mal_ctx_stop.
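
One way to picture this layering is a generic stop that only delegates through a function pointer supplied by the transport. The sketch below is an assumption about the shape of such a design, not the real malc structures: all demo_* types and functions are hypothetical, and the flag stands in for sending the dedicated message on a control socket.

```c
#include <stddef.h>

/* Hypothetical, simplified generic context; the real mal_ctx has more to it. */
typedef int (*demo_stop_fn)(void *transport_ctx);

typedef struct {
    void *transport_ctx;   /* e.g. a maltcp or malzmq context */
    demo_stop_fn stop;     /* transport-specific stop, set when the transport is bound */
} demo_mal_ctx_t;

/* Generic entry point used by applications: it knows nothing about zloop. */
int demo_mal_ctx_stop(demo_mal_ctx_t *self) {
    if (self == NULL || self->stop == NULL)
        return -1;
    return self->stop(self->transport_ctx);
}

/* Hypothetical zloop-based transport context. */
typedef struct {
    int term_sent;   /* stand-in for "dedicated message sent on the control socket" */
} demo_maltcp_ctx_t;

/* Transport-level stop: this is where the dedicated message would be sent. */
int demo_maltcp_ctx_stop(void *arg) {
    demo_maltcp_ctx_t *ctx = (demo_maltcp_ctx_t *) arg;
    ctx->term_sent = 1;
    return 0;
}
```

A transport that is not based on a zloop would plug in its own stop function, and applications would keep calling the generic function only.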

@gbonnefille
Contributor Author

> However, since not all transports are based on a zloop, the code that sends the dedicated message must be located in the transport's _ctx_stop method (maltcp and malzmq) rather than in mal_ctx_stop.

Of course. I just meant that, as users, we call the mal_* functions (the higher-level API), even if the change only needs to be made in some implementations.

@freyssin
Collaborator

The context closing should now be improved, and the zloop should be stopped correctly.

@gbonnefille
Contributor Author

gbonnefille commented Jan 3, 2017

I still encounter random segfaults at test termination.

For example, malzmq_pubsub_app ends with:

Stopped.
destroyed.
Tests passed OK
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:398
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:399
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:398
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:399
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:398
E: 17-01-03 11:20:10 dangling 'PAIR' socket created at src/zsys.c:399
E: 17-01-03 11:20:10 dangling sockets: cannot terminate ZMQ safely
Makefile:1607: recipe for target 'check-local' failed
make[2]: *** [check-local] Erreur de segmentation
make[2]: Leaving directory '/home/egsccdev/MOL/malc/test/malzmq_pubsub_app'
Makefile:1451: recipe for target 'check-am' failed

@georgeslabreche

I believe this issue should be re-opened. Stopping maltcp_ctx on my end does something to the endpoints that prevents my application from receiving response messages to my request operation after I start maltcp_ctx again. As a workaround, I completely destroy and recreate the consumer object for each request, which is a bit overkill. I would have liked to reuse the same listening socket connection.
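
The root cause is not identified in this thread, so the following is only a sketch of the property being asked for, not a diagnosis or a fix: stop pauses processing but leaves the listening endpoint open, so a later start reuses the same socket, and only destroy closes it. All demo_* names are hypothetical, and the running flag stands in for the real event loop.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical consumer-side context; not the real maltcp_ctx layout. */
typedef struct {
    int listen_fd;   /* listening endpoint, created once */
    int running;     /* 1 between start and stop */
} demo_endpoint_ctx_t;

/* Create the listening socket on an ephemeral loopback port. */
int demo_endpoint_init(demo_endpoint_ctx_t *ctx) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;   /* let the kernel pick a free port */

    ctx->running = 0;
    ctx->listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (ctx->listen_fd < 0)
        return -1;
    if (bind(ctx->listen_fd, (struct sockaddr *) &addr, sizeof addr) != 0
        || listen(ctx->listen_fd, 8) != 0) {
        close(ctx->listen_fd);
        ctx->listen_fd = -1;
        return -1;
    }
    return 0;
}

/* Port currently bound, or -1 if the endpoint is gone. */
int demo_endpoint_port(const demo_endpoint_ctx_t *ctx) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (ctx->listen_fd < 0
        || getsockname(ctx->listen_fd, (struct sockaddr *) &addr, &len) != 0)
        return -1;
    return ntohs(addr.sin_port);
}

void demo_endpoint_start(demo_endpoint_ctx_t *ctx) { ctx->running = 1; }

/* Stop only pauses processing; the listening socket stays open. */
void demo_endpoint_stop(demo_endpoint_ctx_t *ctx) { ctx->running = 0; }

/* Destroy is the only call that releases the endpoint. */
void demo_endpoint_destroy(demo_endpoint_ctx_t *ctx) {
    if (ctx->listen_fd >= 0)
        close(ctx->listen_fd);
    ctx->listen_fd = -1;
    ctx->running = 0;
}
```

With this split, a start/stop/start sequence keeps the same bound port, which is the behaviour the destroy-and-recreate workaround is compensating for.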

@freyssin reopened this Aug 25, 2021