vhost: Migration of vhost_user device using ovs_dpdk as backend fails using Cloud Hypervisor on Ubuntu 22.04 host #217
Comments
The backtrace seems to indicate that the frontend waits for an ACK from the backend. Is the SET_LOG_MESSAGE acked by the backend here? The reproduction steps are fairly involved... Do you have a link to the code of the backend that you pair this with? |
@Ablu Thanks for the reply.
https://github.com/rust-vmm/vhost/blob/main/vhost/src/vhost_user/connection.rs#L538 do you mean this part or something else?
There is another, simpler way to reproduce this scenario:
This test runs inside the container, and I don't know how to add gdb there. If you would like to run it on the host, all of those steps are involved. |
I am trying to debug it further. It fails/blocks at self.sock.recv_with_fds(iovs, &mut fd_array) : vhost/vhost/src/vhost_user/connection.rs Lines 328 to 329 in 0669474
I went further and it fails/blocks at socket
I couldn't figure out from |
@rveerama1: Yeah, as I said before, it looks like we expect an ACK from the backend that does not arrive. It would be interesting to know if the SET_LOG_MESSAGE is acked by the backend here. |
Do you mean the reply to this call? vhost/vhost/src/vhost_user/frontend.rs Line 215 in 0669474
|
logs from
|
Yes, that sends a vhost/vhost/src/vhost_user/frontend.rs Line 232 in 0669474
You will probably want to double check that our behavior is correct according to the vhost-user spec, then try to find out if your backend does not actually send the ACK or whether it gets lost somewhere. |
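One way to tell "the backend never sends the ACK" apart from "the ACK gets lost or the peer went away" is to bound the wait instead of blocking forever. The following is only a minimal sketch using the Rust standard library on a plain Unix stream socket; `AckWait` and `wait_for_ack` are hypothetical names for illustration and are not part of the vhost crate, which reads replies through its own connection code.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Outcome of waiting for a reply on a Unix stream socket (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum AckWait {
    /// Some bytes arrived; the value is the number of bytes read.
    Received(usize),
    /// The peer closed the connection without sending anything.
    PeerClosed,
    /// Nothing arrived before the timeout expired.
    TimedOut,
}

/// Wait up to `timeout` for any reply bytes instead of blocking indefinitely.
/// Assumption: a debugging aid only, not how the vhost crate itself receives replies.
fn wait_for_ack(sock: &mut UnixStream, timeout: Duration) -> std::io::Result<AckWait> {
    // A read timeout turns an endless block into a reportable condition.
    sock.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 64];
    match sock.read(&mut buf) {
        Ok(0) => Ok(AckWait::PeerClosed),
        Ok(n) => Ok(AckWait::Received(n)),
        // Depending on the platform, an expired timeout is reported as either kind.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(AckWait::TimedOut)
        }
        Err(e) => Err(e),
    }
}
```

With a socket pair created by `UnixStream::pair()`, calling `wait_for_ack` on one end while the other end stays silent yields `AckWait::TimedOut`, which is the situation the backtrace suggests.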
https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#communication VHOST_USER_SET_LOG_BASE is in the list of those that require it, but only when VHOST_USER_PROTOCOL_F_LOG_SHMFD is negotiated. Looking at our frontend (and also the one in QEMU) it seems to require the ack in that condition. Can you check that the backend does the same? (i.e. it sends the ack when F_LOG_SHMFD is negotiated) |
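The condition described above can be written down as a small check. This is a sketch under stated assumptions, not the frontend's actual code: the function name `set_log_base_requires_reply` is hypothetical, and the bit position used for VHOST_USER_PROTOCOL_F_LOG_SHMFD is the one listed in the vhost-user specification and should be double checked against the spec and against the backend's own definitions.

```rust
/// Bit position of VHOST_USER_PROTOCOL_F_LOG_SHMFD as listed in the vhost-user spec
/// (assumption: verify against the spec version your backend implements).
const VHOST_USER_PROTOCOL_F_LOG_SHMFD: u64 = 1;

/// Returns true when the backend is expected to reply to VHOST_USER_SET_LOG_BASE,
/// i.e. when F_LOG_SHMFD is part of the negotiated protocol features.
/// Hypothetical helper for illustration only.
fn set_log_base_requires_reply(negotiated_protocol_features: u64) -> bool {
    (negotiated_protocol_features & (1u64 << VHOST_USER_PROTOCOL_F_LOG_SHMFD)) != 0
}
```

If the frontend evaluates this to true while the backend evaluates the equivalent check to false (or skips the reply entirely), the frontend would wait for a reply that never comes, which matches the hang seen here.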
@Ablu @stefano-garzarella thanks for the replies. I will check and get back to you. |
We have noticed an issue in vhost while migrating VMs using Cloud Hypervisor. We are attempting to migrate a Docker container from Ubuntu 20.04 to Ubuntu 22.04 and ran into this issue. Details about our attempts are here: cloud-hypervisor/cloud-hypervisor#5877 .
One particular test, live_migration::live_migration_sequential::test_live_migration_ovs_dpdk (https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/tests/integration.rs#L9616), gets stuck when running on an Ubuntu 22.04 host.
On further debugging, it gets stuck while sending the SET_LOG_BASE request to the backend at https://github.com/rust-vmm/vhost/blob/main/vhost/src/vhost_user/connection.rs#L538. It never returns from there (neither with an error nor with success).
Attached backtrace for further details.
Steps to reproduce:
Clone Cloud Hypervisor and build
Build custom kernel from here : https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#custom-kernel-and-disk-image
Get the Guest image from here : https://cloud-hypervisor.azureedge.net/focal-server-cloudimg-amd64-custom-20210609-0.qcow2
and follow the steps : https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#disk-image
Once you have the custom kernel and guest image, run the commands below from different terminals