-
-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async DNS resolution via udns, take 3 #134
base: master
Are you sure you want to change the base?
Conversation
When udns is not available it should not fail, it should compile as normal. |
I implemented this with How would we convey to package maintainers that they should build in an environment with udns? |
configure.ac
Outdated
@@ -88,6 +88,19 @@ AC_ARG_ENABLE(openssl, | |||
] | |||
) | |||
|
|||
AC_ARG_ENABLE( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this into a file in "scripts/".
src/torrent/connection_manager.h
Outdated
@@ -190,6 +205,17 @@ class LIBTORRENT_EXPORT ConnectionManager { | |||
slot_filter_type m_slot_filter; | |||
slot_resolver_type m_slot_resolver; | |||
slot_throttle_type m_slot_address_throttle; | |||
|
|||
#ifdef USE_UDNS | |||
UdnsEvent m_udnsevent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use USE_UDNS in torrent/ header files as that is defined in libtorrent's "config.h".
Instead declare a 'struct dns_resolver' in the the header, add a "std::unique_ptr<dns_resolver> m_dns_resolver". The actual struct definition can then be done in the source file.
src/torrent/connection_manager.cc
Outdated
MockResolve *mock_resolve = new MockResolve; | ||
mock_resolve->hostname = std::string(name); | ||
mock_resolve->family = family; | ||
mock_resolve->callback = cbck; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use C++11 style initializer, 'new MockResolver { ... }'.
src/torrent/connection_manager.cc
Outdated
void | ||
ConnectionManager::cancel_async_resolve(void *query) { | ||
#ifdef USE_UDNS | ||
m_udnsevent.cancel((UdnsQuery *) query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use reinterpret_cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
People seem to recommend static_cast
for this? http://stackoverflow.com/a/573345
src/torrent/connection_manager.cc
Outdated
auto it = std::find(std::begin(m_mock_resolve_queue), std::end(m_mock_resolve_queue), mock_resolve); | ||
if (it != std::end(m_mock_resolve_queue)) { | ||
m_mock_resolve_queue.erase(it); | ||
delete mock_resolve; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to create the mock_resolve variable when you got the iterator.
src/torrent/connection_manager.h
Outdated
#include "torrent/utils/udnsevent.h" | ||
#ifndef USE_UDNS | ||
#include <list> | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just include the list, no need for the ifndef as that is almost guaranteed to already get included.
src/torrent/utils/udnsevent.h
Outdated
// and the int holds the error code. | ||
typedef std::function<void (const sockaddr*, int)> resolver_callback; | ||
|
||
#ifdef USE_UDNS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason why this needs to be in the torrent include files, and it also causes issues as it depends heavily on config.h.
If package maintainers want, or don't want, udns is up to them. That there is a check during configure that tests for udns by default is as far as libtorrent should go to force people to (not need to) make an active choice. |
In my experience, without this patch and without the mitigation of a local caching DNS resolver, rtorrent is nearly unusable over a high-latency link (because of UI stalls) --- so I think it wouldn't hurt to recommend building with udns, or at least to document that building with udns is possible. Did I address all the review feedback? |
It is basically a matter of different priorities. While there are users who have issues with dns resolution blocking, there are also a lot of people compiling the client on weird setups where udns is not available, and even more where it is not installed by default. Thus I do not want to make non-critical dependencies cause issues with compilation for other users. So if udns is installed, compile with it, if not then output a message during configure and compile normally. |
I agree completely; I don't want to make it harder for end users to compile the project. What I'm driving at is that I don't think most people are compiling the project themselves --- instead, they're getting it from the package manager of a distribution that also ships udns. So this is an opportunity to solve the problem in a centralized way that doesn't burden the individual user: if we document that it's recommended for package maintainers to build the project with udns support, then the maintainers can take care of ensuring that |
I understand where you are coming from, and appreciate it. However it is a decision that ultimately rests on me (unless there's a successful fork), and my decision remains in the favor of what I mentioned above. It is, after all, my inbox that is and gets flooded. Focus instead on creating a good branch I can merge. Which, btw, means fixing the incompatible indentation and such in e.g. 'udnsevent.cc', and moving that code to 'src/utils' as it is not a shared API. I'll follow that up with a more extensive review later. |
Fix indentation; move udnsevent implementation to src/utils
I noticed that there is an issue with how these APIs (both the old synchronous API, and the new async one) handle single-stack settings. Suppose for example that the DNS server supports both A and AAAA records, but the client only has IPv4 connectivity. Then It seems like in order to fix this fully, we'd need a way to offer multiple addresses to callbacks like |
I'm currently working on the 'feature-bind' branch that will ideally in the future be able to deal with those issues. |
Anyway, I look forward to a more detailed review :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished current review. Place don't close this pull request, instead merge your working branch into 'udns.10'.
src/utils/udnsevent.h
Outdated
#include <list> | ||
#include <inttypes.h> | ||
|
||
#include <udns.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't include 'udns.h' here, instead isolate it in the source file and declare it with 'struct dns_ctx;' in the header.
Same goes for UdnsQuery, which I'd prefer be named 'udns_query' as that is the current (new'ish) naming convention I'm using.
src/utils/udnsevent.h
Outdated
|
||
dns_ctx* m_ctx; | ||
rak::priority_item m_taskTimeout; | ||
std::list<UdnsQuery *> m_malformed_queries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use 'std::unique_ptr<udns_query>' instead of a C ptr. Vectors are almost always faster than lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below 'public:' add typedef's for 'query_list_type'.
src/utils/udnsevent.h
Outdated
|
||
// this wraps udns's dns_submit_a[46] functions. they and it return control immediately, | ||
// without either sending outgoing UDP packets or executing callbacks: | ||
struct UdnsQuery* enqueue_resolve(const char *name, int family, resolver_callback *callback); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to use 'struct' here in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the comment, don't start with 'this'. Rewrite it to something like "Wrap udns functions in ... . It returns ..."
src/utils/udnsevent.cc
Outdated
|
||
namespace torrent { | ||
|
||
int udnserror_to_gaierror(int udnserror) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This project doesn't do indentation for namespaces.
src/utils/udnsevent.cc
Outdated
} | ||
} | ||
|
||
/** Compatibility layers so udns can call std::function callbacks. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use C++-style comments.
src/utils/udnsevent.cc
Outdated
m_fileDesc = fd; | ||
} else { | ||
throw internal_error("dns_init failed"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reorder this so that you only go into an if block in the exceptional case:
m_fileDesc = ::dns_init...;
if (fd == -1)
throw ...
Also looking at the documentation it looks like if you call dns_init(m_ctx, 1) you don't need to call it above. Or if you do called it above, just use dns_open.
src/utils/udnsevent.cc
Outdated
} else { | ||
// no pending queries | ||
manager->poll()->remove_read(this); | ||
manager->poll()->remove_error(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reorder this so you have:
if (timeout == -1) {
...
return;
}
...
src/utils/udnsevent.cc
Outdated
void UdnsEvent::cancel(struct UdnsQuery *query) { | ||
if (query == NULL) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use brackets when the if block is a single line.
} | ||
#endif | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit too much to include in this file, consider moving it to a source file in the same place as the udns stuff is.
This should address most of the issues. The second commit should be more readable since all the de-indentation noise is factored out. There are two things I didn't do:
|
@slingamn just wanted you to know that I've already pulled your branch and I'm working on it. I'll merge it soon. |
@rakshasa any news on the merge? |
Still working on the rtorrent-docker test environment, it's probably going to be a couple weeks still before it is ready. |
Just posting a quick merge/resolution with master, in case anyone else wants to use this patchset. |
Any news on the merge? |
There's no meaningful support for dual stack right now, so supporting AAAA records just breaks the functionality. See these comments: * rakshasa#134 (comment) * rakshasa#134 (comment)
+1 for waiting |
I verified that this applies against |
m_readBuffer(NULL), | ||
m_writeBuffer(NULL) { | ||
|
||
m_taskTimeout.slot() = std::bind(&TrackerUdp::receive_timeout, this); | ||
|
||
m_resolver_callback = std::bind(&TrackerUdp::start_announce, this, std::placeholders::_1, std::placeholders::_2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not stable. We can't have the callback in the TrackerUdp
object. If the torrent is removed before the announce completes, it will trigger a crash. The callback will no longer be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, this line hasn't changed in years. You've been deploying this and it's been running stably, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if I understand your concern correctly, it's addressed by having close_directly
remove the callback:
libtorrent/src/tracker/tracker_udp.cc
Lines 180 to 182 in a03859e
TrackerUdp::close_directly() { | |
manager->connection_manager()->async_resolver().cancel(m_resolver_query); | |
m_resolver_query = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, we passing are m_resolver_callback
as a pointer to enqueue. This must remain valid indefinitely because it's async. It was stable on my test environment. But once we tested it with 10k of clients, it started flooding crash dumps.
m_resolver_query = manager->connection_manager()->async_resolver().enqueue(
hostname.data(),
AF_INET,
&m_resolver_callback
);
This is the solution I implemented to avoid sending a pointer that could potentially become undefined. It successfully resolved the crash I was able to reproduce. stickz/rtorrent@32e1e6f...5517bef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callbacks are "asynchronous" in the sense that the main event loop does not block while the query is in flight, but the callbacks can only be triggered from ::dns_ioevent
or ::dns_timeouts
, which are called from within the main event loop (once the poll layer detects that a response is ready, or that the query has failed). Accordingly a call to ::dns_cancel
from the main event loop can prevent the callbacks from triggering. Because the main loop runs within a single thread, it should not be possible for these operations to race against each other.
According to my understanding, the specific scenario you described (DNS query is initiated, the torrent is removed, DNS response is received, the callback executes against a freed TrackerUdp
object) cannot occur for this exact reason: TrackerUdp::~TrackerUdp
calls UdnsAsyncResolver::cancel
, which calls ::dns_cancel
on the callbacks pointing to the TrackerUdp
object that is being freed.
I respect the intensive testing you have done on this patchset but I have run it continuously for years on heavy workloads without issue. I have also tested it with Valgrind, which found memory safety issues in the DHT layer (e.g. #166) but not with the udns code. I think it's possible that the issues you observed are a bad interaction with another patchset you're using? I would also be interested to see any backtraces or stack dumps you were able to collect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a stack trace I was able to collect and resolve for this problem.
Thread 1 (Thread 0x7f04816ad680 (LWP 114901)):
#0 0x00007f04820e350b in pthread_kill () from /usr/lib64/libc.so.6
#1 0x00007f0482086906 in raise () from /usr/lib64/libc.so.6
#2 0x00007f04820698b7 in abort () from /usr/lib64/libc.so.6
#3 0x000055de99b537c8 in do_panic (signum=11) at /usr/src/debug/net-p2p/rtorrent-0.9.8.5.4/rtorrent-5.4-0.9.8-0.13.8/rtorrent/src/main.cc:645
#4 <signal handler called>
#5 0x000055de9dd38e40 in ?? ()
#6 0x00007f0482613280 in std::function<void (sockaddr const*, int)>::operator()(sockaddr const*, int) const (__args#1=<optimized out>, __args#0=<optimized out>, this=<optimized out>, this=<optimized out>, __args#0=<optimized out>, __args#1=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/std_function.h:591
#7 torrent::a4_callback_wrapper (result=<optimized out>, data=0x7f04541fc480, ctx=<optimized out>) at utils/udnsevent.cc:60
#8 torrent::a4_callback_wrapper (ctx=<optimized out>, result=<optimized out>, data=0x7f04541fc480) at utils/udnsevent.cc:40
#9 0x00007f0481addf1c in dns_ioevent () from /usr/lib64/libudns.so.0
#10 0x00007f04825a9e9e in torrent::PollEPoll::perform (this=0x55de9ac6b630) at torrent/poll_epoll.cc:185
#11 0x00007f04825d41b4 in torrent::thread_base::event_loop (thread=0x55de9ac72300) at torrent/utils/thread_base.cc:174
#12 0x000055de99b7aa5c in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/net-p2p/rtorrent-0.9.8.5.4/rtorrent-5.4-0.9.8-0.13.8/rtorrent/src/main.cc:508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting back to me on this.
I believe this is a bug in your fork. You originally merged the udns changes in stickz/rtorrent@62d3f5e ; this version of the patch correctly includes the call to AsyncResolver::cancel
in TrackerUdp::close_directly
that I linked above. Then subsequently in stickz/rtorrent@a843234 you replaced tracker_udp.cc
with a version (from libtorrent's master?) that does not include this call. Consequently, the callback does not get cancelled and can fire even when it has become invalid, which is what we see in the stack trace.
@rakshasa what would it take to get this merged? Is The current behavior in this branch is to do IPv4 resolution only. |
There's too many easier things to fix atm, and I'd like to do a more comprehensive review of the current tracker code first. This includes fixing some ipv4/6 issues I'm seeing in rtorrent-docker. |
This is #133, but with actual autoconf changes, and a green Travis build (with the two additional configurations you suggested).
Right now, the build requires udns by default and fails if it's not found; to build without udns, you need
--disable-udns
.