Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async DNS resolution via udns, take 3 #134

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

slingamn
Copy link
Contributor

This is #133, but with actual autoconf changes, and a green Travis build (with the two additional configurations you suggested).

Right now, the build requires udns by default and fails if it's not found; to build without udns, you need --disable-udns.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 17.891% when pulling c004a60 on slingamn:udns.10 into f466d36 on rakshasa:master.

@rakshasa
Copy link
Owner

When udns is not available it should not fail, it should compile as normal.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 17.891% when pulling 803907e on slingamn:udns.10 into f466d36 on rakshasa:master.

@slingamn
Copy link
Contributor Author

I implemented this with AC_ARG_WITH and AC_CHECK_HEADERS.

How would we convey to package maintainers that they should build in an environment with udns?

configure.ac Outdated
@@ -88,6 +88,19 @@ AC_ARG_ENABLE(openssl,
]
)

AC_ARG_ENABLE(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this into a file in "scripts/".

@@ -190,6 +205,17 @@ class LIBTORRENT_EXPORT ConnectionManager {
slot_filter_type m_slot_filter;
slot_resolver_type m_slot_resolver;
slot_throttle_type m_slot_address_throttle;

#ifdef USE_UDNS
UdnsEvent m_udnsevent;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use USE_UDNS in torrent/ header files as that is defined in libtorrent's "config.h".

Instead declare a 'struct dns_resolver' in the the header, add a "std::unique_ptr<dns_resolver> m_dns_resolver". The actual struct definition can then be done in the source file.

MockResolve *mock_resolve = new MockResolve;
mock_resolve->hostname = std::string(name);
mock_resolve->family = family;
mock_resolve->callback = cbck;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use C++11 style initializer, 'new MockResolver { ... }'.

void
ConnectionManager::cancel_async_resolve(void *query) {
#ifdef USE_UDNS
m_udnsevent.cancel((UdnsQuery *) query);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use reinterpret_cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People seem to recommend static_cast for this? http://stackoverflow.com/a/573345

auto it = std::find(std::begin(m_mock_resolve_queue), std::end(m_mock_resolve_queue), mock_resolve);
if (it != std::end(m_mock_resolve_queue)) {
m_mock_resolve_queue.erase(it);
delete mock_resolve;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to create the mock_resolve variable when you got the iterator.

#include "torrent/utils/udnsevent.h"
#ifndef USE_UDNS
#include <list>
#endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just include the list, no need for the ifndef as that is almost guaranteed to already get included.

// and the int holds the error code.
typedef std::function<void (const sockaddr*, int)> resolver_callback;

#ifdef USE_UDNS
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason why this needs to be in the torrent include files, and it also causes issues as it depends heavily on config.h.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 17.866% when pulling 1a8c250 on slingamn:udns.10 into f466d36 on rakshasa:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 17.892% when pulling 719dd39 on slingamn:udns.10 into f466d36 on rakshasa:master.

@rakshasa
Copy link
Owner

If package maintainers want, or don't want, udns is up to them. That there is a check during configure that tests for udns by default is as far as libtorrent should go to force people to (not need to) make an active choice.

@slingamn
Copy link
Contributor Author

In my experience, without this patch and without the mitigation of a local caching DNS resolver, rtorrent is nearly unusable over a high-latency link (because of UI stalls) --- so I think it wouldn't hurt to recommend building with udns, or at least to document that building with udns is possible.

Did I address all the review feedback?

@rakshasa
Copy link
Owner

It is basically a matter of different priorities.

While there are users who have issues with dns resolution blocking, there are also a lot of people compiling the client on weird setups where udns is not available, and even more where it is not installed by default.

Thus I do not want to make non-critical dependencies cause issues with compilation for other users. So if udns is installed, compile with it, if not then output a message during configure and compile normally.

@slingamn
Copy link
Contributor Author

I agree completely; I don't want to make it harder for end users to compile the project.

What I'm driving at is that I don't think most people are compiling the project themselves --- instead, they're getting it from the package manager of a distribution that also ships udns. So this is an opportunity to solve the problem in a centralized way that doesn't burden the individual user: if we document that it's recommended for package maintainers to build the project with udns support, then the maintainers can take care of ensuring that udns.h is available in the build environment and libudns.so is marked as a dependency of the libtorrent package. Then in these distributions, the package manager will take care of everything during install/upgrade, without burdening the user.

@rakshasa
Copy link
Owner

I understand where you are coming from, and appreciate it.

However it is a decision that ultimately rests on me (unless there's a successful fork), and my decision remains in the favor of what I mentioned above. It is, after all, my inbox that is and gets flooded.

Focus instead on creating a good branch I can merge.

Which, btw, means fixing the incompatible indentation and such in e.g. 'udnsevent.cc', and moving that code to 'src/utils' as it is not a shared API. I'll follow that up with a more extensive review later.

Fix indentation; move udnsevent implementation to src/utils
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 17.888% when pulling 5ca51d3 on slingamn:udns.10 into f466d36 on rakshasa:master.

@slingamn
Copy link
Contributor Author

I noticed that there is an issue with how these APIs (both the old synchronous API, and the new async one) handle single-stack settings. Suppose for example that the DNS server supports both A and AAAA records, but the client only has IPv4 connectivity. Then getaddrinfo(3) with AF_UNSPEC will return both IPv4 and IPv6 addresses, but torrent::resolve_host will arbitrarily choose the first one in line and pass it to the callback. Similarly, UdnsEvent will request A and AAAA records in parallel, but will call the callback with whichever is returned first. In each case, if the callback happens to be invoked with an IPv6 address, it will not succeed.

It seems like in order to fix this fully, we'd need a way to offer multiple addresses to callbacks like UdpTracker::start_announce, or else modify the state machine so that it first attempts to announce over IPv4 and then fails over to IPv6.

@rakshasa
Copy link
Owner

I'm currently working on the 'feature-bind' branch that will ideally in the future be able to deal with those issues.

@slingamn
Copy link
Contributor Author

Anyway, I look forward to a more detailed review :-)

Copy link
Owner

@rakshasa rakshasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished current review. Place don't close this pull request, instead merge your working branch into 'udns.10'.

#include <list>
#include <inttypes.h>

#include <udns.h>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't include 'udns.h' here, instead isolate it in the source file and declare it with 'struct dns_ctx;' in the header.

Same goes for UdnsQuery, which I'd prefer be named 'udns_query' as that is the current (new'ish) naming convention I'm using.


dns_ctx* m_ctx;
rak::priority_item m_taskTimeout;
std::list<UdnsQuery *> m_malformed_queries;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 'std::unique_ptr<udns_query>' instead of a C ptr. Vectors are almost always faster than lists.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below 'public:' add typedef's for 'query_list_type'.


// this wraps udns's dns_submit_a[46] functions. they and it return control immediately,
// without either sending outgoing UDP packets or executing callbacks:
struct UdnsQuery* enqueue_resolve(const char *name, int family, resolver_callback *callback);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to use 'struct' here in C++.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the comment, don't start with 'this'. Rewrite it to something like "Wrap udns functions in ... . It returns ..."


namespace torrent {

int udnserror_to_gaierror(int udnserror) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project doesn't do indentation for namespaces.

}
}

/** Compatibility layers so udns can call std::function callbacks. */
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use C++-style comments.

m_fileDesc = fd;
} else {
throw internal_error("dns_init failed");
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorder this so that you only go into an if block in the exceptional case:

m_fileDesc = ::dns_init...;

if (fd == -1)
  throw ...

Also looking at the documentation it looks like if you call dns_init(m_ctx, 1) you don't need to call it above. Or if you do called it above, just use dns_open.

src/utils/udnsevent.cc Show resolved Hide resolved
} else {
// no pending queries
manager->poll()->remove_read(this);
manager->poll()->remove_error(this);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorder this so you have:

if (timeout == -1) {
  ...
  return;
}

...

void UdnsEvent::cancel(struct UdnsQuery *query) {
if (query == NULL) {
return;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use brackets when the if block is a single line.

}
#endif
}
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit too much to include in this file, consider moving it to a source file in the same place as the udns stuff is.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 17.898% when pulling 3e20053 on slingamn:udns.10 into f466d36 on rakshasa:master.

@slingamn
Copy link
Contributor Author

This should address most of the issues. The second commit should be more readable since all the de-indentation noise is factored out.

There are two things I didn't do:

  1. Moving AsyncResolver out of connection_manager.cc. It's just a proxy class that holds the implementation of some ConnectionManager methods (e.g., ConnectionManager::enqueue_async_resolve), so I thought moving it out would sacrifice locality.
  2. Changing m_malformed_queries to be a vector<unique_ptr>. I thought this would confuse the object lifecycle of udns_query, because on some paths it has to be managed in a C-style way (in particular, when it does get passed to udns).

@rakshasa
Copy link
Owner

@slingamn just wanted you to know that I've already pulled your branch and I'm working on it. I'll merge it soon.

@Kcchouette
Copy link

@rakshasa any news on the merge?
Thanks for your work :)

@rakshasa
Copy link
Owner

@rakshasa any news on the merge?

Still working on the rtorrent-docker test environment, it's probably going to be a couple weeks still before it is ready.

@rakshasa rakshasa closed this Nov 10, 2019
@rakshasa rakshasa reopened this Nov 10, 2019
@slingamn
Copy link
Contributor Author

Just posting a quick merge/resolution with master, in case anyone else wants to use this patchset.

@ghost
Copy link

ghost commented Apr 28, 2022

Any news on the merge?

@slingamn
Copy link
Contributor Author

cc @stickz re. 2510069

@arbitrary-dev
Copy link

+1 for waiting

@slingamn
Copy link
Contributor Author

I verified that this applies against v0.14.0 and runs stably, although it is unstable against a subsequent change (see #256 for details).

m_readBuffer(NULL),
m_writeBuffer(NULL) {

m_taskTimeout.slot() = std::bind(&TrackerUdp::receive_timeout, this);

m_resolver_callback = std::bind(&TrackerUdp::start_announce, this, std::placeholders::_1, std::placeholders::_2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not stable. We can't have the callback in the TrackerUdp object. If the torrent is removed before the announce completes, it will trigger a crash. The callback will no longer be valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this line hasn't changed in years. You've been deploying this and it's been running stably, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if I understand your concern correctly, it's addressed by having close_directly remove the callback:

TrackerUdp::close_directly() {
manager->connection_manager()->async_resolver().cancel(m_resolver_query);
m_resolver_query = NULL;

Copy link
Contributor

@stickz stickz Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we passing are m_resolver_callback as a pointer to enqueue. This must remain valid indefinitely because it's async. It was stable on my test environment. But once we tested it with 10k of clients, it started flooding crash dumps.

  m_resolver_query = manager->connection_manager()->async_resolver().enqueue(
      hostname.data(),
      AF_INET,
      &m_resolver_callback
  );

This is the solution I implemented to avoid sending a pointer that could potentially become undefined. It successfully resolved the crash I was able to reproduce. stickz/rtorrent@32e1e6f...5517bef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callbacks are "asynchronous" in the sense that the main event loop does not block while the query is in flight, but the callbacks can only be triggered from ::dns_ioevent or ::dns_timeouts, which are called from within the main event loop (once the poll layer detects that a response is ready, or that the query has failed). Accordingly a call to ::dns_cancel from the main event loop can prevent the callbacks from triggering. Because the main loop runs within a single thread, it should not be possible for these operations to race against each other.

According to my understanding, the specific scenario you described (DNS query is initiated, the torrent is removed, DNS response is received, the callback executes against a freed TrackerUdp object) cannot occur for this exact reason: TrackerUdp::~TrackerUdp calls UdnsAsyncResolver::cancel, which calls ::dns_cancel on the callbacks pointing to the TrackerUdp object that is being freed.

I respect the intensive testing you have done on this patchset but I have run it continuously for years on heavy workloads without issue. I have also tested it with Valgrind, which found memory safety issues in the DHT layer (e.g. #166) but not with the udns code. I think it's possible that the issues you observed are a bad interaction with another patchset you're using? I would also be interested to see any backtraces or stack dumps you were able to collect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a stack trace I was able to collect and resolve for this problem.

Thread 1 (Thread 0x7f04816ad680 (LWP 114901)):
#0  0x00007f04820e350b in pthread_kill () from /usr/lib64/libc.so.6
#1  0x00007f0482086906 in raise () from /usr/lib64/libc.so.6
#2  0x00007f04820698b7 in abort () from /usr/lib64/libc.so.6
#3  0x000055de99b537c8 in do_panic (signum=11) at /usr/src/debug/net-p2p/rtorrent-0.9.8.5.4/rtorrent-5.4-0.9.8-0.13.8/rtorrent/src/main.cc:645
#4  <signal handler called>
#5  0x000055de9dd38e40 in ?? ()
#6  0x00007f0482613280 in std::function<void (sockaddr const*, int)>::operator()(sockaddr const*, int) const (__args#1=<optimized out>, __args#0=<optimized out>, this=<optimized out>, this=<optimized out>, __args#0=<optimized out>, __args#1=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/std_function.h:591
#7  torrent::a4_callback_wrapper (result=<optimized out>, data=0x7f04541fc480, ctx=<optimized out>) at utils/udnsevent.cc:60
#8  torrent::a4_callback_wrapper (ctx=<optimized out>, result=<optimized out>, data=0x7f04541fc480) at utils/udnsevent.cc:40
#9  0x00007f0481addf1c in dns_ioevent () from /usr/lib64/libudns.so.0
#10 0x00007f04825a9e9e in torrent::PollEPoll::perform (this=0x55de9ac6b630) at torrent/poll_epoll.cc:185
#11 0x00007f04825d41b4 in torrent::thread_base::event_loop (thread=0x55de9ac72300) at torrent/utils/thread_base.cc:174
#12 0x000055de99b7aa5c in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/net-p2p/rtorrent-0.9.8.5.4/rtorrent-5.4-0.9.8-0.13.8/rtorrent/src/main.cc:508

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting back to me on this.

I believe this is a bug in your fork. You originally merged the udns changes in stickz/rtorrent@62d3f5e ; this version of the patch correctly includes the call to AsyncResolver::cancel in TrackerUdp::close_directly that I linked above. Then subsequently in stickz/rtorrent@a843234 you replaced tracker_udp.cc with a version (from libtorrent's master?) that does not include this call. Consequently, the callback does not get cancelled and can fire even when it has become invalid, which is what we see in the stack trace.

@slingamn slingamn closed this Nov 11, 2024
@slingamn slingamn reopened this Nov 11, 2024
@slingamn
Copy link
Contributor Author

slingamn commented Dec 3, 2024

@rakshasa what would it take to get this merged? Is feature-bind still in progress?

The current behavior in this branch is to do IPv4 resolution only.

@rakshasa
Copy link
Owner

rakshasa commented Dec 4, 2024

There's too many easier things to fix atm, and I'd like to do a more comprehensive review of the current tracker code first.

This includes fixing some ipv4/6 issues I'm seeing in rtorrent-docker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants