DNS resolution for UDP as a whole is now deferred to the connection initiation stage. During rule matching, resolution is triggered only when an IP rule without no-resolve is matched.
For direct and wireguard outbounds, the same logic as the TCP path applies: when direct-nameserver (or the DNS configured for wireguard) is set, the result obtained during rule matching is discarded and the domain name is re-resolved. This re-resolution only takes effect for fakeip.
For reject and DNS outbounds, no resolution is required.
For all other outbounds, resolution is still performed when the connection is initiated; for now, the domain name is not sent directly to the remote server.
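The per-outbound rules above can be read as a small decision table. The following is only a sketch of that table, not the project's code: `outboundKind`, `udpResolveAction`, and the two boolean parameters are hypothetical names introduced for illustration, and treating "only effective for fakeip" as "the destination was handed out as a fake IP" is an assumption.

```go
package main

import "fmt"

// outboundKind is a hypothetical classification of outbounds.
type outboundKind int

const (
	kindDirect outboundKind = iota
	kindWireGuard
	kindReject
	kindDNS
	kindOther
)

// Possible outcomes for a UDP destination given as a domain name.
const (
	actionNone          = "no resolution"
	actionReResolve     = "discard matched result, re-resolve with outbound DNS"
	actionResolveAtDial = "resolve at connection initiation"
)

// udpResolveAction mirrors the rules in the commit message.
// hasOwnDNS: direct-nameserver (or wireguard DNS) is configured (assumed flag).
// fakeIP: the destination was a fake IP (assumed flag).
func udpResolveAction(kind outboundKind, hasOwnDNS, fakeIP bool) string {
	switch kind {
	case kindReject, kindDNS:
		return actionNone
	case kindDirect, kindWireGuard:
		if hasOwnDNS && fakeIP {
			return actionReResolve
		}
		return actionResolveAtDial
	default:
		return actionResolveAtDial
	}
}

func main() {
	fmt.Println(udpResolveAction(kindDirect, true, true))
	fmt.Println(udpResolveAction(kindReject, false, false))
	fmt.Println(udpResolveAction(kindOther, false, true))
}
```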
In the current design, all UDP packets are queued into a single channel,
and multiple workers are launched to poll that channel.
This introduces a problem where UDP packets from a single connection
are delivered to different workers, thus forwarded in a random order
if workers are on different CPU cores. Though UDP peers normally
have their own logic to handle out-of-order packets, this behavior will
inevitably cause significant variance in delay and harm connection quality.
Furthermore, this out-of-order behavior is noticeable even if the underlying
transport could provide guaranteed orderly delivery - this is unacceptable.
This commit borrows the idea of RSS (receive side scaling) from NICs: it
creates a distinct queue for each worker, hashes incoming packets, and
distributes each packet to a worker by the hash result. The tuple
(SrcIP, SrcPort, DstIP, DstPort, Proto) is used for hashing (Proto is always
UDP, so it is dropped from the final implementation), thus packets from the
same connection are sent to the same worker, preserving the receiving order.
Different connections can be hashed to different workers to maintain performance.
Performance for a single UDP connection is not affected, as there is already
a lock in natTable that prevents multiple packets from being processed in
different workers, limiting single-connection forwarding performance to 1 worker.
The only performance penalty is the hashing code, which should be negligible
given the footprint of en/decryption work.
Co-authored-by: Hamster Tian <haotia@gmail.com>
* feat: hosts support domains and multiple IPs
* chore: append local address via `clash`
* chore: update hosts demo
* chore: unify parsing of mixed string and array values
* fix: flatten cname
* chore: adjust logic
* chore: reuse code
* chore: use cname in tunnel
* chore: try to use domain mapping with normal DNS
* chore: format code