This note is an excerpt from the Wikipedia article on the k-d tree. I find its explanation vivid and detailed, and an unusually good read.


Overview

A \(k\)-d tree (short for \(k\)-dimensional tree) is a binary space-partitioning tree for organizing points in a \(k\)-dimensional space. \(k\)-d trees are a useful data structure for searches involving a multidimensional search key.

Construction

The canonical method of \(k\)-d tree construction has the following constraints:

  • As one moves down the tree, one cycles through the axes used to select the splitting planes.
  • Points are inserted by selecting the median of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane.

This method leads to a balanced \(k\)-d tree, in which each leaf node is approximately the same distance from the root. However, balanced trees are not necessarily optimal for all applications.
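
The two constraints above can be sketched as follows. This is a minimal illustration, not the implementation used later in this note: it assumes 2-D integer points, and the names Point and buildKd are made up for the example. The tree is kept implicit in the array (the node for the range [l, r) is the element at the middle index), and the function returns the height so the balance can be observed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Point = std::array<int, 2>;   // assumption: k = 2 for this sketch

// Rearranges pts[l, r) in place so that the median along the current axis
// sits at index mid, then recurses on both halves.
// The tree is implicit: the node for the range [l, r) is pts[mid].
// Returns the height of the subtree (0 for an empty range).
int buildKd(std::vector<Point>& pts, int l, int r, int depth = 0) {
    if (l >= r) return 0;
    int axis = depth % 2;                 // cycle through the axes
    int mid = l + (r - l) / 2;
    std::nth_element(pts.begin() + l, pts.begin() + mid, pts.begin() + r,
                     [axis](const Point& a, const Point& b) {
                         return a[axis] < b[axis];
                     });
    int hl = buildKd(pts, l, mid, depth + 1);
    int hr = buildKd(pts, mid + 1, r, depth + 1);
    return 1 + std::max(hl, hr);
}
```

For the six points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2), calling buildKd(pts, 0, 6) places (7,2) at the root position pts[3] and returns a height of 3.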

Nearest Neighbor Search

Terms:

  • the split dimensions
  • the splitting (hyper)plane
  • "current best"

The nearest neighbor (NN) search algorithm aims to find the point in the tree that is nearest to a given point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space.

Searching for a nearest neighbor in a \(k\)-d tree proceeds as follows:

  1. Starting with the root node, the algorithm moves down the tree recursively, going left or right at each node according to whether the search point is less than or greater than the current node in the split dimension.
  2. Once the algorithm reaches a leaf node, it saves that node's point as the "current best".
  3. The algorithm unwinds the recursion of the tree, performing the following steps at each node:
    1. If the current node is closer than the current best, then it becomes the current best.
    2. The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. Conceptually, this is done by intersecting the splitting hyperplane with a hypersphere around the search point whose radius equals the current nearest distance. Since the hyperplanes are all axis-aligned, this is implemented as a simple comparison: is the difference between the splitting coordinate of the search point and that of the current node less than the distance (over all coordinates) from the search point to the current best?
      1. If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
      2. If the hypersphere does not intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
  4. When the algorithm finishes this process for the root node, the search is complete.
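
The search procedure above might look like this in code. It is only a sketch under stated assumptions: 2-D points with long long coordinates, an implicit tree stored in a sorted-by-median array, and illustrative names (Point, dist2, build, nearest) that do not come from the Wikipedia article. Distances are compared in squared form.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Point = std::array<long long, 2>;   // assumption: k = 2

long long dist2(const Point& a, const Point& b) {
    long long s = 0;
    for (int i = 0; i < 2; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Implicit tree: the node for pts[l, r) is pts[l + (r - l) / 2].
void build(std::vector<Point>& pts, int l, int r, int depth = 0) {
    if (l >= r) return;
    int axis = depth % 2, mid = l + (r - l) / 2;
    std::nth_element(pts.begin() + l, pts.begin() + mid, pts.begin() + r,
                     [axis](const Point& a, const Point& b) {
                         return a[axis] < b[axis];
                     });
    build(pts, l, mid, depth + 1);
    build(pts, mid + 1, r, depth + 1);
}

// Updates best (index into pts, -1 while nothing has been found)
// and bestD2 (squared distance to pts[best]) for the query point q.
void nearest(const std::vector<Point>& pts, int l, int r, int depth,
             const Point& q, int& best, long long& bestD2) {
    if (l >= r) return;
    int axis = depth % 2, mid = l + (r - l) / 2;
    long long diff = q[axis] - pts[mid][axis];
    // Descend first into the side of the plane that contains q.
    if (diff < 0) nearest(pts, l, mid, depth + 1, q, best, bestD2);
    else          nearest(pts, mid + 1, r, depth + 1, q, best, bestD2);
    // While unwinding: the current node may beat the current best.
    long long d2 = dist2(q, pts[mid]);
    if (best < 0 || d2 < bestD2) { best = mid; bestD2 = d2; }
    // Visit the far side only if the hypersphere crosses the splitting plane.
    if (diff * diff < bestD2) {
        if (diff < 0) nearest(pts, mid + 1, r, depth + 1, q, best, bestD2);
        else          nearest(pts, l, mid, depth + 1, q, best, bestD2);
    }
}
```

A caller would run build(pts, 0, n) once, initialise best to -1, and then call nearest(pts, 0, n, 0, q, best, bestD2) for each query.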

Generally the algorithm uses squared distances for comparison to avoid computing square roots. Additionally, it can save computation by holding the squared current best distance in a variable for comparison.

The algorithm can be extended in several ways by simple modifications. It can provide the \(k\) nearest neighbors to a point by maintaining \(k\) current bests instead of just one. A branch is only eliminated when \(k\) points have been found and the branch cannot have points closer than any of the \(k\) current bests.
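
One way to keep the \(k\) current bests is a bounded max-heap, so that the worst of them is always on top. The sketch below tracks squared distances only (a real search would store the points as well), and the names KBest, offer and canPrune are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Keeps the k smallest squared distances seen so far in a max-heap,
// so the worst of the current bests is always heap.top().
struct KBest {
    std::size_t k;
    std::priority_queue<long long> heap;

    explicit KBest(std::size_t k_) : k(k_) {}

    // Offer a candidate; it is kept only if it is among the k best so far.
    void offer(long long d2) {
        if (heap.size() < k) {
            heap.push(d2);
        } else if (k > 0 && d2 < heap.top()) {
            heap.pop();
            heap.push(d2);
        }
    }

    // A branch whose splitting plane lies at squared distance planeD2 may be
    // skipped only when k candidates are already held and the plane is no
    // closer than the worst of them.
    bool canPrune(long long planeD2) const {
        return heap.size() == k && (k == 0 || planeD2 >= heap.top());
    }
};
```

The important detail is the first half of canPrune: while fewer than \(k\) points are held, no branch may be eliminated, however far away its splitting plane is.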


Implementation

\(k\) nearest neighbors (\(k\)NN)

#include <bits/stdc++.h>
#define lson id<<1
#define rson id<<1|1
#define sqr(x) (1LL*(x)*(x))   // fully parenthesized; multiplies in long long to avoid int overflow
using namespace std;
using LL=long long;
const int N=5e4+5;

// K-D tree: a special case of binary space partitioning trees
int DIM, idx;
struct Node{
    int key[5];
    bool operator<(const Node &rhs)const{
        return key[idx]<rhs.key[idx];
    }
    void read(){
        for(int i=0; i<DIM; i++)
            scanf("%d", key+i);
    }
    LL dis2(const Node &rhs)const{
        LL res=0;
        for(int i=0; i<DIM; i++)
            res+=sqr(key[i]-rhs.key[i]);
        return res;
    }
    void out(){
        for(int i=0; i<DIM; i++)
            printf("%d%c", key[i], i==DIM-1?'\n':' ');
    }
}p[N];

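Node a[N<<2];   // K-D tree stored as a complete binary tree: children of id are 2*id and 2*id+1
bool f[N<<2];   // f[id]: whether node id exists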

// builds the subtree for the half-open range p[l, r) at index id
void build(int id, int l, int r, int dep){
    if(l==r) return;    // empty range: no node is created here
    f[id]=true, f[lson]=f[rson]=false;
    // select axis based on depth so that axis cycles through all valid values
    idx=dep%DIM;
    int mid=(l+r)>>1;
    // partially sort so that the median along axis idx lands at p[mid]
    nth_element(p+l, p+mid, p+r);
    a[id]=p[mid];
    build(lson, l, mid, dep+1);
    build(rson, mid+1, r, dep+1);
}

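using P=pair<LL,Node>;
// max-heap on squared distance: the top is the farthest of the current best candidates
// (ties fall back to Node::operator<, which reads the global idx)
priority_queue<P> que;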

void query(const Node &p, int id, int m, int dep){
    int dim=dep%DIM;
    int x=lson, y=rson;
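    // visit the side of the splitting plane that contains p first
    // (left holds keys <=, right holds keys >= the node's key on this axis)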
    if(p.key[dim]>=a[id].key[dim])
        swap(x, y);

    if(f[x]) query(p, x, m, dep+1);

    P cur{p.dis2(a[id]), a[id]};

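    if((int)que.size()<m){
        que.push(cur);
    }
    else if(cur.first<que.top().first){
        que.pop();
        que.push(cur);
    }
    // prune the far branch only once m candidates are held and the
    // splitting plane is no closer than the worst of them
    if(f[y] && ((int)que.size()<m || sqr(a[id].key[dim]-p.key[dim])<que.top().first))
        query(p, y, m, dep+1);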
}

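Notes:

  1. The bool array f[] records whether a given node of the complete binary tree exists. Alternatively, one can drop the complete-binary-tree layout and store the children in two arrays, lson[] and rson[]. This also saves space: the arrays then only need to be about twice the number of nodes.
  2. Ranges are half-open: [l, r).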
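
The alternative layout from note 1 could be sketched as below. This is an assumption-laden illustration rather than a drop-in replacement for the code above: it uses 2-D points, vectors instead of fixed arrays, -1 for a missing child, and a hypothetical KdTree struct. In this sketch the vectors end up with one entry per point.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Point = std::array<int, 2>;   // assumption: k = 2

struct KdTree {
    std::vector<Point> node;        // node[i] is the point stored at node i
    std::vector<int> lson, rson;    // child indices, -1 when absent

    // Builds the subtree for pts[l, r); returns its root index, -1 if empty.
    int build(std::vector<Point>& pts, int l, int r, int depth = 0) {
        if (l >= r) return -1;
        int axis = depth % 2, mid = l + (r - l) / 2;
        std::nth_element(pts.begin() + l, pts.begin() + mid, pts.begin() + r,
                         [axis](const Point& a, const Point& b) {
                             return a[axis] < b[axis];
                         });
        int id = (int)node.size();  // nodes are numbered in creation order
        node.push_back(pts[mid]);
        lson.push_back(-1);
        rson.push_back(-1);
        // Store results in locals first: the vectors may reallocate
        // while the recursive calls append new nodes.
        int lc = build(pts, l, mid, depth + 1);
        int rc = build(pts, mid + 1, r, depth + 1);
        lson[id] = lc;
        rson[id] = rc;
        return id;
    }
};
```

With this layout a missing child is simply -1, so no separate existence array such as f[] is needed.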
