这篇随笔是对Wikipedia上k-d tree词条的摘录, 我认为解释得相当生动详细, 是一篇不可多得的好文.


Overview

A \(k\)-d tree (short for \(k\)-dimensional tree) is a binary space-partitioning tree for organizing points in a \(k\)-dimensional space. \(k\)-d trees are a useful data structure for searches involving a multidimensional search key.

Construction

The canonical method of \(k\)-d tree construction has the following constraints:

  • As one moves down the tree, one cycles through the axes used to select the splitting planes.
  • Points are inserted by selecting the median of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane.

This method leads to a balanced \(k\)-d tree, in which each leaf node is approximately the same distance from the root. However, balanced trees are not necessarily optimal for all applications.

Nearest Neighboring Search

Terms:

  • the split dimensions
  • the splitting (hyper)plane
  • "current best"

The nearest neighbor (NN) search algorithm aims to find the point in the tree that is nearest to a given point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space.

Searching for a nearest neighbor in a \(k\)-d tree proceeds as follows:

  1. Starting with the root node, the algorithm moves down the tree recursively.
  2. Once the algorithm reaches a leaf node, it saves that node point as "current best"
  3. The algorithm unwinds the recursion of the tree, performing the following steps at each node:
  4. If the current node is closer than the current best, then it becomes the current best.
  5. The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the distance between the splitting coordinate of the search point and current node is less than the distance (overall coordinates) from the search point to the current best.
    1. If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
    2. If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.

Generally the algorithm uses squared distances for comparison to avoid computing square roots. Additionally, it can save computation by holding the squared current best distance in a variable for computation.

The algorithm can be extended in several ways by simple modifications. If can provide the $k $ nearest neighbors to a point by maintaining \(k\) current bests instead of just one. A branch is only eliminated when \(k\) points have been found and the branch cannot have points closer than any of the \(k\) current bests.


Implementation

\(k\)近临 (\(k\)NN)

#include <bits/stdc++.h>
#define lson id<<1
#define rson id<<1|1
#define sqr(x) (x)*(x)
using namespace std;
using LL=long long;
const int N=5e4+5;

// K-D tree: a special case of binary space partitioning trees
int DIM, idx;
struct Node{
    int key[5];
    bool operator<(const Node &rhs)const{
        return key[idx]<rhs.key[idx];
    }
    void read(){
        for(int i=0; i<DIM; i++)
            scanf("%d", key+i);
    }
    LL dis2(const Node &rhs)const{
        LL res=0;
        for(int i=0; i<DIM; i++)
            res+=sqr(key[i]-rhs.key[i]);
        return res;
    }
    void out(){
        for(int i=0; i<DIM; i++)
            printf("%d%c", key[i], i==DIM-1?'\n':' ');
    }
}p[N];

Node a[N<<2];   // K-D tree
bool f[N<<2];

// [l, r)
void build(int id, int l, int r, int dep){
    if(l==r) return;    // error-prone
    f[id]=true, f[lson]=f[rson]=false;
    // select axis based on depth so that axis cycles through all valid values
    idx=dep%DIM;
    int mid=l+r>>1;
    // sort point list and choose median as pivot element
    nth_element(p+l, p+mid, p+r);
    a[id]=p[mid];
    build(lson, l, mid, dep+1);
    build(rson, mid+1, r, dep+1);
}

using P=pair<LL,Node>;
priority_queue<P> que;
// multidimensional search key

void query(const Node &p, int id, int m, int dep){
    int dim=dep%DIM;
    int x=lson, y=rson;
    // left: <, right >=
    if(p.key[dim]>=a[id].key[dim])
        swap(x, y);

    if(f[x]) query(p, x, m, dep+1);

    P cur{p.dis2(a[id]), a[id]};

    if(que.size()<m){
        que.push(cur);
    }
    else if(cur.first<que.top().first){
        que.pop();
        que.push(cur);
    }
    if(f[y] && sqr(a[id].key[dim]-p.key[dim])<que.top().first)
        query(p, y, m, dep+1);
}

说明:

  1. bool数组f[], 表示一个完全二叉树中的某个节点是否存在, 也可不用完全二叉树的表示法, 而用两个数组lson[]rson[]表示, 这样的好处还有: 节省空间, 数组可以只开到节点数的2倍.
  2. 区间采用左闭右开表示.

K-D Tree的更多相关文章

  1. AOJ DSL_2_C Range Search (kD Tree)

    Range Search (kD Tree) The range search problem consists of a set of attributed records S to determi ...

  2. Size Balance Tree(SBT模板整理)

    /* * tree[x].left 表示以 x 为节点的左儿子 * tree[x].right 表示以 x 为节点的右儿子 * tree[x].size 表示以 x 为根的节点的个数(大小) */ s ...

  3. HDU3333 Turing Tree(线段树)

    题目 Source http://acm.hdu.edu.cn/showproblem.php?pid=3333 Description After inventing Turing Tree, 3x ...

  4. POJ 3321 Apple Tree(树状数组)

                                                              Apple Tree Time Limit: 2000MS   Memory Lim ...

  5. CF 161D Distance in Tree 树形DP

    一棵树,边长都是1,问这棵树有多少点对的距离刚好为k 令tree(i)表示以i为根的子树 dp[i][j][1]:在tree(i)中,经过节点i,长度为j,其中一个端点为i的路径的个数dp[i][j] ...

  6. Segment Tree 扫描线 分类: ACM TYPE 2014-08-29 13:08 89人阅读 评论(0) 收藏

    #include<iostream> #include<cstdio> #include<algorithm> #define Max 1005 using nam ...

  7. Bzoj 2588: Spoj 10628. Count on a tree 主席树,离散化,可持久,倍增LCA

    题目:http://www.lydsy.com/JudgeOnline/problem.php?id=2588 2588: Spoj 10628. Count on a tree Time Limit ...

  8. Size Balanced Tree(SBT) 模板

    首先是从二叉搜索树开始,一棵二叉搜索树的定义是: 1.这是一棵二叉树: 2.令x为二叉树中某个结点上表示的值,那么其左子树上所有结点的值都要不大于x,其右子树上所有结点的值都要不小于x. 由二叉搜索树 ...

  9. hdu 5274 Dylans loves tree(LCA + 线段树)

    Dylans loves tree Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 131072/131072 K (Java/Othe ...

随机推荐

  1. Redis概述

    1.       Redis是使用内存存储(in-momory)的非关系型数据. 2.       Redis的数据存储选项共有5种:字符串.列表.集合.散列表.有序集合. 3.       Redi ...

  2. 无法将类型为“System.Decimal”的对象强制转换为类型“System.Char[]”。

    在用微软的SSIS操作ORACLE 数据源的时候碰到以下报错信息: [ADO NET Destination [13455]] 错误: 数据插入期间出现异常,从提供程序返回的消息为:无法将类型为&qu ...

  3. Git Day03,GitHub 1st

    1st, SSH key: Add a pic @ Sep 18 2016 20:26 To note the configuration process on Linux: 2nd,github网站 ...

  4. PyQt写的五子棋

    技术路线 GUI的实现 使用PyQt技术作为基础.PyQt是一个支持多平台的客户端开发SDK,使用它实现的客户端可以运行在目前几乎所有主流平台之上. 使用PyQt,Qt设计器实现UI,通过pyuic4 ...

  5. Lambda表达式演变

    Lambda表达式是一种匿名函数.   演变步骤:   一般的方法委托 => 匿名函数委托 => Lambda表达式   Lambda表达式其实并不陌生,他的前生就是匿名函数,所以要谈La ...

  6. Windows Azure Storage图形界面管理工具

    上一篇我们介绍了用PowerShell将Windows Azure的存储服务当网盘来使用.如果感觉还不够简单,那么这次我们来看看还有哪些使用起来更方便的图形界面管理工具吧.当然,这些工具必要支持中国版 ...

  7. Nginx日志切割,以及脚本上传nginx的切割日志

    一:日志切割步骤 命令都在root下进行 1.创建目录 mkdir -p /etc/opt/modules/bin ## 创建文件夹 2.上传cut 3.观察目录 4.修改的cut文件 5.检测 需要 ...

  8. GPS accuracy in Android

    Get the estimated accuracy of this location, in meters. We define accuracy as the radius of 68% conf ...

  9. hdu 2258 优先队列

    Continuous Same Game (1) Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java ...

  10. iOS学习之C语言内存管理

         一.存储区划分      按照地址从高到低的顺序:栈区,堆区,静态区,常量区,代码区    1.栈区:局部变量的存储区域     局部变量基本都在函数.循环.分支中定义     栈区的内存空 ...