Scientia Geographica Sinica ›› 2022, Vol. 42 ›› Issue (7): 1146-1154. doi: 10.13249/j.cnki.sgs.2022.07.002


Design and Application of Storage and Query Algorithm for Vector Spatial Data Based on HBase

Li Chaokui1,2, Wang Luyao1,2, Zhou Xinshao3,*, Tang Luliang1,2, Zhang Xinchang1,2, Li Yang1,2

  1. Hunan Province Key Laboratory of Surveying & Mapping, Remote Sensing and Geoinformation, Hunan University of Science and Technology, Xiangtan 411201, Hunan, China
  2. National-Local Joint Engineering Laboratory of Geo-Spatial Information Technology, Hunan University of Science and Technology, Xiangtan 411201, Hunan, China
  3. School of Information and Electronic Engineering, Hunan City University, Yiyang 413000, Hunan, China
  • Received: 2021-06-21 Revised: 2022-01-07 Online: 2022-07-10 Published: 2022-09-07
  • Contact: Zhou Xinshao E-mail: chkl_hn@163.com; zhouxinshao@hncu.edu.cn
  • About the author: Li Chaokui (1967−), male, born in Hanshou, Hunan; PhD, professor and doctoral supervisor; research interests: 3D geographic modeling and its applications. E-mail: chkl_hn@163.com
  • Supported by:
    National Natural Science Foundation of China (42171418); Innovation Group Project of Hunan Natural Science Foundation (2020JJ1003); Scientific and Technological Innovation Leading Plan Project of Hunan High-tech Industry (2021GK4001)

Abstract:

Building on a study of the HBase storage mechanism, and to address the low efficiency of existing storage and query methods, this article designs an HBase table schema for vector spatial data, comprising the row key, a filter column family, a geometry column family, and a non-geometry column family. It also improves the original region query method on the basis of MapReduce. Together, these changes effectively improve the query efficiency of vector spatial data in HBase. Experiments were carried out on nearly 100 years of geological hazard data from one area. The results show that the storage model is feasible and that the improved query algorithm is more efficient than the traditional one. Because of communication and other overheads during MapReduce execution, the advantage is not obvious when the data volume is below the 50 000 level. When the data volume exceeds the 100 000 level, query time falls to less than 1/2 of the original, and at the 1 million level it is only 1/20 of the query time before the improvement. The larger the data volume, the more pronounced the advantage of parallel processing.

Key words: HBase, vector spatial data, storage and query, MapReduce, algorithm design
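
The table layout and region query summarized in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the row key format, the column names, and the idea that the filter column family holds each feature's bounding box are all assumptions made for illustration, and an in-memory dictionary stands in for an HBase table. The map step tests each row's bounding box against the query rectangle within one split, and the reduce step merges the per-split hits.

```python
from functools import reduce

# Hypothetical in-memory stand-in for an HBase table: row key -> column families.
# Row key format and column names are illustrative assumptions, not the paper's schema.
TABLE = {
    "0001": {
        "filter": {"minx": 0.0, "miny": 0.0, "maxx": 1.0, "maxy": 1.0},
        "geometry": {"wkt": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"},
        "attributes": {"type": "landslide"},
    },
    "0002": {
        "filter": {"minx": 5.0, "miny": 5.0, "maxx": 6.0, "maxy": 6.0},
        "geometry": {"wkt": "POLYGON ((5 5, 6 5, 6 6, 5 6, 5 5))"},
        "attributes": {"type": "collapse"},
    },
    "0003": {
        "filter": {"minx": 0.8, "miny": 0.8, "maxx": 2.0, "maxy": 2.0},
        "geometry": {"wkt": "POLYGON ((0.8 0.8, 2 0.8, 2 2, 0.8 2, 0.8 0.8))"},
        "attributes": {"type": "debris flow"},
    },
}


def intersects(bbox, query):
    """Return True if two axis-aligned rectangles overlap (touching counts)."""
    return not (
        bbox["maxx"] < query["minx"] or bbox["minx"] > query["maxx"]
        or bbox["maxy"] < query["miny"] or bbox["miny"] > query["maxy"]
    )


def map_phase(split, query):
    """Map step: emit row keys in one split whose bounding box meets the query."""
    return [key for key, row in split if intersects(row["filter"], query)]


def reduce_phase(partials):
    """Reduce step: merge the per-split hit lists into one sorted result."""
    return sorted(reduce(lambda acc, part: acc + part, partials, []))


def region_query(table, query, n_splits=2):
    """Split the rows, run the map step per split, then reduce the results."""
    rows = sorted(table.items())
    splits = [rows[i::n_splits] for i in range(n_splits)]
    return reduce_phase([map_phase(split, query) for split in splits])


QUERY = {"minx": 0.5, "miny": 0.5, "maxx": 1.5, "maxy": 1.5}
HITS = region_query(TABLE, QUERY)
```

Here the splits are processed sequentially. In a real MapReduce job they would run in parallel on separate nodes, which is consistent with the abstract's observation that coordination overhead hides the benefit on small datasets.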

CLC Number:

  • P208