Scientia Geographica Sinica, 2022, Vol. 42, Issue (7): 1146-1154. doi: 10.13249/j.cnki.sgs.2022.07.002
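Li Chaokui1,2, Wang Luyao1,2, Zhou Xinshao3,*, Tang Luliang1,2, Zhang Xinchang1,2, Li Yang1,2
Received: 2021-06-21
Revised: 2022-01-07
Online: 2022-07-10
Published: 2022-09-07
Corresponding author: Zhou Xinshao
E-mail: chkl_hn@163.com; zhouxinshao@hncu.edu.cn
About the author: Li Chaokui (1967-), male, from Hanshou, Hunan; PhD, professor, doctoral supervisor. His research focuses on 3D geographic modeling and its applications. E-mail: chkl_hn@163.com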
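Abstract:
This study examines the HBase storage mechanism. To address the low efficiency of existing storage and query methods, it designs an HBase table schema for vector spatial data, comprising a row key, a filter column family, a geometry column family, and a non-geometry column family, and it improves the original region query methods on the basis of MapReduce. Together, these changes effectively improve the efficiency of vector spatial data queries in HBase. Experiments on nearly 100 years of geological hazard data from one region show that the storage model is feasible and that the improved query algorithms are more efficient than the traditional ones. Because of communication and other overheads during MapReduce execution, the advantage is not evident when the data volume is below the 50,000 level. When the data volume exceeds the 100,000 level, query time falls below half of the original, and at the 1,000,000 level it is only 1/20 of the query time before the improvement. The larger the data volume, the more pronounced the advantage of parallel processing.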
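The schema and query idea in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not call HBase or Hadoop: the row-key format, the contents of each column family (a bounding box in the filter family, WKT in the geometry family), the function names, and the sample hazard records are all assumptions made for illustration. The query window is window 1 from Table 3, with its corners normalised to (minx, miny, maxx, maxy).

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Row:
    row_key: str        # hypothetical key format: zero-padded object id
    filter_cf: dict     # assumed content: bounding box for coarse filtering
    geometry_cf: dict   # assumed content: geometry as WKT
    attribute_cf: dict  # non-geometry attributes


def make_point_row(obj_id, x, y, attrs):
    """Build one row for a point object (illustrative layout only)."""
    return Row(
        row_key=f"{obj_id:08d}",
        filter_cf={"minx": x, "miny": y, "maxx": x, "maxy": y},
        geometry_cf={"wkt": f"POINT ({x} {y})"},
        attribute_cf=dict(attrs),
    )


def bbox_intersects(box, window):
    """True if the row's bounding box overlaps the query window."""
    minx, miny, maxx, maxy = window
    return not (box["maxx"] < minx or box["minx"] > maxx
                or box["maxy"] < miny or box["miny"] > maxy)


def map_partition(rows, window):
    """Map step: one partition filters its own rows, reading only filter_cf."""
    return [r.row_key for r in rows if bbox_intersects(r.filter_cf, window)]


def rect_query(partitions, window):
    """Reduce step: concatenate the per-partition hit lists."""
    hits = reduce(lambda a, b: a + b,
                  (map_partition(p, window) for p in partitions), [])
    return sorted(hits)


# Made-up sample records split over two partitions.
partitions = [
    [make_point_row(1, 123.5, 39.0, {"type": "landslide"}),
     make_point_row(2, 121.0, 41.0, {"type": "collapse"})],
    [make_point_row(3, 123.2, 39.3, {"type": "debris flow"})],
]
window = (123.064, 38.882, 123.964, 39.470)
print(rect_query(partitions, window))
```

Keeping the bounding box in its own column family means the filter step never has to read the larger geometry and attribute values, which is one plausible reason for separating a filter column family from the geometry one.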
Cite this article: Li Chaokui, Wang Luyao, Zhou Xinshao, Tang Luliang, Zhang Xinchang, Li Yang. Design and Application of Storage and Query Algorithm for Vector Spatial Data Based on HBase[J]. Scientia Geographica Sinica, 2022, 42(7): 1146-1154.
Table 3
Query window regions and experimental results
Rectangular window No. | (Minx, Miny) | (Maxx, Maxy) | Window area/km² | Number of spatial objects in query region | RecQuery run time/s | MRv2RecQuery run time/s |
1 | (123.064,39.470) | (123.964,38.882) | 6520.270 | 1587 | 10.264 | 14.957 |
2 | (122.613,39.764) | (124.415,38.588) | 25364.190 | 6952 | 27.853 | 17.032 |
3 | (122.163,40.058) | (124.865,38.294) | 58002.910 | 30975 | 59.079 | 24.928 |
4 | (121.712,40.352) | (125.316,38.010) | 80820.257 | 117539 | 168.316 | 36.712 |
5 | (121.262,40.646) | (125.766,37.706) | 126792.704 | 450984 | 502.264 | 56.848 |
6 | (120.811,40.940) | (126.216,37.412) | 185140.902 | 1240587 | 945.056 | 75.173 |
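The trend in Table 3 is easier to read as a ratio of the two run times. The short sketch below copies the object counts and run times from the table and divides the RecQuery time by the MRv2RecQuery time. The names `TABLE3` and `speedups` are introduced here for illustration only.

```python
# (window No., objects in window, RecQuery time/s, MRv2RecQuery time/s),
# copied from Table 3.
TABLE3 = [
    (1, 1587, 10.264, 14.957),
    (2, 6952, 27.853, 17.032),
    (3, 30975, 59.079, 24.928),
    (4, 117539, 168.316, 36.712),
    (5, 450984, 502.264, 56.848),
    (6, 1240587, 945.056, 75.173),
]


def speedups(rows):
    """Return (window No., object count, serial time / MapReduce time)."""
    return [(wid, n, serial / parallel) for wid, n, serial, parallel in rows]


for wid, n, ratio in speedups(TABLE3):
    print(f"window {wid}: {n:>8} objects, speedup {ratio:.2f}x")
```

For the values listed in Table 3, the ratio is below 1 for window 1, where the MapReduce version is slower, and it rises with every larger window to roughly 12.6 for window 6.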
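Table 4
Regular-pentagon query region settings
Query region No. | Coordinate string of regular-pentagon query region | Query region area/km² | Number of spatial objects in query region | PlyQuery run time/s | MRv2PlyQuery run time/s |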
1 | (123.514,39.626)(123.124,39.401)(123.289,38.786)(123.739,38.786)(123.904,39.011)(123.514,39.626) | 2706.329 | 748 | 6.083 | 15.175 |
2 | (123.514,40.067)(122.734,39.623)(123.065,38.403)(123.946,38.403)(124.291,39.623)(123.514,40.067) | 21650.635 | 7163 | 25.215 | 17.783 |
3 | (123.514,40.512)(122.344,39.855)(122.842,38.012)(123.148,38.012)(124.682,39.855)(123.514,40.512) | 48713.929 | 32941 | 64.921 | 26.042 |
4 | (123.514,40.937)(121.954,40.079)(122.619,37.642)(123.373,37.642)(125.073,40.079)(123.514,40.937) | 86602.540 | 127362 | 173.457 | 39.184 |
5 | (123.514,41.462)(121.564,40.302)(122.396,37.285)(123.597,37.285)(125.469,40.302)(123.514,41.462) | 135316.469 | 487365 | 491.302 | 58.538 |
6 | (123.514,41.854)(121.174,40.524)(122.172,36.876)(123.824,36.876)(125.854,40.524)(123.514,41.854) | 194855.716 | 1438176 | 1037.746 | 82.091 |
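A polygon query such as the ones in Table 4 ultimately needs a test for whether a point lies inside the query region. This page does not say how PlyQuery or MRv2PlyQuery perform that test, so the sketch below uses the standard ray-casting technique as a stand-in. The ring is query region 2 from Table 4; the function name and the test points are hypothetical.

```python
# Closed coordinate ring of query region 2, copied from Table 4.
PENTAGON_2 = [
    (123.514, 40.067), (122.734, 39.623), (123.065, 38.403),
    (123.946, 38.403), (124.291, 39.623), (123.514, 40.067),
]


def point_in_polygon(x, y, ring):
    """Ray casting: count edges crossed by a ray going right from (x, y).

    `ring` must be closed (first vertex repeated at the end). An odd
    number of crossings means the point is inside.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # The edge straddles the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


print(point_in_polygon(123.5, 39.2, PENTAGON_2))  # hypothetical point near the centre
print(point_in_polygon(121.0, 39.2, PENTAGON_2))  # hypothetical point to the west
```

In a distributed setting this exact test would typically run only on candidates that already passed a cheaper bounding-box filter, since it costs one pass over the ring per point.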