Ensemble Clustering Methods and Algorithms for Smart Grid Streaming Data Analysis

Source: University of Chinese Academy of Sciences
In recent years, new technological and data science achievements have enabled the rapid growth of large-scale data. A typical example is the smart grid streaming data produced by industrial smart energy meters. In the field of electricity management, a critical task is to use such large-scale data to obtain information about the segmentation of different types of factories and to keep track of this information over time. This information is required for determining strategies to cope with customer demands and for improving the efficiency of electricity distribution management.

This dissertation addresses three tasks involved in the analysis of load profile data. The load profile data of a factory is a sequence of electricity consumption measurements taken at specified time intervals, representing the load profile of the factory over a given period. A set of load profiles is represented as a data matrix in which each row is the sequence of measurements of one factory and each column is the set of measurements collected from all factories within a particular time slot.

The first task is the segmentation of factories using a one-day data matrix, in order to define the business process operations executed on a daily basis. We use a new feature selection technique that removes irrelevant dimensions from the data matrix by using the local and temporal densities in the load profiles. Then, we use data visualization to estimate the number of clusters and apply the well-known k-means algorithm to cluster the patterns.

In the second task, we perform the segmentation of factories using a limited amount of operational memory and only a single processing pass over the continuous electricity consumption data. We vertically divide the data matrix into a sequence of sub-matrices, each representing the load profiles of the factories in one time window. We use a gamma mixture model to suppress the influence of sparse data units in the data arriving in the first window, and apply the k-means algorithm to the processed data of that window. The same process is repeated for the data arriving in the next window. Then, we use a reinforcement-learning-based ensemble clustering technique to aggregate the clusterings obtained for the two windows. We continue this ensemble clustering process between the aggregated clustering of the previous windows and the clustering of the current window.

In the third task, we propose a novel algorithm for tracking changes of patterns in the load profile data of factories over a sequence of time windows. We cluster the load profiles in each time window and use the clusters to model the electricity consumption patterns. For each window, we use a hierarchical binary k-means algorithm to generate component clusterings, and a new objective function to ensemble them and produce the final clustering. Changes in electricity consumption patterns across time windows are then tracked with a new change detection method, which detects the change of clusters from one window to the next by using the distribution models of two related clusters in two consecutive windows.

In our experiments, we used real-world load profile data containing more than 20,000 load profiles collected from manufacturing industries in Guangdong Province, China, over a period of one year. The electricity consumption measurements were collected at 15-minute intervals. The results show that the proposed approaches outperform several state-of-the-art approaches, and the obtained results effectively meet the demands of load profile data applications.
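The data-matrix layout described above (rows are factories, columns are time slots) and the vertical division into time-window sub-matrices used in the second task can be sketched as follows. This is a minimal illustration, not the dissertation's code: the input format (a dict mapping a factory id to `(minute_of_day, kWh)` pairs), the helper names `build_day_matrix` and `split_into_windows`, and the choice to leave unobserved slots at 0.0 are all hypothetical. At a 15-minute interval, a one-day profile has 96 columns.

```python
from typing import Dict, List, Tuple

SLOT_MINUTES = 15
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 96 columns for a one-day matrix

Matrix = List[List[float]]


def build_day_matrix(readings: Dict[str, List[Tuple[int, float]]]) -> Tuple[List[str], Matrix]:
    """Build a one-day load profile matrix: one row per factory, one column per slot.

    Hypothetical input format: factory id -> list of (minute_of_day, kWh) readings.
    Readings that fall in the same 15-minute slot are summed; slots with no
    reading are left at 0.0 (an assumption, not a documented choice).
    """
    factories = sorted(readings)  # fixed, reproducible row order
    matrix: Matrix = []
    for factory in factories:
        row = [0.0] * SLOTS_PER_DAY
        for minute, kwh in readings[factory]:
            row[minute // SLOT_MINUTES] += kwh
        matrix.append(row)
    return factories, matrix


def split_into_windows(matrix: Matrix, width: int) -> List[Matrix]:
    """Vertically divide the matrix into consecutive column blocks (time windows).

    Each sub-matrix keeps every factory (row) but only `width` time slots;
    the last window is shorter if the column count is not a multiple of `width`.
    """
    n_cols = len(matrix[0])
    return [[row[start:start + width] for row in matrix]
            for start in range(0, n_cols, width)]
```

For example, `split_into_windows(matrix, 24)` on a one-day matrix yields four six-hour windows of 24 slots each, which could then be processed one at a time in a single pass.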
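The first task's pipeline (remove irrelevant dimensions, then run k-means on the one-day matrix) can be illustrated with plain Lloyd's k-means. The dissertation's feature selection relies on local and temporal densities that the abstract does not specify, so `drop_low_variance_columns` below is a plainly different stand-in (a per-column variance threshold). The number of clusters `k` is passed in directly instead of being estimated by visualization, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def drop_low_variance_columns(rows: List[Row], min_var: float = 1e-9) -> Tuple[List[int], List[List[float]]]:
    """Stand-in feature selection (NOT the thesis's density-based method):
    keep the columns whose variance across factories exceeds min_var."""
    n = len(rows)
    kept = []
    for j, col in enumerate(zip(*rows)):
        mean = sum(col) / n
        if sum((x - mean) ** 2 for x in col) / n > min_var:
            kept.append(j)
    return kept, [[row[j] for j in kept] for row in rows]


def sq_dist(a: Row, b: Row) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[Row], k: int, n_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k rows sampled without replacement
    labels = None
    for _ in range(n_iter):
        # Assign each load profile to its nearest center (squared Euclidean distance).
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

A constant column (for example, a slot where every factory reads the same value) carries no information for separating factories, which is why even this crude filter drops it before clustering.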
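The abstract does not describe the reinforcement-learning-based ensemble used in the second task. As a stand-in, the sketch below uses a different, well-known consensus technique, evidence accumulation with a co-association matrix: each window's clustering votes for every pair of factories it places together, and pairs that co-occur in a strict majority of windows are merged with union-find. The class name `CoAssociationEnsemble` and the majority rule are assumptions. It keeps an n × n count matrix, so it only illustrates the incremental aggregation step, not the bounded memory the dissertation targets.

```python
from typing import Dict, List


class CoAssociationEnsemble:
    """Incremental evidence-accumulation consensus over per-window clusterings.

    Stand-in for the thesis's reinforcement-learning-based ensemble.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.windows = 0
        # co[i][j] (i < j): number of windows in which factories i and j shared a cluster
        self.co = [[0] * n for _ in range(n)]

    def add(self, labels: List[int]) -> None:
        """Fold in the clustering of the current window (one label per factory)."""
        if len(labels) != self.n:
            raise ValueError("expected one label per factory")
        self.windows += 1
        for i in range(self.n):
            for j in range(i + 1, self.n):
                if labels[i] == labels[j]:
                    self.co[i][j] += 1

    def consensus(self) -> List[int]:
        """Merge pairs that co-occur in a strict majority of windows (union-find)."""
        parent = list(range(self.n))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for i in range(self.n):
            for j in range(i + 1, self.n):
                if 2 * self.co[i][j] > self.windows:
                    parent[find(i)] = find(j)
        # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
        ids: Dict[int, int] = {}
        return [ids.setdefault(find(i), len(ids)) for i in range(self.n)]
```

In a streaming setting, `add` would be called once per window with that window's k-means labels, and `consensus` read after each call. Because union-find takes the transitive closure of the majority pairs, chains of borderline pairs can merge clusters; a stricter threshold reduces this.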
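A hierarchical binary (bisecting) k-means, of the kind the third task uses to generate component clusterings, repeatedly splits one cluster into two with 2-means. In this sketch the cluster with the largest within-cluster sum of squares is split next, 2-means is seeded deterministically with a cluster's first point and the point farthest from it, and every intermediate partition (from 2 up to k clusters) is returned as a component clustering. These choices are assumptions, and the dissertation's own objective function for combining the components is not reproduced.

```python
from typing import List, Sequence

Row = Sequence[float]


def _sq(a: Row, b: Row) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Row]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def _sse(points: List[Row]) -> float:
    """Within-cluster sum of squared distances to the centroid."""
    m = _mean(points)
    return sum(_sq(p, m) for p in points)


def two_means(points: List[Row], n_iter: int = 50) -> List[int]:
    """Split points into groups 0 and 1 with Lloyd iterations.
    Seeds (an assumption): the first point and the point farthest from it."""
    first = points[0]
    far = max(points, key=lambda p: _sq(p, first))
    centers = [list(first), list(far)]
    assign: List[int] = []
    for _ in range(n_iter):
        new = [0 if _sq(p, centers[0]) <= _sq(p, centers[1]) else 1 for p in points]
        if new == assign:
            break
        assign = new
        for c in (0, 1):
            members = [p for p, g in zip(points, assign) if g == c]
            if members:
                centers[c] = _mean(members)
    return assign


def bisecting_kmeans(rows: List[Row], k: int) -> List[List[int]]:
    """Return the partitions with 2..k clusters produced by successive binary splits."""
    clusters = [list(range(len(rows)))]  # each cluster is a list of row indices
    components: List[List[int]] = []
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Split the cluster with the largest within-cluster sum of squares.
        target = max(splittable, key=lambda c: _sse([rows[i] for i in c]))
        assign = two_means([rows[i] for i in target])
        left = [i for i, g in zip(target, assign) if g == 0]
        right = [i for i, g in zip(target, assign) if g == 1]
        if not left or not right:  # identical points: no further split possible
            break
        clusters.remove(target)
        clusters += [left, right]
        labels = [0] * len(rows)
        for cid, members in enumerate(clusters):
            for i in members:
                labels[i] = cid
        components.append(labels)
    return components
```

Each returned partition is one candidate view of the same window at a different granularity, which is what an ensemble objective would then reconcile into a final clustering.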
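The change detection step compares the distribution models of two related clusters in consecutive windows. The abstract specifies neither the models nor the comparison, so this sketch assumes: windows of equal width with aligned slots (for example, one day each), correspondence between clusters by largest member overlap (the same factories appear in both windows), a diagonal Gaussian per cluster, and a symmetric Kullback-Leibler divergence compared against a user-chosen threshold. `detect_changes` and its `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]
Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_diag_gaussian(points: List[Row], var_floor: float = 1e-6) -> Gaussian:
    """Per-dimension mean and variance; the floor avoids division by zero."""
    n = len(points)
    cols = list(zip(*points))
    means = [sum(c) / n for c in cols]
    variances = [max(sum((x - m) ** 2 for x in c) / n, var_floor) for c, m in zip(cols, means)]
    return means, variances


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal Gaussians, summed over dimensions."""
    return 0.5 * sum(math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
                     for mp, vp, mq, vq in zip(p[0], p[1], q[0], q[1]))


def detect_changes(prev_rows: List[Row], prev_labels: List[int],
                   cur_rows: List[Row], cur_labels: List[int],
                   threshold: float) -> Dict[int, Tuple[int, float, bool]]:
    """For each cluster of the previous window, find the current cluster sharing
    the most factories and flag a change when the symmetric KL divergence between
    their Gaussian models exceeds `threshold`. Row i is the same factory in both windows."""
    result = {}
    for c in sorted(set(prev_labels)):
        members = [i for i, lab in enumerate(prev_labels) if lab == c]
        matched = Counter(cur_labels[i] for i in members).most_common(1)[0][0]
        p = fit_diag_gaussian([prev_rows[i] for i in members])
        q = fit_diag_gaussian([r for r, lab in zip(cur_rows, cur_labels) if lab == matched])
        divergence = kl_diag(p, q) + kl_diag(q, p)
        result[c] = (matched, divergence, divergence > threshold)
    return result
```

Symmetric KL is only one reasonable choice; for very tight clusters it reacts strongly to small shifts in the mean, so the threshold would need tuning on real load profiles.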
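A quick calculation with the figures given in the experiments paragraph shows the scale of the full-year data matrix. It assumes a 365-day year and one 8-byte floating-point value per measurement; the storage format is an assumption, not stated in the source.

```latex
\begin{aligned}
\text{slots per day} &= \frac{24 \times 60}{15} = 96,\\
\text{measurements per factory per year} &= 96 \times 365 = 35\,040,\\
\text{matrix entries} &\geq 20\,000 \times 35\,040 = 700\,800\,000 \approx 7.0 \times 10^{8},\\
\text{raw size} &\geq 7.008 \times 10^{8} \times 8\ \text{bytes} \approx 5.6\ \text{GB}.
\end{aligned}
```

A matrix of this size is one reason a single-pass, window-by-window approach with limited operational memory, as in the second task, is attractive.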