In recent years, advances in technology and data science have driven rapid growth in large-scale data. A typical example is the smart grid streaming data produced by industrial smart energy meters. In electricity management, a critical task is to use such data to segment the different types of factories and to track how that segmentation changes over time. This information is needed to devise strategies for meeting customer demand and to improve the efficiency of electricity distribution management.

This dissertation addresses three tasks in the analysis of load profile data. The load profile of a factory is a sequence of electricity consumption measurements taken at specified time intervals over a given period. A set of load profiles is represented as a data matrix in which each row is the sequence of measurements of one factory, and each column is the set of measurements collected from all factories in a particular time slot.

The first task is the segmentation of factories using a one-day data matrix, in order to identify the business process operations executed on a daily basis. We use a new feature selection technique that removes irrelevant dimensions from the data matrix by using the local and temporal densities in the load profiles. We then use data visualization to estimate the number of clusters and apply the well-known k-means algorithm to find the cluster patterns.

In the second task, we segment factories using a limited amount of memory and only a single processing pass over the continuous electricity consumption data. We vertically divide the data matrix into a sequence of sub-matrices, each representing the load profiles of the factories in one time window. We use a gamma mixture model to suppress the influence of sparse data units in the data arriving in the first window, and apply the k-means algorithm to the processed data of that window. The same process is repeated for the data arriving in the next window. We then use a reinforcement-learning-based ensemble clustering technique to aggregate the clusterings obtained from the two windows. Ensemble clustering continues in this way, each time combining the clustering aggregated from the previous windows with the clustering of the current window.

In the third task, we propose a novel algorithm for tracking changes in the patterns of factory load profile data over a sequence of time windows. We cluster the load profiles in each time window and use the clusters to model the electricity consumption patterns. For each window, we use a hierarchical binary k-means algorithm to generate component clusterings, and a new objective function to combine them into the final clustering. Changes in electricity consumption patterns across time windows are then tracked with a new change detection method, which detects the change of a cluster from one window to the next by comparing the distribution models of the two related clusters in the two consecutive windows.

In our experiments, we used real-world load profile data comprising more than 20,000 load profiles collected from manufacturing industries in Guangdong Province, China, over a period of one year. The electricity consumption measurements were collected at 15-minute intervals. The results show that the proposed approaches outperform several state-of-the-art approaches and meet the requirements of load profile data applications.
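
The data layout and the per-window clustering described for the second task can be illustrated with a minimal sketch. It is not the dissertation's method: the gamma mixture filtering and the reinforcement-learning-based ensemble step are omitted, the toy matrix (6 factories, 8 time slots), the window size, the function names, and the deterministic farthest-point initialization are all assumptions made for illustration. For scale, a real one-day profile at 15-minute resolution would have 96 columns.

```python
# Toy load-profile matrix: one row per factory, one column per time slot.
# All values are invented for illustration; factory 2 switches from a
# low-load to a high-load pattern in the second half of the period.
load_matrix = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 0.9, 1.2],
    [1.1, 1.0, 1.0, 1.2, 1.2, 1.0, 1.1, 0.9],
    [0.9, 1.1, 1.2, 1.0, 9.0, 8.8, 9.1, 9.2],
    [9.0, 9.2, 8.9, 9.1, 9.1, 9.0, 8.8, 9.2],
    [9.1, 8.8, 9.0, 9.2, 8.9, 9.2, 9.0, 9.1],
    [8.9, 9.1, 9.2, 9.0, 9.2, 8.9, 9.1, 9.0],
]


def split_into_windows(matrix, window_size):
    """Vertically split the matrix into consecutive column windows."""
    n_cols = len(matrix[0])
    return [[row[start:start + window_size] for row in matrix]
            for start in range(0, n_cols, window_size)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption, chosen for reproducibility):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Cluster each time window independently, one window at a time.
windows = split_into_windows(load_matrix, window_size=4)
labels_per_window = [kmeans(window, k=2) for window in windows]

for index, labels in enumerate(labels_per_window):
    print(f"window {index}: {labels}")
```

In this toy run, factory 2 is grouped with factories 0 and 1 in the first window and with factories 3 to 5 in the second. Cluster labels from independent k-means runs are not aligned across windows in general, which is one reason a separate aggregation or change detection step is needed on top of the per-window clusterings.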