Ranking of energy consumption objects using the principal components method

Ensuring comfortable conditions in civil buildings requires monitoring and forecasting the cost of energy resources, as well as energy-efficient management of heating systems and their equipment. Implementing appropriate automation and monitoring solutions makes it possible to accumulate a significant amount of data. To make the analysis of energy efficiency in the operation of civil buildings more informative, a model for their information ranking was developed using correlation analysis and principal component analysis. Based on the cross-industry data analysis methodology CRISP-DM, the basic indicators were determined for the accepted initial conditions on electricity and heat consumption of the university buildings, and the matrix of correlation coefficients between them was estimated. Some data (the external volume and area of the building and the normative average temperature for the region) are obtained from the technical documentation of the buildings and from open sources. Others (the amount of consumed heat and electricity, indoor temperature) are determined during operation and characterize how efficiently energy resources are used in the building. At the initial stage, a correlation analysis of the relationships between the main parameters that characterize the buildings and their consumption of energy resources was performed. Principal component analysis was used to reduce the dimensionality of the feature set and to identify homogeneous groups of energy consumption objects. The four components obtained explain about 90% of the variance of the initial data and characterize the efficiency of energy use in terms of temperature, volume and the heating degree-day coefficient of the heating season.
The results are recommended for implementation in modern energy monitoring and municipal energy management systems as applied models for diagnosing abnormal situations and supporting sound management decisions.


I. INTRODUCTION
The development of information technology has expanded both the range of application areas of machine learning methods and the methodologies for applying them [1][2][3][4]. These areas include managing the energy consumption of civil buildings and forecasting energy costs to ensure comfortable conditions. Machine learning methods help analyze the parameters of energy management systems and support effective decision-making by energy managers [5].
There is a need to improve existing approaches to energy consumption data analysis, or to find new ones, in order to make decisions aimed at improving energy efficiency. Implementing solutions for automated monitoring and control of heating systems makes it possible to reduce total heat consumption and to accumulate large amounts of information about the operation of the system and the decisions made by the energy manager. Related tasks include ensuring regulatory sanitary and hygienic conditions in heated rooms while taking into account the influence of external factors [6,7].

II. EXISTING SOLUTIONS ANALYSIS
The implementation of solutions for controlling and monitoring energy consumption makes it possible to accumulate significant amounts of information about the amount of resources consumed [4,5]. Meter readings give an idea of heat consumption and can inform management strategies, for example by revealing non-obvious consumption patterns in municipal or other buildings, information that cannot be obtained without machine learning methods. The main problem is the complexity, and often the impossibility, of obtaining reliable and detailed data.
In previous works [8,9], the structure of information flows for obtaining data from available sources was developed, and the characteristics of the data depending on the type and function of the buildings were established. The study of the data structure confirmed the need to create additional features for analyzing the energy consumption of buildings: specific heat consumption that takes into account outdoor and indoor air temperatures, the energy efficiency class, and tariffs per unit of heated area. Given the complex structure of building energy consumption data, a method of initial analysis based on the cross-industry methodology CRISP-DM [4] was developed. Testing the methodology on data from the educational buildings of KrNU shows that this algorithm can be used in the future for preliminary data analysis with Data Mining tools.

III. PURPOSE OF THE WORK
Designing new features to compare the energy consumption of buildings increases the dimensionality of the data and, consequently, the time required to process it, which can be critical for large samples covering long periods. Therefore, the aim of the study is to develop and test a model for the information ranking of buildings by their energy consumption using the principal component method.

IV. RESEARCH MATERIALS
Three data sets were collected for this study:
- Energy consumption of educational buildings for the period from 2012 to 2016: building number, month, year, heat consumption (Gcal) and electricity consumption (kWh). Size: 5 columns, 420 rows. Data format: the building (case) number is a categorical variable; the remaining data are numerical.
- Heat load volumes: building number, heat load (Gcal/h) and building volume. Size: 3 columns, 7 rows. Data format: the building (case) number is a categorical variable; the remaining data are numerical.
- Average monthly ambient temperature for the period from 2012 to 2016: year, month and average monthly temperature. Size: 3 columns, 60 rows. Data format: all data are numerical.
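As an illustration of how the three data sets fit together, the following Python sketch joins them into one monthly table: each consumption record receives its building's heat load and volume and the outdoor temperature of its month, so 7 buildings × 60 months give the 420 rows listed above. It assumes one heat-load row per building; the record types, field names and the function join_datasets are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class MonthlyConsumption:
    """One row of the monthly energy consumption data set (hypothetical field names)."""
    building: str           # building (case) number, categorical
    year: int
    month: int
    heat_gcal: float        # heat consumption, Gcal
    electricity_kwh: float  # electricity consumption, kWh


@dataclass
class HeatLoad:
    """One row of the heat load data set (assumed: one row per building)."""
    building: str
    heat_load_gcal_h: float  # heat load, Gcal/h
    volume: float            # building volume


def join_datasets(consumption, heat_loads, temperatures):
    """Attach building-level and month-level attributes to every monthly record.

    temperatures maps (year, month) to the average monthly outdoor temperature, °C.
    Raises KeyError if a building or a month is missing from the lookup tables.
    """
    loads = {h.building: h for h in heat_loads}
    rows = []
    for c in consumption:
        load = loads[c.building]
        rows.append({
            "building": c.building,
            "year": c.year,
            "month": c.month,
            "heat_gcal": c.heat_gcal,
            "electricity_kwh": c.electricity_kwh,
            "heat_load_gcal_h": load.heat_load_gcal_h,
            "volume": load.volume,
            "t_out": temperatures[(c.year, c.month)],
        })
    return rows
```

Keeping the join explicit makes a missing building or month fail loudly instead of silently producing incomplete rows.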
For a comparative analysis of the energy consumption of different buildings, unified indicators were used, for example, specific heat consumption (q1, kWh/m³), specific electricity consumption (q2, kWh/m³), etc.
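Degree-days of the heating period (GDOP):

$$D_d = \left(T_{in\_C} - T_{out\_C}\right) \cdot Z,$$

where $D_d$ is the actual number of degree-days; $T_{in\_C}$, °C, is the indoor air temperature; $T_{out\_C}$, °C, is the average actual outdoor temperature; $Z$ is the actual duration of the heating period, days. The normative value is obtained from the same expression using the normative temperatures and heating-period duration.
HDD coefficient:

$$K_{Dd} = \frac{D_d}{D_d^{n}},$$

where $K_{Dd}$ is the HDD coefficient; $D_d$ is the actual number of degree-days; $D_d^{n}$ is the normalized number of degree-days.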
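A minimal Python sketch of the two formulas above, assuming temperatures in °C and the heating-period duration in days. The function names are hypothetical, and the numbers in the usage note are illustrative, not data from the study.

```python
def degree_days(t_in_c: float, t_out_c: float, z_days: float) -> float:
    """Degree-days of the heating period: D_d = (T_in_C - T_out_C) * Z."""
    return (t_in_c - t_out_c) * z_days


def hdd_coefficient(dd_actual: float, dd_normative: float) -> float:
    """HDD coefficient K_Dd = D_d / D_d^n (actual over normalized degree-days)."""
    if dd_normative <= 0:
        raise ValueError("normalized degree-days must be positive")
    return dd_actual / dd_normative
```

For example, with an indoor temperature of 20 °C, an average outdoor temperature of -1 °C and a 180-day heating period, degree_days(20, -1, 180) gives 3780 degree-days. Dividing by the normalized value gives K_Dd, which exceeds 1 for a season colder (or longer) than the norm.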
Absolute heat consumption in kWh:

$$E_{kWh} = 1163 \cdot E_{Gcal},$$

where $E_{kWh}$ is the absolute heat consumption, kWh; $E_{Gcal}$ is the thermal energy consumption, Gcal (1 Gcal = 1163 kWh).
Specific heat consumption reduced to the normative values of outdoor and indoor air temperature:

$$q_{1t} = \frac{q_1}{K_{Dd}},$$

where $q_{1t}$ is the specific heat consumption taking temperatures into account, kWh/m³; $q_1$ is the specific heat consumption, kWh/m³; $K_{Dd}$ is the HDD coefficient.
Specific electricity consumption reduced to the normative values of outdoor and indoor air temperature:

$$q_{2t} = \frac{q_2}{K_{Dd}},$$

where $q_{2t}$ is the specific electricity consumption taking temperatures into account, kWh/m³; $q_2$ is the specific electricity consumption, kWh/m³; $K_{Dd}$ is the HDD coefficient.
Specific total energy consumption reduced to the normative values of outdoor and indoor air temperature:

$$q_{3t} = \frac{q_3}{K_{Dd}},$$

where $q_{3t}$ is the specific total energy consumption taking temperatures into account, kWh/m³; $q_3$ is the specific total energy consumption, kWh/m³; $K_{Dd}$ is the HDD coefficient.
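The chain of indicators can be sketched in Python as follows. Two points are assumptions not stated explicitly in the text: specific consumption is taken as absolute consumption divided by the building volume (consistent with the kWh/m³ units), and the specific total energy consumption q3 is taken as the sum q1 + q2. The function building_indicators is hypothetical.

```python
GCAL_TO_KWH = 1163.0  # 1 Gcal = 1.163 MWh = 1163 kWh


def building_indicators(e_gcal, e_el_kwh, volume_m3, k_dd):
    """Return absolute, specific and reduced indicators for one building and period.

    Assumptions (not stated explicitly in the source):
    - specific consumption = absolute consumption / building volume, kWh/m3;
    - specific total energy consumption q3 = q1 + q2.
    """
    e_kwh = e_gcal * GCAL_TO_KWH   # absolute heat consumption, kWh
    q1 = e_kwh / volume_m3         # specific heat consumption
    q2 = e_el_kwh / volume_m3      # specific electricity consumption
    q3 = q1 + q2                   # assumed specific total energy consumption
    return {
        "E_kWh": e_kwh,
        "q1": q1, "q2": q2, "q3": q3,
        # reduction to normative outdoor/indoor temperatures via the HDD coefficient
        "q1t": q1 / k_dd, "q2t": q2 / k_dd, "q3t": q3 / k_dd,
    }
```

With illustrative inputs of 100 Gcal of heat, 30000 kWh of electricity, a volume of 20000 m³ and K_Dd = 1.25, this gives q1 ≈ 5.815 kWh/m³ and q1t ≈ 4.652 kWh/m³.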
The selected characteristics can be divided into two categories: a priori characteristics, obtained from the technical documentation of the buildings or available from open sources (external volume of the building and normative average temperature for the region), and a posteriori characteristics, obtained during operation of the building and actually characterizing its energy consumption (amount of consumed heat and electricity, indoor temperature).
A preliminary correlation analysis was performed, and the matrix of correlation coefficient estimates was obtained (Fig. 1). This matrix is taken as the basis for further research. Given the nature of the data, it is advisable to use correlation analysis and principal component analysis (PCA) to reduce the dimensionality of the informative features, and cluster analysis to identify homogeneous groups of energy consumption objects.
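Since the PCA described next operates on this matrix, a short NumPy sketch of how it can be estimated may be useful. It standardizes each column and computes R = ZᵀZ/(n - 1), which equals the Pearson correlation matrix returned by numpy.corrcoef. The function names are hypothetical, and the code assumes that no column is constant, since a zero standard deviation makes standardization undefined.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def correlation_matrix(X):
    """Pearson correlation matrix of the columns of X (rows are observations)."""
    Z = standardize(X)
    n = Z.shape[0]
    return Z.T @ Z / (n - 1)
```

Writing the matrix through standardized variables makes explicit that PCA on the correlation matrix is PCA on standardized features, so variables measured in Gcal, kWh and °C contribute on a comparable scale.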
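The purpose of PCA is to find linear combinations of variables containing the largest variance [10]. The first principal component is the linear combination

$$y_1 = a_1^{T} x = a_{11} x_1 + a_{12} x_2 + \dots + a_{1p} x_p,$$

whose variance

$$\operatorname{Var}(y_1) = a_1^{T} \Sigma a_1$$

is maximized subject to $a_1^{T} a_1 = 1$, where $\Sigma$ is the covariance (correlation) matrix.
The variance of the second and subsequent principal components is calculated similarly, with each component additionally required to be uncorrelated with the preceding ones. The solution $a_k$ is the eigenvector of $\Sigma$ corresponding to its $k$-th largest eigenvalue $\lambda_k$, and $\operatorname{Var}(y_k) = \lambda_k$.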
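The eigendecomposition described above can be sketched with NumPy as follows. This is an illustrative implementation, not the authors' code: it assumes PCA on the correlation matrix (that is, on standardized variables), uses numpy.linalg.eigh because the matrix is symmetric, and sorts the eigenvalues in descending order so that the first column corresponds to the first principal component. The function name pca_correlation is hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows are observations, columns are variables).

    Returns eigenvalues in descending order, the matching unit eigenvectors a_k
    as columns, and the component scores y_k for every observation.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
    R = Z.T @ Z / (Z.shape[0] - 1)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                              # y_k = a_k^T z for each observation
    return eigvals, eigvecs, scores
```

The columns of scores are the component values y_k for each observation; their sample variances equal the eigenvalues λ_k, and for a correlation matrix the eigenvalues sum to the number of variables p.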
The total variance of the sample is equal to:

$$\sum_{i=1}^{p} \operatorname{Var}(x_i) = \operatorname{tr}(\Sigma) = \sum_{i=1}^{p} \lambda_i.$$

Performing the PCA procedure yields the eigenvalues $\lambda_i$ of the principal components and the percentage of variance they explain (Table I).
As can be seen from Fig. 3, the first principal component is "loaded" by the temperature variables and by the variables that characterize heat consumption (absolute, specific, and specific reduced to the normative values of outdoor and indoor air temperature), as well as total absolute energy consumption. Moreover, the analysis of the factor loadings shows that heat consumption decreases as the indoor and outdoor temperatures increase. This component can be called "Energy efficiency of the building by temperature": buildings whose total energy consumption and heat consumption vary less with temperature are more energy efficient.
The second principal component (Fig. 4) is related to the volume of the building, the heat load, the specific electricity consumption, and the specific electricity and total energy consumption reduced to the normative values of outdoor and indoor air temperature. It can be called "Energy efficiency of the building by volume": buildings that, for the same volume and heat load, show less variation along this component are more energy efficient.
The third principal component (Fig. 5) is formed by the HDD coefficient, the volume of the building, the heat load, and the specific indicators reduced to the normative values of outdoor and indoor air temperature, including total energy consumption. Since the second and third components contain the same variables, which significantly affect the percentage of explained variance, we attribute this effect to the increase in data dimensionality caused by creating derived parameters that depend on the building volume and the HDD coefficient.
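To decide how many components to keep and which variables "load" each of them, the eigenvalues can be turned into explained-variance shares and the eigenvectors into factor loadings l_jk = a_jk·√λ_k (for a correlation matrix, the correlation between variable j and component k). The sketch below assumes a 90% cumulative-variance target, matching the roughly 90% reported in this study, and a |loading| ≥ 0.7 cut-off for naming components. That cut-off is a common rule of thumb, not a value from the paper, and all function names are hypothetical.

```python
import numpy as np


def explained_variance(eigvals):
    """Share of total variance explained by each component and the cumulative share."""
    eigvals = np.asarray(eigvals, dtype=float)
    share = eigvals / eigvals.sum()
    return share, np.cumsum(share)


def n_components_for(eigvals, target=0.90):
    """Smallest number of leading components whose cumulative share reaches target."""
    _, cumulative = explained_variance(eigvals)
    return min(int(np.searchsorted(cumulative, target)) + 1, len(cumulative))


def loadings(eigvals, eigvecs):
    """Factor loadings l_jk = a_jk * sqrt(lambda_k); rows are variables, columns components."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # guard tiny negative round-off
    return np.asarray(eigvecs, dtype=float) * np.sqrt(lam)


def dominant_variables(L, names, k, threshold=0.7):
    """Variables whose |loading| on component k is at least threshold (rule-of-thumb cut-off)."""
    return [(names[j], float(L[j, k])) for j in range(L.shape[0]) if abs(L[j, k]) >= threshold]
```

With hypothetical eigenvalues 4, 3, 1.5, 1 and 0.5, the cumulative shares are about 0.40, 0.70, 0.85, 0.95 and 1.00, so n_components_for returns 4. The sign of a loading indicates direction, which is how a statement such as "heat consumption decreases as temperature increases" is read off a component.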
The fourth principal component (Fig. 6) is specific and is formed by the absolute and specific electricity consumption. The convergence of the centers of the winter-period segments toward the origin indicates an increase in electricity consumption during the heating period and the transitional seasons, which may be caused by active use of climate control equipment.
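The text does not specify which clustering algorithm was applied to the component scores. As one possible choice, the following NumPy sketch runs k-means (Lloyd's algorithm) with a deterministic farthest-point initialization so that results are reproducible. The algorithm, the initialization and the function name kmeans are assumptions for illustration. The input Y would be the matrix of retained component scores, one row per observation (for example, per building and month).

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Y: component scores, one row per observation. Returns (labels, centers).
    """
    Y = np.asarray(Y, dtype=float)
    # Farthest-point initialization: start from the first row, then repeatedly
    # add the row farthest from all centers chosen so far.
    centers = [Y[0]]
    for _ in range(1, k):
        d_min = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d_min.argmax()])
    centers = np.array(centers)

    def assign(c):
        # squared Euclidean distance from every observation to every center
        d = ((Y[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # move each center to the mean of its members; keep it if the cluster is empty
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers
```

Clustering in the space of the retained components, rather than on all original features, is what makes the dimensionality reduction useful here: distances are computed over a few uncorrelated axes instead of many redundant ones.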

V. CONCLUSIONS
Correlation analysis and PCA were used to reduce the dimensionality of the informative features. Four principal components describing about 90% of the sample variance were obtained.
The first principal component, "Energy efficiency of the building by temperature", is formed by the temperature variables and the variables that characterize heat consumption (absolute, specific, and specific reduced to the normative values of outdoor and indoor air temperature), as well as total absolute energy consumption.
The second principal component, "Energy efficiency of the building by volume", is formed by the volume of the building, the heat load, the specific electricity and total energy consumption reduced to the normative values of outdoor and indoor air temperature, and the specific electricity consumption.
The third principal component is formed by the HDD coefficient, the volume of the building, the heat load, and the specific electricity and total energy consumption reduced to the normative values of outdoor and indoor air temperature.
We conclude that the presence of the same variables in the second and third principal components results from the artificial creation of parameters derived from the original ones, which in turn increased the dimensionality of the data. To eliminate this effect, variables that are strongly correlated with each other should be excluded at the stage of examining the correlation matrix. The fourth principal component depends solely on the absolute and specific electricity consumption.
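One simple way to carry out such an exclusion is a greedy filter over the correlation matrix: variables are visited in order, and a variable is kept only if its absolute correlation with every already kept variable is below a threshold. Listing the original measurements before the derived indicators makes the filter discard the derived ones first. The 0.9 threshold and the function name drop_correlated are assumptions, not values from the study.

```python
import numpy as np


def drop_correlated(R, names, threshold=0.9):
    """Greedily keep variables whose |r| with every kept variable is below threshold.

    R: correlation matrix; names: variable names in the same order as R's rows.
    Order matters: earlier variables are preferred over later ones.
    """
    R = np.asarray(R, dtype=float)
    kept = []
    for j in range(R.shape[0]):
        if all(abs(R[j, i]) < threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Because the rule is greedy, the result depends on the order of variables; placing measured quantities such as q1 before derived ones such as q1t encodes the preference for original parameters suggested above.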
Cluster analysis made it possible to identify homogeneous groups of energy-consuming objects based on the selected components.