Ensembled Elbow and Bray-Curtis Fuzzy C-Means Clustering For Energy Efficient Data Aggregation in WSN

A wireless sensor network (WSN) comprises distributed sensors that collect and organize data. Data aggregation is a major concern in WSNs because it depends on several factors, including the energy constraints of the sensors, the network topology and the link conditions. Conventional approaches do not aggregate data efficiently because of the limited battery power of the nodes, which degrades the network lifetime. To improve data aggregation and network lifetime, an Energy-Efficient Ensembled Elbow Fuzzy C-means Clustering based Data Aggregation (EEEEFCC-DA) method is designed. Initially, the residual energy of each sensor node (SN) is calculated. The elbow method is used within the fuzzy c-means clustering algorithm to determine the number of clusters. A centroid value is then calculated for every cluster to group the SNs. The Bray-Curtis Similarity Index is used to compute the similarity between an SN and the centroid value of a cluster, and the SNs are grouped according to this similarity value. The process is iterated until every SN is assigned to a suitable cluster. After that, the SN with higher residual energy is selected as the cluster head (CH). The CH gathers data from the SNs and sends it to the sink node. This helps to enhance the data gathering accuracy and reduce the energy consumption. The EEEEFCC-DA method is simulated and evaluated on several metrics, namely energy consumption, network lifetime, data aggregation accuracy (DAA) and data aggregation time, with respect to the number of SNs and the number of data packets (DP). The results show that the EEEEFCC-DA method performs better than the conventional methods in terms of DAA, network lifetime, energy consumption and data aggregation time.


Introduction
A WSN comprises sensor devices that gather and organize environmental data. SNs are deployed in the network and coordinate with each other to accomplish a given task. WSNs are used in various real-time applications. In a WSN, the energy of the SNs is a key issue because each node senses a large amount of information and transmits a large amount of data. As node energy is depleted, the number of inactive nodes increases and the entire network eventually fails. The energy resources of the SNs are batteries, which typically cannot be recharged. Therefore, energy-efficient data aggregation and the improvement of network lifetime are the major issues in WSNs. Clustering algorithms are one of the solutions to these problems. Many algorithms have been developed for data aggregation and for extending the network lifetime. In [1], a Multi-Mobile Agent Itinerary Planning-Based Energy and Fault Aware data aggregation (MAEF) method was introduced, but the network lifetime was not improved using MAEF. A Sparsest Random Sampling method for cluster-based compressive data gathering (SRS-CCDG) was designed in [2] with a lower energy cost, but it did not improve the data gathering accuracy. An Energy-aware Compressive sensing based Data Aggregation (ECDA) model was introduced in [3] to overcome the problem of network lifetime. However, its data aggregation performance was poor. A sparsity feedback-based compressive data aggregation method was designed in [4] for balancing the energy between the nodes. A mixed-integer linear programming (MIP) model was developed in [5] for energy-efficient data collection, but the data collection time was not minimized. To conserve node energy and improve the network lifetime, a dynamic mobile agent-based data collection method was presented in [6], but it failed to enhance the accuracy of data aggregation. In [7], a low-delay and high-throughput opportunistic data collection scheme was introduced to improve DP transmission to the sink.
The designed scheme failed to consider energy-aware data collection. A cluster-ring method was introduced in [8] for energy-efficient data gathering and for extending the lifetime of the network, but the clustering error was not minimized.
A Distributed Data Gathering Approach (DDGA) was developed in [9] for solving the data aggregation problem with a mobile sink. The approach decreases the energy utilization, but accurate data gathering was not achieved. A prediction model-based data collection method was developed in [10], using CHs to improve the accuracy of the collected data within a minimum predefined error. Though the model improves the lifetime of the nodes, the data collection time was not minimized. The issues identified in the literature are overcome by a novel method called EEEEFCC-DA. The contributions of the proposed EEEEFCC-DA method over the existing techniques are described in the following subsection.

Proposal contribution and structure
The contributions of the EEEEFCC-DA method over the state-of-the-art approaches are summarized as follows:
• To enhance the accuracy of data aggregation in WSN, the EEEEFCC-DA method is developed as an ensemble of the elbow method and fuzzy c-means clustering. The elbow method finds a suitable number of clusters by calculating the summation of the squared error between the clusters and the SNs. After the clusters are found, the centroids are assigned according to the node residual energy level. The Bray-Curtis Similarity Index is used to calculate the relationship between an SN and a centroid, and the SNs are then grouped. Therefore, the nodes with higher residual energy are considered for data aggregation.
• To minimize the data aggregation time, the ensemble clustering method selects the CH. The node with higher residual energy within the group is chosen as the CH for that particular cluster. After that, the sink node gathers the data from the CHs rather than collecting it from all the SNs within the group.
The article is structured as follows. The literature is reviewed in Section 2. A brief description of the EEEEFCC-DA method is presented in Section 3. The simulation setup and the parameter settings are presented in Section 4. The comparative analysis of the proposed method is presented in Section 5. Section 6 concludes the paper.

Literature survey
A new energy-efficient data collection approach was introduced in [11], using spatial-temporal correlation to achieve reasonable accuracy, but the network lifetime was not improved. A probabilistic clustering algorithm was introduced in [12]. To enhance data aggregation performance and reduce the energy consumption, a neural network was designed in [13]. The time taken for efficient data gathering remained unaddressed.
A reinforcement learning based clustering algorithm (RLBCA) was developed in [14] to find CHs that collect the data and send it to the sink node, but the clustering error was higher. In [15], a spawn multi-mobile agent itinerary planning (SMIP) method was developed to enhance the data gathering process with minimal energy utilization and time. The network lifetime was not increased. A Multi-Strip Data Gathering (MSDG) method was designed in [16] to reduce the energy utilization of SNs, but the time complexity of data gathering was not minimized. A resilient data aggregation scheme based on spatiotemporal correlation was introduced in [17]. To gather the data from CHs, a novel energy-aware and density-based clustering algorithm was designed in [18]. The algorithm improves the network lifetime and energy usage. In [19], a novel itinerary planning algorithm grouped the mobile nodes based on density, but efficient data gathering was not performed with the CHs. To enhance the data gathering efficiency and the stability of energy consumption, a data gathering technique with a mobile sink was designed in [20], but the data gathering time was not minimized. The issues of the existing methods are overcome by introducing the EEEEFCC-DA method. The process of the EEEEFCC-DA method, together with a diagram, is presented in the next section.

Methodology
To enhance data aggregation and reduce the energy consumption, the EEEEFCC-DA method is introduced. Initially, the clustering-based data aggregation algorithm partitions the network into different groups depending on the energy level of the sensors. Each group has one director, called the CH, and a number of cluster members. Each sensor device in a cluster transmits its collected data to its CH. The CH-based data aggregation therefore consumes less energy, and the network lifetime is increased.

Network model
The network model of the EEEEFCC-DA method, with a number of spatially distributed SNs, is described in this section. The WSN is organized as a directed graph G_d=(v,e), which comprises a set of vertices (v) and edges (e). The vertices (v) represent the n sensor nodes SN_1, SN_2, SN_3, …, SN_n distributed in a square area of N*N. An edge e represents a link, i.e. a connection between SNs. The whole network is divided into c clusters C_1, C_2, C_3, …, C_c based on the node residual energy level (i.e. E_r). A CH is chosen for every cluster. The SNs gather the data DP_1, DP_2, DP_3, …, DP_n from the environment and transmit it to the CH. The CH transmits the collected data to the sink node (S_n), which acts as the data aggregator. Based on this network model, the proposed EEEEFCC-DA method is designed to perform efficient data aggregation.

Ensembled Elbow method based Fuzzy C-means clustering for Data aggregation
In the WSN, all SNs are deployed with an equal energy level. The energy of a node is calculated as the product of power and time,
$$E = P \times t \qquad (1)$$
In (1), E indicates the energy of the SN, P the power and t the time. Because of the sensing activity of the node, its energy level degrades. Therefore, the residual energy of the SN is estimated. The residual energy is the energy remaining in the node after sensing the data, and is formalized as
$$E_r = E_{total} - E_{consumed} \qquad (2)$$
In (2), E_r indicates the residual energy of the sensor, E_total is the total energy of the node, and E_consumed is the energy consumed by the node.
The SNs are grouped with the aid of the ensemble of the elbow method and the fuzzy c-means clustering algorithm. In the EEEEFCC-DA method, the optimum number of clusters c is found by applying the elbow method, and the residual-energy-based clustering is then done with the fuzzy c-means algorithm. Conventional clustering techniques initialize the number of clusters randomly, which causes errors after the clustering process. To overcome this issue, the EEEEFCC-DA method uses the elbow method to find the optimal number of clusters before the grouping process. The flow of the elbow-method-based fuzzy c-means clustering of SNs is portrayed in figure 1. A number of SNs are distributed in the network. By applying the elbow method, the optimal number of clusters c is chosen for grouping the SNs. The elbow criterion is based on the summation of the squared error of clustering. As the value of c increases, the error decreases, so a higher number of clusters always gives a smaller error. For this reason, the point at which the elbow of the error curve starts is taken as the optimal number of clusters. Consider first the cluster value c=1, for which the summation of the squared error is calculated as follows.
$$\omega = \sum_{j=1}^{c} \sum_{SN_i \in C_j} \lVert SN_i - C_j \rVert^2 \qquad (3)$$
In (3), ω denotes the summation of the squared error, SN_i denotes a sensor node in the cluster, and C_j is the j-th cluster, represented by its centroid. The value of c is then increased to c=2 and c=3, the corresponding clusters are formed, and the error is calculated again. In the same way, the error decreases as the number of clusters increases. At a particular point the elbow is reached, and the number of clusters at that point is taken as the optimal value. The elbow-criterion-based selection of the optimal number of clusters is described in figure 2. The numbers of clusters are denoted as c=1, c=2, c=3, c=4, c=5. The red circle marks the elbow point. The elbow point is the number of clusters beyond which a further increase in c no longer gives a marked reduction in the summation of the squared error, and it is chosen as the optimal number of clusters.
As shown in figure 2, three clusters (i.e. c=3) are chosen as optimal for grouping the SNs. The number of clusters chosen depends on the energy levels of the SNs. After the clusters are selected, a centroid is defined for each cluster. The fuzzy membership is then computed based on the similarity measure.
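The elbow criterion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the residual energy values are hypothetical, the clusters are formed with a plain one-dimensional k-means using evenly spaced starting centroids, and the elbow is taken as the first c after which the error falls by no more than a chosen fraction (`threshold`, a hypothetical parameter) of the error at c=1.

```python
def kmeans_1d(values, c, iterations=50):
    """Plain 1-D k-means; returns the c centroids."""
    ordered = sorted(values)
    # Assumption: deterministic start at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * c)] for j in range(c)]
    for _ in range(iterations):
        groups = [[] for _ in range(c)]
        for v in values:
            nearest = min(range(c), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def squared_error(values, centroids):
    """Summation of squared error: each value against its nearest centroid."""
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


def elbow(values, max_c=5, threshold=0.10):
    """Return (chosen c, list of errors for c = 1..max_c)."""
    errors = [squared_error(values, kmeans_1d(values, c)) for c in range(1, max_c + 1)]
    for c in range(1, max_c):
        gain = errors[c - 1] - errors[c]  # reduction when going from c to c + 1 clusters
        if gain <= threshold * errors[0]:
            return c, errors
    return max_c, errors


# Hypothetical residual energies (joules) of twelve SNs.
residual = [9.8, 9.6, 9.7, 9.9, 6.1, 6.0, 5.9, 6.2, 2.0, 2.1, 1.9, 2.2]
best_c, errors = elbow(residual)
print(best_c, [round(e, 2) for e in errors])
```

With energy levels that fall into three well-separated groups, the error drops sharply up to c=3 and barely changes afterwards, so the sketch selects three clusters.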
In (4), β_ij represents the fuzzy membership function, which is used to identify the members of a particular cluster, ρ_ij denotes the similarity between the i-th SN and the j-th cluster center, and ρ_ic represents the similarity between the i-th SN and the c-th cluster centroid. The similarity between an SN and a cluster centroid is calculated using the Bray-Curtis Similarity Index.
In (5), ρ_ij denotes the Bray-Curtis similarity coefficient, d_ij denotes the mutual independence between the SN and the cluster center, and |SN_i| and |c_j| represent the cardinalities of the two sets (i.e. the number of elements in each set, the SNs and the cluster centers). The Bray-Curtis similarity coefficient is bounded between 0 and 1, where 1 means that the SN is a favourable node for that cluster and 0 means that the SN is a non-favourable node for that cluster. The process is repeated until all the SNs are assigned to suitable clusters. In each iteration, the cluster centroid is updated so that all the SNs can be grouped.
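As an illustration of a similarity bounded between 0 and 1, the following Python sketch uses the standard Bray-Curtis similarity for non-negative feature vectors, 1 − Σ|u_k − v_k| / Σ(u_k + v_k). It is an assumption that this standard form matches the coefficient used by the method, and the feature vectors (for example a residual energy value per node) are hypothetical.

```python
def bray_curtis_similarity(u, v):
    """Standard Bray-Curtis similarity of two non-negative feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 1.0  # both vectors are all zeros; treated here as identical
    difference = sum(abs(a - b) for a, b in zip(u, v))
    return 1.0 - difference / total


# Hypothetical example: residual energy of an SN against two cluster centroids.
print(bray_curtis_similarity([6.1], [6.0]))  # close to 1: favourable cluster
print(bray_curtis_similarity([6.1], [2.0]))  # much lower: less favourable cluster
```

A value near 1 corresponds to the favourable case described above, and a value near 0 to the non-favourable case.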
In (6), c_j(t) denotes the updated centroid of the j-th cluster, τ denotes the fuzzifier, which determines the level of cluster fuzziness, and β_ij denotes the fuzzy membership function. Based on the updated cluster centers and the residual energy levels, all the SNs are grouped into clusters. Figure 3 shows the flow chart for partitioning the whole wireless network into different groups using Bray-Curtis similarity. An SN whose energy level is more similar to a cluster centroid has a higher probability of being grouped into that cluster. This process is iterated until convergence.
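The iteration of membership calculation and centroid update can be sketched in Python as follows. The sketch assumes the standard fuzzy c-means update forms, with the Bray-Curtis dissimilarity (1 − ρ) playing the role of the distance; this substitution, the scalar residual energies, the starting centroids and the stopping tolerance are all assumptions made for the example, not the exact expressions of the method.

```python
def similarity(a, b):
    """Bray-Curtis similarity of two positive scalars (single assumed feature)."""
    return 1.0 - abs(a - b) / (a + b)


def memberships(energies, centroids, tau=2.0, eps=1e-12):
    """beta[i][j]: membership of node i in cluster j (standard FCM form, tau > 1)."""
    beta = []
    for e in energies:
        # Assumption: dissimilarity 1 - rho is used in place of the FCM distance.
        d = [max(1.0 - similarity(e, c), eps) for c in centroids]
        power = 2.0 / (tau - 1.0)
        beta.append([1.0 / sum((d[j] / d[k]) ** power for k in range(len(d)))
                     for j in range(len(d))])
    return beta


def update_centroids(energies, beta, tau=2.0):
    """Membership-weighted mean of the node energies for each cluster."""
    clusters = range(len(beta[0]))
    return [sum(row[j] ** tau * e for row, e in zip(beta, energies)) /
            sum(row[j] ** tau for row in beta) for j in clusters]


def fuzzy_c_means(energies, centroids, tau=2.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until the centroids stop moving."""
    for _ in range(max_iter):
        beta = memberships(energies, centroids, tau)
        new_centroids = update_centroids(energies, beta, tau)
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships(energies, centroids, tau)


# Hypothetical residual energies (joules) and starting centroids for c = 3.
energies = [9.8, 9.6, 6.1, 5.9, 2.0, 2.2]
centroids, beta = fuzzy_c_means(energies, [2.0, 6.0, 10.0])
groups = [max(range(3), key=lambda j: row[j]) for row in beta]
print([round(c, 2) for c in centroids], groups)
```

Each row of `beta` sums to 1, and assigning every SN to its highest-membership cluster gives the final grouping.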
After the nodes are clustered, the CH is chosen for data aggregation. Each group consists of one CH and its cluster members. The SN with higher residual energy is selected as the CH. The CH gathers data from its member nodes. For further processing, the CH sends the gathered data to the sink node.
Ensemble cluster-based data aggregation is shown in Figure 4. Based on their residual energy, the SNs are grouped into various clusters. Each cluster includes one CH that collects the data from the nodes within the group. This helps to reduce the energy consumed by data aggregation and to extend the network lifetime. The algorithmic process of ensemble cluster-based data aggregation is described as follows.
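The CH selection step can be sketched in Python as follows. The sketch assumes that "higher residual energy" means the member with the largest residual energy in each cluster; the node names, energy values and cluster labels are hypothetical.

```python
def select_cluster_heads(residual_energy, assignment):
    """Return {cluster: node}, picking the member with the most residual energy.

    residual_energy: {node: joules}, assignment: {node: cluster label}.
    """
    heads = {}
    for node, cluster in assignment.items():
        current = heads.get(cluster)
        if current is None or residual_energy[node] > residual_energy[current]:
            heads[cluster] = node
    return heads


# Hypothetical nodes, residual energies and cluster labels.
residual_energy = {"SN1": 9.8, "SN2": 9.6, "SN3": 6.1, "SN4": 5.9}
assignment = {"SN1": 0, "SN2": 0, "SN3": 1, "SN4": 1}
print(select_cluster_heads(residual_energy, assignment))
```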

Algorithm 1. Energy Efficient Ensembled Elbow Fuzzy C-means Clustering Based Data Aggregation
Algorithm 1 describes energy-efficient data gathering with higher accuracy. Initially, the residual energy of each node is estimated so that the SNs can be grouped. The elbow method is applied to find the optimal number of clusters from the summation of the squared error. Then, fuzzy c-means clustering groups the SNs with higher similarity. A centroid is assigned to each cluster with respect to the residual energy of the nodes. The membership is then calculated to identify the members of each cluster. The similarity between the energy level of an SN and the cluster center is calculated to identify the favourable and non-favourable nodes for that cluster. Depending on the similarity coefficient value, the favourable nodes are grouped into the cluster. A CH is selected to coordinate the SNs within the cluster. Each SN sends its sensed information to that CH. The CH aggregates the data from its members and sends it to the sink node. Finally, the sink node receives the information from the CHs. This helps to achieve energy-efficient data aggregation in minimum time.
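The final steps of the procedure, in which members send data to their CH and the CHs forward it to the sink, can be sketched in Python as follows. The data structures and node names are hypothetical, and aggregation at the CH is assumed here to be a simple concatenation of the members' packets.

```python
def aggregate_at_sink(readings, assignment, heads):
    """Collect packets per cluster at the CH, then forward them to the sink.

    readings: {node: [packets]}, assignment: {node: cluster}, heads: {cluster: CH node}.
    Returns the packets received by the sink and the number of CH-to-sink transfers.
    """
    buffers = {cluster: [] for cluster in heads}
    for node, packets in readings.items():
        buffers[assignment[node]].extend(packets)  # member (or the CH itself) -> CH buffer
    sink = []
    for cluster in sorted(buffers):
        sink.extend(buffers[cluster])              # one transfer per CH -> sink
    return sink, len(heads)


# Hypothetical readings from three SNs grouped into two clusters.
readings = {"SN1": ["a"], "SN2": ["b", "c"], "SN3": ["d"]}
assignment = {"SN1": 0, "SN2": 0, "SN3": 1}
heads = {0: "SN2", 1: "SN3"}
print(aggregate_at_sink(readings, assignment, heads))
```

In this arrangement the sink communicates with one CH per cluster rather than with every SN.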

Simulation setup and parameter settings
The EEEEFCC-DA method and the existing methods MAEF [1] and SRS-CCDG [2] are simulated in the NS2.34 network simulator. The SNs are deployed in a square area of A^2 (1100 m * 1100 m) with the random waypoint mobility model. The SNs move at speeds of 0 to 20 m/s. The simulation time is set to 300 s. For energy-efficient data aggregation, the Dynamic Source Routing (DSR) protocol is used in the simulation setup. Table 1 shows the simulation parameters and their values.
With these simulation parameters, several performance metrics are evaluated: energy consumption, network lifetime, DAA and data aggregation time.

Numerical simulation result analysis
The results of the EEEEFCC-DA method and of the existing MAEF [1] and SRS-CCDG [2] methods are discussed in this section in the form of numerical values for the different parameters. The comparative analysis is carried out with the aid of tables and graphs.

Energy consumption
Energy consumption is defined as the amount of energy taken by the SNs to aggregate DPs. The formula for energy consumption is given in (7), where E_con denotes the energy consumption and SN_n represents the number of sensor nodes (SNs). Energy consumption is measured in joules (J).
To evaluate the energy consumption, the number of SNs is varied in the range 50 to 500. With 50 SNs, the EEEEFCC-DA method consumes 28 J, whereas the state-of-the-art works MAEF [1] and SRS-CCDG [2] consume 31 J and 35 J respectively. The EEEEFCC-DA method therefore consumes less energy than the existing works. The simulation results for energy consumption against the number of SNs are illustrated in figure 5 for the three data aggregation methods. As shown in the graph, the EEEEFCC-DA method consumes less energy than the conventional methods. This is because the EEEEFCC-DA method calculates the energy of each SN for data aggregation. The node residual energy is then calculated to identify the higher-energy nodes. The nodes are grouped depending on their energy level. A CH is selected to gather the data and send it to the base station. As a result of this process, the energy consumed while performing data aggregation is reduced. Ten results for energy consumption are compared across the three methods. The comparison shows that the EEEEFCC-DA method reduces the energy used for data aggregation by 9% and 19% compared to the existing methods [1] and [2].
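The relative saving at a single operating point can be checked with a short Python sketch. The function name is hypothetical; the three energy values are the ones quoted in the text for 50 SNs.

```python
def percent_reduction(proposed, existing):
    """Relative saving of the proposed value against an existing one, in percent."""
    if existing <= 0:
        raise ValueError("existing value must be positive")
    return 100.0 * (existing - proposed) / existing


# Energy consumption in joules at 50 SNs, as quoted in the text.
eeeefcc_da, maef, srs_ccdg = 28.0, 31.0, 35.0
print(round(percent_reduction(eeeefcc_da, maef), 1))      # saving against MAEF [1]
print(round(percent_reduction(eeeefcc_da, srs_ccdg), 1))  # saving against SRS-CCDG [2]
```

For this single point the savings are about 9.7% and 20%, close to the 9% and 19% reported over the ten results.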

Network lifetime
Network lifetime is measured as the ratio of the number of energy-efficient sensor nodes selected for data aggregation to the total number of SNs. It is calculated as follows:
$$N_{LT} = \frac{\text{number of energy-efficient sensor nodes selected}}{n} \times 100 \qquad (8)$$
In (8), N_LT denotes the network lifetime and n represents the number of SNs. It is measured in percentage (%).
The network lifetime is calculated for the EEEEFCC-DA method and for the existing MAEF [1] and SRS-CCDG [2] methods with the number of SNs in the range 50 to 500. Consider 50 SNs. Among the 50 nodes, 44 nodes are selected to perform data aggregation with the EEEEFCC-DA method, so its network lifetime is 88%, whereas the network lifetimes of [1] and [2] are 82% and 78% respectively.
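The 50-SN example can be reproduced with a short Python sketch of (8); the function name is hypothetical.

```python
def network_lifetime(selected_nodes, total_nodes):
    """Network lifetime in percent: selected energy-efficient nodes over all nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return 100.0 * selected_nodes / total_nodes


print(network_lifetime(44, 50))  # 44 of 50 SNs selected, as in the text
```

With 44 of 50 nodes selected, the sketch gives the 88% quoted for the EEEEFCC-DA method.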
The network lifetime results against the number of SNs are given in table 2. The table confirms that the network lifetime of the EEEEFCC-DA method is higher than that of the conventional methods. This is achieved through cluster-based data aggregation. A CH is chosen for each cluster to preserve the energy-efficient nodes. Data aggregation is essential to conserve the SNs, because SNs with less energy cannot perform many tasks over a long duration. As a result, cluster-based data aggregation is an effective way to conserve resources in the cluster and thus prolong the network lifetime. The network lifetime of the EEEEFCC-DA method is 5% and 10% higher than that of the existing methods [1] and [2].

Data aggregation accuracy
DAA is measured as the ratio of the number of DPs collected by the sink node to the total number of DPs sent. DAA is calculated as follows:
$$DAA = \frac{\text{number of DPs collected by } S_n}{\text{total number of DPs sent}} \times 100 \qquad (9)$$
In (9), DP denotes data packets and S_n represents the sink node. The data gathering accuracy is measured in percentage (%). The simulation results for DAA are illustrated in figure 6 with the number of DPs in the range 25 to 250. In the figure, the DAA results of the three methods are shown as lines of three different colours. Among the three data aggregation methods, EEEEFCC-DA gives a considerably higher DAA. The elbow method is applied to find the optimal number of clusters for the fuzzy c-means method. After that, the Bray-Curtis Similarity Index is applied to measure the similarity between the cluster centroid and the residual energy of the SNs. When the coefficient indicates higher similarity, the SN is grouped into that cluster. The node with higher residual energy is selected as the CH. The CH collects the sensed data from the SNs within the cluster and sends it to the sink node with less energy consumption. Thus, the sink node gathers data from the CHs with less loss. This helps to attain a higher DAA. The EEEEFCC-DA method increases the DAA by 5% and 11% compared to the existing methods [1] and [2].

Data aggregation time
Data aggregation time refers to the amount of time taken by the sink node to gather DPs from the CHs. T_DA is computed as follows:
$$T_{DA} = n \times t(dp) \qquad (10)$$
In (10), T_DA represents the data aggregation time, n represents the number of data packets sent, t denotes the time for collecting one DP, and dp denotes a data packet. The data aggregation time is measured in milliseconds (ms). Table 3 describes the data aggregation time against the number of DPs for the three methods. As shown by the results in table 3, the EEEEFCC-DA method performs energy-efficient data aggregation in less time than MAEF [1] and SRS-CCDG [2]. This is owing to the use of cluster-based data aggregation, in contrast to the existing works. A conventional aggregation method collects the data from each SN, which leads to delay in data collection. With the EEEEFCC-DA method, the sink node collects the data from the CHs instead of from all the nodes within the network. This in turn minimizes the data aggregation time. In the simulation scenario with 25 DPs, the data aggregation time of the EEEEFCC-DA method is 15 ms, whereas the data gathering times of MAEF [1] and SRS-CCDG [2] are 20 ms and 25 ms. Overall, the EEEEFCC-DA method reduces the data aggregation time by 14% and 24% compared to the conventional works.
From the above discussion, the EEEEFCC-DA method improves energy-efficient data aggregation, with high accuracy and network lifetime and with minimum energy consumption and time.
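The relation between the number of DPs and the aggregation time can be illustrated with a short Python sketch. It assumes that the data aggregation time is the product of the number of DPs and the time for collecting one DP; the function names are hypothetical, and the per-DP time is derived here from the quoted 25-DP figure rather than taken from the simulation.

```python
def aggregation_time(num_packets, time_per_packet_ms):
    """Data aggregation time in ms, assuming T_DA = n * t."""
    return num_packets * time_per_packet_ms


def time_per_packet(total_time_ms, num_packets):
    """Per-DP collection time implied by a measured total."""
    if num_packets <= 0:
        raise ValueError("num_packets must be positive")
    return total_time_ms / num_packets


# Figures quoted in the text for 25 DPs (ms): EEEEFCC-DA, MAEF [1], SRS-CCDG [2].
for name, total in [("EEEEFCC-DA", 15.0), ("MAEF", 20.0), ("SRS-CCDG", 25.0)]:
    print(name, time_per_packet(total, 25))
```

Under this assumption, the quoted totals correspond to per-DP collection times of 0.6 ms, 0.8 ms and 1.0 ms for the three methods at 25 DPs.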

Conclusions
The EEEEFCC-DA method is introduced to enhance data aggregation and network lifetime by minimizing energy consumption. Using the ensemble of the elbow method and fuzzy c-means clustering, the whole network is divided into an optimal number of clusters. A CH is then selected in each cluster, and the sensed data from the CHs are aggregated by the sink node in minimal time. This CH-based data collection lengthens the network lifetime compared with the other methods. The simulation is carried out with the proposed EEEEFCC-DA method and two existing methods. The numerical simulation analysis shows that the EEEEFCC-DA method improves energy-efficient data aggregation with minimal time compared to the conventional methods.