IEEE/IFIP International Workshop on Analytics for Network and Service Management

AnNet 2016

April 25, 2016 in Istanbul, Turkey

IEEE/IFIP Network Operations and Management Symposium
                   Istanbul, Turkey, 25-29 April 2016

11:00 - 12:30 Technical Session 1
  Price / Cooling Aware and Delay Sensitive Scheduling in Geographically Distributed Data Centers
Abstract: Servers in data centers consume large amounts of energy, which increases the operational cost for cloud service providers; a major portion of their revenue goes to electricity bills caused by inefficient workload assignment and wasted resources. To minimize the operational cost of data centers, it is essential to optimize the scheduling of jobs. In this paper, we address the problems of inefficient cooling and of SLA violations due to network and processing delays in geographically distributed data centers. We propose scheduling algorithms that aim to minimize cooling cost by exploiting the temperature variations within the data centers, and electricity cost by taking advantage of the time- and space-varying fluctuation of electricity prices. SLA violations are minimized by assigning jobs with deadlines, network delays, and queuing delays taken into account. Experiments conducted on CloudSim show that price/cooling-aware and delay-sensitive scheduling reduces the overall cost by 22% compared to random scheduling.
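The assignment step the abstract describes could be sketched, under assumed inputs, as choosing the cheapest feasible data center per job; the field names, the multiplicative cooling factor, and the single delay figure are all illustrative simplifications, not the paper's actual model.

```python
def assign_job(job_deadline_s, datacenters):
    """Pick the datacenter minimizing electricity cost scaled by a cooling
    overhead factor, among those whose network + queuing delay still meets
    the job's deadline. Returns the chosen name, or None if infeasible.
    datacenters: list of dicts with keys name, price, cooling_factor, delay_s
    (all hypothetical field names for this sketch)."""
    feasible = [d for d in datacenters if d["delay_s"] <= job_deadline_s]
    if not feasible:
        return None  # deadline cannot be met anywhere: an SLA violation
    return min(feasible, key=lambda d: d["price"] * d["cooling_factor"])["name"]

# Example: B is cheapest but too slow for a 1-second deadline, so A wins.
dcs = [
    {"name": "A", "price": 0.10, "cooling_factor": 1.3, "delay_s": 0.5},
    {"name": "B", "price": 0.08, "cooling_factor": 1.2, "delay_s": 2.0},
    {"name": "C", "price": 0.12, "cooling_factor": 1.1, "delay_s": 0.3},
]
```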
WAN Capacity Forecasting for Large Enterprises
  Abstract: Large enterprises require reliable and scalable network connectivity, which relies heavily on correct network design for the LAN and ample bandwidth on the WAN. The latter is mostly affected by external market-defined prices, which, absent careful optimization and estimation, can result in unnecessary business expenses. This paper presents a framework for network capacity forecasting in a large enterprise, enabling accurate and reliable prediction of WAN requirements for all enterprise offices. Quarterly forecasts are generated for individual offices in an enterprise network using historical bandwidth utilization for each office and its associated usage headcount. This framework is currently used to inform WAN circuit upgrade/downgrade decisions for more than 70 offices and more than 200 associated circuits. The framework uses statistical regression models to create 6-, 12-, and 24-month forecasts for each office, and rigorously evaluates forecast accuracy with real data going back to 2014Q1. This office-centric approach makes the framework applicable to any corporate network or any large/distributed network-dependent organization.
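The regression-based forecasting idea can be illustrated with a minimal sketch: an ordinary least-squares trend line fitted to a per-office utilization history and extrapolated a few quarters ahead. This is only the simplest member of the model family the abstract alludes to, not the paper's actual framework.

```python
def linear_forecast(history, steps_ahead):
    """Fit an ordinary least-squares line through (t, y) points where t is
    the index of each historical observation, then extrapolate steps_ahead
    periods beyond the last observation."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # forecast at time index (n - 1) + steps_ahead
    return intercept + slope * (n - 1 + steps_ahead)

# Example: quarterly peak utilization (Mbps) growing roughly linearly;
# forecasting two quarters out continues the 10 Mbps/quarter trend.
utilization = [10, 20, 30, 40]
```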
Dynamic Placement of Virtual Network Functions based on Model Predictive Control
  Abstract: Dynamic placement of virtual network functions (VNFs) is one of the promising approaches to handling time-varying demands; when demands are small, energy consumption can be reduced by placing the VNFs on a small number of physical nodes and shutting down unused nodes. If the demands become large, the VNFs are migrated to allocate sufficient resources. In the dynamic placement of VNFs, it is important to avoid a large number of migrations at any one time, because migration requires a large amount of bandwidth. In this paper, we propose a new method to dynamically place VNFs that follows traffic variation without migrating a large number of VNFs. Our method is based on model predictive control (MPC). By applying MPC to the dynamic placement of VNFs, our method starts migration in advance by considering predicted future demands. As a result, our method allocates sufficient resources to the VNFs without migrating a large number of VNFs at the same time, even when traffic varies. Through simulation, we demonstrate that our method handles time-varying demands without requiring a large number of migrations in any time slot.
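The MPC pattern the abstract describes can be sketched in miniature: enumerate candidate plans over a short prediction horizon, score each by resource (energy) use plus a penalty on changes between steps (standing in for migration bandwidth), and commit only the first decision. The capacity, weights, and horizon below are assumed for illustration, not taken from the paper.

```python
import itertools
import math

NODE_CAPACITY = 10.0   # demand units one physical node can serve (assumed)
MIGRATION_COST = 1.0   # penalty per node activated or deactivated (assumed)
HORIZON = 3            # number of future steps considered

def mpc_plan(current_nodes, predicted_demand):
    """Return the active-node count for the next step, chosen by evaluating
    candidate plans over the horizon and keeping only the first decision."""
    needed = [math.ceil(d / NODE_CAPACITY) for d in predicted_demand[:HORIZON]]
    lo, hi = min(needed), max(needed) + 1
    best_cost, best_first = float("inf"), current_nodes
    for plan in itertools.product(range(lo, hi + 1), repeat=len(needed)):
        if any(p < n for p, n in zip(plan, needed)):
            continue  # infeasible: predicted demand not covered
        cost, prev = 0.0, current_nodes
        for p in plan:
            cost += p                               # energy of active nodes
            cost += MIGRATION_COST * abs(p - prev)  # discourage bursts of change
            prev = p
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    return best_first
```

At every time slot the controller would be re-run with fresh predictions, so only `plan[0]` is ever acted on; this receding-horizon structure is what lets changes be spread across slots.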
Collating time-series resource data for system-wide job profiling
  Abstract: Through the collection and association of discrete time-series resource metrics and workloads, we can provide both benchmark and intra-job resource collations, along with system-wide job profiling. Traditional RDBMSes are not designed to store and process long-term discrete time-series metrics, and the commonly used resolution-reducing round robin databases (RRDB) make poor long-term sources of data for workload analytics. We implemented a system that employs "Big data" (Hadoop/HBase) and other analytics (R) techniques and tools to store, process, and characterize HPC workloads. Using this system we have collected and processed over 30 billion time-series metrics from existing short-term high-resolution (15-sec RRDB) sources, profiling over 200 thousand jobs across a wide spectrum of workloads. The system is currently in use at the University of Kentucky for better understanding of individual jobs and system-wide profiling, as well as a strategic source of data for resource allocation and future acquisitions.
14:00 - 15:10 Technical Session 2
Towards an Approximate Graph Entropy Measure for Identifying Incidents in Network Event Data
  Abstract: A key objective of monitoring networks is to identify potential service-threatening outages from events within the network before service is interrupted. Identifying causal events, Root Cause Analysis (RCA), is an active area of research, but current approaches are vulnerable to scaling issues with high event rates. Elimination of noisy events that are not causal is key to ensuring the scalability of RCA. In this paper, we introduce vertex-level measures inspired by graph entropy and propose their suitability as a categorization metric to identify nodes that are a priori of more interest as a source of events. We consider a class of measures based on structural, chromatic, and von Neumann entropy. These measures require NP-hard calculations over the whole graph, an approach which does not scale for the large dynamic graphs that characterise modern networks. In this work we identify and justify a local measure of vertex graph entropy, which behaves in a similar fashion to global measures of entropy when summed across the whole graph. We show that such measures are correlated with nodes that generate incidents, using a real network data set.
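One concrete way to picture a local measure that sums to a global entropy (a simpler stand-in for the paper's measures, not their definition) is the per-vertex contribution to degree-distribution structural entropy: each vertex contributes -p*log2(p), where p is its share of total degree, and the contributions sum exactly to the global entropy.

```python
import math

def degree_entropy_contributions(adj):
    """For each vertex, return its contribution -p*log2(p) to the global
    degree-distribution entropy, where p = deg(v) / sum of all degrees.
    adj: dict mapping vertex -> list of neighbours."""
    total = sum(len(nbrs) for nbrs in adj.values())
    contrib = {}
    for v, nbrs in adj.items():
        p = len(nbrs) / total
        contrib[v] = -p * math.log2(p) if p > 0 else 0.0
    return contrib

# Toy star graph: the hub's contribution dominates, flagging it as the
# vertex of a priori interest; the contributions sum to the global entropy.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
```

Because each term is computed from one vertex's degree alone (plus a global normalizer), the measure is cheap to maintain on a dynamic graph, which is the scaling property the abstract is after.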
A New Approach for Clustering Alarm Sequences in Mobile Operators
  Abstract: Telecom networks produce huge amounts of daily alarm logs. These alarms usually arrive from different regions and network equipment of mobile operators at different times. In a typical network operator, Network Operations Centers (NOCs) constantly monitor those alarms in a central location and try to fix issues raised by intelligent warning systems through a trouble-ticketing-based management system. To automate rule finding, different sequential rule mining algorithms can be exploited. However, the number of sequential rules and alarm correlations generated by these algorithms can overwhelm NOC administrators, since some of those rules are neither utilized nor reduced appropriately by non-customized sequential rule mining algorithms. Therefore, additional efficient and intelligent rule identification techniques need to be developed depending on the characteristics of the data. In this paper, two new metrics inspired by document classification approaches are proposed in order to increase the accuracy of sequential alarm rules. This approach uses a new definition of transactions as alarm features, clustering the alarms by their occurrences in the built transactions. Experimental evaluations demonstrate that up to 61% accuracy improvement can be achieved by utilizing the proposed metrics, compared to a sequential rule mining algorithm.
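The document-classification analogy the abstract draws can be sketched with a plain TF-IDF weighting, treating each alarm transaction (e.g. a time window of alarm types) as a document and each alarm type as a term. This is a generic illustration of the borrowed idea, not the paper's two metrics.

```python
import math
from collections import Counter

def alarm_tfidf(transactions):
    """TF-IDF weights per transaction: alarms frequent within a transaction
    but rare across transactions get high weight, mirroring how document
    classifiers down-weight ubiquitous terms (here, chatty alarm types).
    transactions: list of lists of alarm-type strings."""
    n = len(transactions)
    df = Counter()                       # in how many transactions each alarm appears
    for t in transactions:
        df.update(set(t))
    weights = []
    for t in transactions:
        tf = Counter(t)
        weights.append({a: (tf[a] / len(t)) * math.log(n / df[a]) for a in tf})
    return weights

# Example: "link_down" repeats in the first window and also appears elsewhere,
# so its weight reflects both its local frequency and its global rarity.
windows = [["link_down", "link_down", "power"], ["power"], ["link_down"]]
```

Transactions could then be clustered on these weight vectors, which is the general shape of the clustering step the abstract describes.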
nDEWS: a new domains early warning system for TLDs
  Abstract: We present nDEWS, a Hadoop-based automatic early warning system of malicious domains for domain name registry operators, such as top-level domain (TLD) registries. By monitoring an entire DNS zone, nDEWS is able to single out newly added suspicious domains by analyzing both domain registration and global DNS lookup patterns of a TLD. nDEWS is capable of detecting several types of domain abuse, such as malware, phishing, and allegedly fraudulent web shops. To act on this data, we have established a pilot study with two major .nl registrars, providing them with daily feeds of their respective suspicious domains. Moreover, nDEWS can also be implemented by other TLD operators/registries.
16:00 - 17:10 Technical Session 3
Detection of Vulnerability Scanning Using Features of Collective Accesses Based on Information Collected from Multiple Honeypots
  Abstract: Attacks against websites are increasing rapidly with the expansion of web services. The growing number of diversified web services makes it difficult to prevent such attacks, owing to the many known vulnerabilities in websites. To overcome this problem, it is necessary to collect the most recent attacks using decoy web honeypots and to implement countermeasures against malicious threats. Web honeypots collect not only malicious accesses by attackers but also benign accesses such as those by web search crawlers. Thus, it is essential to develop a means of automatically separating malicious accesses from collected data that mixes malicious and benign accesses. In particular, detecting vulnerability scanning, a preliminary step in many attacks, is important for preventing them. In this study, we focused on classifying accesses as web crawling or vulnerability scanning, since these accesses are too similar to distinguish individually. For classification, we propose a feature vector that includes features of collective accesses, e.g., intervals of request arrivals and the dispersion of source port numbers, obtained from multiple honeypots deployed in different networks. Through evaluation using data collected from 37 honeypots in a real network, we show that features of collective accesses are advantageous for classifying vulnerability scanning and crawling.
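The two example features the abstract names, request-arrival intervals and source-port dispersion, can be computed per source across honeypots with a short sketch (a plausible reading of the feature vector, not the paper's exact definition):

```python
from statistics import mean, pstdev

def collective_features(events):
    """events: list of (timestamp_seconds, src_port) pairs for one source,
    pooled across all honeypots that saw it. Returns a (mean inter-arrival
    interval, source-port standard deviation) feature pair: scanners tend
    toward tight, regular intervals and rapidly incrementing ports."""
    events = sorted(events)
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(events, events[1:])]
    ports = [p for _, p in events]
    return (mean(gaps) if gaps else 0.0, pstdev(ports))

# Example: a machine-driven source probing every 2 seconds from
# consecutive ephemeral ports.
probe = [(0, 40000), (2, 40002), (4, 40004)]
```

A classifier would then be trained on such feature pairs labelled as crawler or scanner; pooling observations from many honeypots is what makes the "collective" statistics meaningful.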
Optimizing ATM Cash Flow Network Management
  Abstract: Automated Teller Machine (ATM) service providers are increasingly challenged with improving the quality of customer service while reducing the cost of cash flow management. Effectively balancing the need to keep enough cash in the ATMs to avoid out-of-cash incidents against the need to reduce cash interest cost and cash refill cost challenges even the most experienced cash flow management teams. In this paper we propose an optimization framework for managing the ATM cash flow network. The interactions among various constraints and cost factors are included in the framework to allow decision-making regarding the optimal cash refill amount and schedule. We demonstrate the effectiveness of the proposed approach using sample data from a large commercial bank.
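The trade-off the abstract describes, refill trips versus interest carried on idle cash, can be illustrated with a toy single-ATM schedule search under constant daily demand. This economic-order-quantity-style sketch is an assumed simplification, not the paper's network-level framework.

```python
def best_refill_period(daily_demand, trip_cost, daily_interest, horizon):
    """Try refilling every k days over the horizon with just enough cash
    to cover demand until the next refill; return (best k, its total cost).
    Larger k means fewer trips but more idle cash earning interest charges."""
    best_k, best_cost = 1, float("inf")
    for k in range(1, horizon + 1):
        refill_amount = daily_demand * k
        trips = horizon / k
        # cash held between refills averages about half the refill amount
        carrying = daily_interest * (refill_amount / 2) * horizon
        cost = trips * trip_cost + carrying
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# Example: $1000/day demand, $50 per refill trip, 0.1% daily interest,
# planned over a 30-day horizon.
```

The full problem adds per-ATM demand forecasts, vault and logistics constraints, and interactions across the network, which is what pushes it from this one-liner into a genuine optimization framework.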
How to Choose from Different Botnet Detection Systems?
  Abstract: Given that botnets represent one of the most aggressive threats against cybersecurity, various detection approaches have been studied. However, whichever approach is used, the evolving nature of botnets and the required pre-defined detection rule sets may affect the performance of detection systems. In this work, we explore the effectiveness of two rule-based systems and two machine learning (ML) based techniques with different feature extraction methods (packet-payload based and traffic-flow based). The performance of these detection systems ranges from 0% to 100% on thirteen public botnet data sets (i.e. CTU-13). We further analyze the performance of these systems in order to understand which type of detection system is more effective for which type of application.