Big Data for Security, Dependability and Intelligence. «La rivoluzione dei dati e i sistemi informativi intelligenti. Big data e Linked Open per le decisioni nelle comunità Smart», Bologna, October 24th, 2014. Roberto Baldoni, Cyber Intelligence and Information Security Research Center, Università degli Studi di Roma La Sapienza, baldoni@dis.uniroma1.it
Economic sectors sensitive to cyber threats. Cyber space; Efficiency; Productivity. In the near future, the economic prosperity of a country will be measured by the degree of security of its cyberspace.
Local vs Nationwide approach. We need nationwide actors to face this challenge, with points of excellence spread across the country.
Cyber Security National Lab. Network laboratory (established May 2014, operational since September 2014) under the umbrella of Consorzio Interuniversitario Nazionale Informatica (CINI). Structuring and orchestrating together Italian academic excellence in cyber security (e.g., cryptography, dependability, information security, hardware security, malware analysis, CIP, risk management, intelligence, etc.). Taking advantage of sites with specific research footprints that are spread throughout Italy. Fine-grained mapping of research and education in Italy.
Cyber Security National Lab, 33 Local Labs: Università "Ca' Foscari" di Venezia; Politecnico di Torino; Università degli Studi di Bari; Università degli Studi di Pisa; Università degli Studi di Padova; Università degli Studi di Genova; Università degli Studi di Udine; Università degli Studi di Palermo; Università degli Studi di Parma; Università degli Studi di Milano; Politecnico di Milano; Università degli Studi di Pavia; Università degli Studi di Bologna; Università degli Studi di Torino; Università degli Studi del Sannio; Università della Calabria; Università degli Studi di Firenze; Università degli Studi di Cagliari; Università degli Studi di Perugia; Università degli Studi di Catania; Università degli Studi di Salerno; Università degli Studi di Trento; Università degli Studi di Napoli "Federico II"; Università degli Studi di Roma "La Sapienza"; Università degli Studi di Roma Tor Vergata; Università degli Studi di Napoli Parthenope; Università degli Studi Mediterranea di Reggio Calabria; Università degli Studi dell'Insubria Varese-Como; IMT Institute for Advanced Studies Lucca; Università degli Studi di Modena e Reggio Emilia; Seconda Università degli Studi di Napoli; Università degli Studi di Milano-Bicocca
[Diagram labels: NATIONAL AWARENESS (white book); NATIONAL GIF WG EU WG; Education & Skills & Certification; Advice & frameworks; Capabilities; Expertise; Persons at different local sites]
A few projects at CIS Sapienza on Big Data
Progressive Scanning of On-line Communities. On-line communities are a fundamental source of information in business and information security intelligence. They contain information that can be used to infer trends and the evolution of specific topics. A social community, and what it publishes, often influences other communities and vice versa, creating a network of causal relationships that can contain useful information about the evolution of a specific phenomenon.
Semantic Causal Relationships. [Figure: updates U i,j (i = 1..3, j = 1..6) of communities C1, C2 and C3 laid out along time axes t; legend: semantic causal relationship, community update]
Architecture Overview
Technologies in the Architecture
Data Mining Clustering Output Sample
European Election 2014
European Election 2014. Analyzed from 1 May 2014 to 30 May 2014: about 1,250,000 tweets with hashtags (about 100 in total), e.g. #80euro, #alfano, #ballaro, #berlusconi, #cambiareverso, #tuttiacasa, #vinciamonoi, #governorenzi, #grillo, #inmezzora, #jobsact, etc.; more than 100 Twitter accounts, e.g. @ale_moretti, @AlemannoTW, @AngeloTofalo, @beppe_grillo, @BerniniAM, @Capezzone, @demagistris, @Dsantanche, etc.; comments on political articles in the following newspapers: Il Sole 24 ORE, Corriere della Sera, la Repubblica, Italia Oggi, il Fatto Quotidiano, La Stampa, Il Messaggero.
Content Analysis. Main ingredients of the software architecture: information clustering through sentiment analysis (the Italian language and sarcasm are big issues); HPC platform; big data tools. Preliminary results: cluster size seems to be a good approximation of the real world.
Content Analysis. [Chart: deviation from the actual results (%) for PD, M5S and FI; categories: Dataset, IPR, Agorà, EMG, IPSOS; data labels range from -7.7% to 5.5%]
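The deviation plotted in the chart can be read as the difference between an estimated share and the official result. The following is a minimal sketch, assuming that each party's estimated share is its cluster size normalised over all party clusters; the function names, the party labels "A", "B", "C" and every number are hypothetical placeholders, not data from the study.

```python
def cluster_shares(cluster_sizes):
    """Turn raw cluster sizes (e.g. document counts) into percentage shares."""
    total = sum(cluster_sizes.values())
    if total == 0:
        raise ValueError("no documents were clustered")
    return {label: 100.0 * size / total for label, size in cluster_sizes.items()}


def deviation_from_results(estimated, actual):
    """Signed deviation in percentage points: estimate minus actual result."""
    return {label: estimated[label] - actual[label] for label in actual}


# Hypothetical placeholder numbers, NOT taken from the study.
sizes = {"A": 500, "B": 300, "C": 200}
official = {"A": 45.0, "B": 35.0, "C": 20.0}

shares = cluster_shares(sizes)
deviations = deviation_from_results(shares, official)
for label in sorted(deviations):
    print(f"{label}: estimated {shares[label]:.1f}%, deviation {deviations[label]:+.1f}")
```

A positive deviation means the cluster-based estimate overshoots the official result, a negative one that it undershoots.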
Content Analysis. [Chart: cluster size (%) per day from 19/05 to 30/05; series: Cluster0 - M5S, Cluster2 - FI, Cluster3 - PD, Cluster4 - Porta a Porta]
Predicting faults in datacenters
Predicting faults in datacenters. Ministry of Economy and Finance datacenter. Non-intrusive, black-box approach. Predicting failures of a software/hardware component of the datacenter using: network traffic; power consumption.
Detecting Deviations. Deviations from normal system behavior are often symptoms of an imminent or ongoing failure. An inferential engine can learn the normal behavior of the system (i.e., when the system is fault-free) in terms of a set of monitored metrics. Once trained on sufficient data, it can be used to predict the values of the monitored metrics. An alert is raised whenever the prediction error exceeds a given threshold.
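The alerting rule described above can be sketched as follows. This is only an illustration: a moving average stands in for the trained inferential engine, and the class name, window size, threshold and readings are all hypothetical.

```python
from collections import deque


class MovingAveragePredictor:
    """Stand-in for the inferential engine: predicts the mean of recent samples."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def predict(self):
        if not self.history:
            return None  # nothing learned yet
        return sum(self.history) / len(self.history)

    def observe(self, value):
        self.history.append(value)


def detect_deviations(samples, predictor, threshold):
    """Return the indices of samples whose prediction error exceeds the threshold."""
    alerts = []
    for i, value in enumerate(samples):
        predicted = predictor.predict()
        if predicted is not None and abs(value - predicted) > threshold:
            alerts.append(i)
        else:
            # Learn only from samples considered normal, so that a fault
            # does not contaminate the model of fault-free behavior.
            predictor.observe(value)
    return alerts


# Hypothetical power readings (arbitrary units) with one anomalous spike.
readings = [100, 101, 99, 100, 102, 150, 101, 100]
alerts = detect_deviations(readings, MovingAveragePredictor(window=5), threshold=10)
print(alerts)
```

With these placeholder readings only the spike at index 5 is flagged. In practice the threshold would have to be tuned to trade false alarms against missed failures.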
Architecture
Network Statistics Calculator and ANN Inferential Engine. Produces a stream of network statistics on the merged stream of packets, e.g., packet rate, bandwidth, average message size, etc. Stream processing. Two Elman recurrent neural networks, both trained with the network and power consumption traces.
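The statistics named above could be computed over fixed time windows roughly as in the sketch below. The packet record layout (timestamp, size in bytes), the one-second window and the sample stream are assumptions made for illustration, not details of the actual system.

```python
def network_statistics(packets, window_seconds=1.0):
    """Yield one statistics record per fixed-length time window.

    `packets` is an iterable of (timestamp_seconds, size_bytes) pairs sorted
    by timestamp; this layout is assumed for illustration.
    """
    window_start = None
    count = 0
    total_bytes = 0
    for timestamp, size in packets:
        if window_start is None:
            window_start = timestamp
        # Close every window that ended before this packet arrived.
        while timestamp >= window_start + window_seconds:
            yield _summarise(window_start, count, total_bytes, window_seconds)
            window_start += window_seconds
            count = 0
            total_bytes = 0
        count += 1
        total_bytes += size
    if window_start is not None:
        # The last window may be incomplete; it is reported as a full one.
        yield _summarise(window_start, count, total_bytes, window_seconds)


def _summarise(window_start, count, total_bytes, window_seconds):
    return {
        "window_start": window_start,
        "packet_rate": count / window_seconds,              # packets per second
        "bandwidth": 8 * total_bytes / window_seconds,      # bits per second
        "avg_size": total_bytes / count if count else 0.0,  # bytes per packet
    }


# Hypothetical merged packet stream: (timestamp in s, size in bytes).
stream = [(0.0, 100), (0.2, 300), (0.9, 200), (1.1, 1500), (2.5, 500)]
stats = list(network_statistics(stream))
for record in stats:
    print(record)
```

Because the function is a generator, it handles the stream one packet at a time and keeps only the counters of the current window in memory.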
RNN1 Experiment (predicting power consumption from network traffic)
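For readers unfamiliar with Elman networks, the forward pass of such a network can be sketched as below: the hidden layer is copied into context units and fed back at the next time step. This is a generic illustration with random, untrained weights; the layer sizes, the input features and their values are hypothetical and do not reproduce the experiment.

```python
import math
import random


class ElmanRNN:
    """Minimal Elman network: one hidden layer fed back as context units."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)

        def small():
            return rng.uniform(-0.1, 0.1)

        self.w_in = [[small() for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.w_ctx = [[small() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [small() for _ in range(n_hidden)]
        self.b_out = 0.0
        self.context = [0.0] * n_hidden  # previous hidden state

    def step(self, x):
        """Consume one input vector and return one scalar prediction."""
        hidden = []
        for j in range(len(self.b_hidden)):
            total = self.b_hidden[j]
            total += sum(w * xi for w, xi in zip(self.w_in[j], x))
            total += sum(w * c for w, c in zip(self.w_ctx[j], self.context))
            hidden.append(math.tanh(total))
        self.context = hidden  # fed back at the next time step
        return self.b_out + sum(w * h for w, h in zip(self.w_out, hidden))


# Hypothetical, already normalised network statistics per time step:
# (packet rate, bandwidth, average message size).
sequence = [(0.2, 0.1, 0.5), (0.3, 0.2, 0.5), (0.9, 0.8, 0.6)]
rnn = ElmanRNN(n_inputs=3, n_hidden=4)
predictions = [rnn.step(x) for x in sequence]
print(predictions)
```

Training is omitted here, so the printed values carry no meaning; the sketch only shows how each prediction depends on both the current input and the previous hidden state.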
Conclusions