Measurement Science for Complex Information Systems

What are complex systems?
What is the problem?
What is the new idea?
What are the technical objectives?
Why is this hard?
Who would care?
Hard Issues & Plausible Approaches
     Spatiotemporal Scale
     Model Validation
     Tractable Analysis
     Causal Analysis
     Controlling Behavior
Publications
Software Tools
Presentations
Demonstrations





What are complex systems?

Large collections of interconnected components whose interactions lead to macroscopic behaviors:

  • Biological systems (e.g., slime molds, ant colonies, embryos)
  • Physical systems (e.g., earthquakes, avalanches, forest fires)
  • Social systems (e.g., transportation networks, cities, economies)
  • Information systems (e.g., Internet and Web services)

What is the problem?

No one understands how to measure, predict or control macroscopic behavior in complex information systems

  • threatening our nation’s security
  • costing billions of dollars

“[Despite] society’s profound dependence on networks, fundamental knowledge about them is primitive. [G]lobal communication … networks have quite advanced technological implementations but their behavior under stress still cannot be predicted reliably. … There is no science today that offers the fundamental knowledge necessary to design large complex networks [so] that their behaviors can be predicted prior to building them.”
                                      Network Science, a 2006 report from the National Research Council

What is the new idea?

Leverage models and mathematics from the physical sciences to define a systematic method to measure, understand, predict and control macroscopic behavior in the Internet and in the distributed software systems built on it.

What are the technical objectives?

Establish models and analysis methods that (1) are computationally tractable, (2) reveal macroscopic behavior and (3) establish causality. Characterize distributed control techniques, including (1) economic mechanisms to elicit desired behaviors and (2) biological mechanisms to organize components.

Why is this hard?

Valid, computationally tractable models that exhibit macroscopic behavior and reveal causality are difficult to devise. Phase transitions are difficult to predict and control.

Who would care?

All designers and users of networks and distributed systems, which have a 25-year history of unexpected failures:

  • ARPAnet congestion collapse of 1980
  • Internet congestion collapse of Oct 1986
  • Cascading failure of AT&T long-distance network in Jan 1990
  • Collapse of AT&T frame-relay network in April 1998 …

Businesses and customers who rely on today's information systems:

  • “Cost of eBay's 22-Hour Outage Put At $2 Million”, Ecommerce, Jun 1999
  • “Last Week’s Internet Outages Cost $1.2 Billion”, Dave Murphy, Yankee Group, Feb 2000
  • “…the Internet ‘basically collapsed’ Monday”, Samuel Kessler, Symantec, Oct 2003
  • “Network crashes…cost medium-sized businesses a full 1% of annual revenues”, Technology News, Mar 2006
  • “costs to the U.S. economy…range…from $65.6 M for a 10-day [Internet] outage at an automobile parts plant to $404.76 M for … failure …at an oil refinery”, Dartmouth study, Jun 2006

Designers and users of tomorrow's information systems that will adopt dynamic adaptation as a design principle:

  • DoD to spend $13 B over the next 5 yrs on Net-Centric Enterprise Services initiative, Government Computer News, 2005 
  • Market derived from Web services to reach $34 billion by 2010, IDC
  • Grid computing market to exceed $12 billion in revenue by 2007, IDC
  • Market for wireless sensor networks to reach $5.3 billion in 2010, ONWorld
  • Revenue in mobile networks market will grow to $28 billion in 2011, Global Information, Inc.
  • Market for service robots to reach $24 billion by 2010, International Federation of Robotics

Hard Issues & Plausible Approaches

Model scale – Systems of interest (e.g., the Internet and compute grids) extend over a large spatiotemporal extent, have global reach, consist of millions of components, and interact through many adaptive mechanisms over various timescales. Which computational models can achieve sufficient spatiotemporal scaling properties? Micro-scale models are not computable at large spatiotemporal scale. Macro-scale models are computable and might exhibit global behavior, but can they reveal causality? Meso-scale models might exhibit global behavior and reveal causality, but are they computable? One plausible approach is to investigate abstract models from the physical sciences, e.g., fluid flows (from hydrodynamics), lattice automata (from gas chemistry), Boolean networks (from biology) and agent automata (from geography). We can apply parallel computing to scale to millions of components and days of simulated time.
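
As a concrete illustration of the meso-scale idea, the following sketch (Python; the lattice size, capacity, load and buffer parameters are assumptions for illustration, not the project's MesoNet model) treats each cell as a router-like element whose overflow spills to its four lattice neighbors, and tracks a single macroscopic observable, the mean backlog.

    import numpy as np

    # Minimal lattice-automaton sketch (illustrative parameters only).  Each cell
    # is a router-like element with a queue; every step it receives offered load,
    # serves up to its capacity, and spills overflow above a buffer threshold
    # evenly to its four lattice neighbors.
    SIZE = 100        # 100 x 100 lattice of elements (assumed size)
    CAPACITY = 5.0    # per-step service capacity of one element
    OFFERED = 4.5     # mean offered load per element per step
    STEPS = 1000

    rng = np.random.default_rng(1)
    queue = np.zeros((SIZE, SIZE))
    mean_backlog = []

    for t in range(STEPS):
        queue += rng.exponential(OFFERED, size=queue.shape)      # arrivals
        queue -= np.minimum(queue, CAPACITY)                     # service
        overflow = np.maximum(queue - 2.0 * CAPACITY, 0.0)       # buffer overflow
        queue -= overflow
        queue += 0.25 * (np.roll(overflow, 1, 0) + np.roll(overflow, -1, 0) +
                         np.roll(overflow, 1, 1) + np.roll(overflow, -1, 1))
        mean_backlog.append(queue.mean())                        # macroscopic observable

    print("mean backlog over final 100 steps:", np.mean(mean_backlog[-100:]))

Raising OFFERED past CAPACITY drives the lattice from a low-backlog regime into runaway congestion, the kind of abrupt macroscopic transition such models are meant to expose.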

Model validation – Scalable models from the physical sciences (e.g., differential equations, cellular automata, nk-Boolean nets) tend to be highly abstract. Can sufficient fidelity be obtained to convince domain experts of the value of insights gained from such abstract models? We can conduct key comparisons along three complementary paths: (1) comparing model data against existing traffic measurements and analyses, (2) comparing results from subsets of macro/meso-scale models against micro-scale models and (3) comparing simulations of distributed control regimes against results from implementations in test facilities, such as the Global Environment for Network Innovations.
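
The sketch below illustrates comparison path (1) under stated assumptions: the two synthetic series are placeholders for real simulation output and a measured trace, and the two statistics chosen (a two-sample Kolmogorov-Smirnov test and Welch spectral slopes, computed with Python and SciPy) stand in for whatever fidelity criteria domain experts would actually require.

    import numpy as np
    from scipy import stats, signal

    # Hedged sketch of comparison path (1): test whether model output reproduces
    # the distribution and timescale structure of a measured trace.  The two
    # synthetic series below are stand-ins; in practice they would be loaded
    # from simulation output and from packet-trace analysis.
    rng = np.random.default_rng(0)
    model = np.cumsum(rng.normal(size=2**16)) * 1e-2   # stand-in for model output
    trace = np.cumsum(rng.normal(size=2**16)) * 1e-2   # stand-in for measured data

    # Distributional agreement: two-sample Kolmogorov-Smirnov test.
    result = stats.ks_2samp(model, trace)
    print(f"KS statistic {result.statistic:.3f}, p-value {result.pvalue:.3g}")

    # Timescale agreement: compare slopes of Welch power-spectral-density estimates.
    f_m, psd_m = signal.welch(model, nperseg=1024)
    f_t, psd_t = signal.welch(trace, nperseg=1024)
    slope_m = np.polyfit(np.log10(f_m[1:]), np.log10(psd_m[1:]), 1)[0]
    slope_t = np.polyfit(np.log10(f_t[1:]), np.log10(psd_t[1:]), 1)[0]
    print(f"spectral slope: model {slope_m:.2f} vs trace {slope_t:.2f}")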

Tractable analysis – The scale of potential measurement data is expected to be very large – O(10^15) data points – with millions of elements, tens of variables, and millions of seconds of simulated time. How can measurement data be analyzed tractably? We could use homogeneous models, which allow one (or a few) elements to be sampled as representative of all. This reduces the data volume to 10^6 – 10^7 points, which is amenable to statistical analyses (e.g., power-spectral density, wavelets, entropy, Kolmogorov complexity) and to visualization.
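
A minimal Python sketch of this reduction, using a synthetic random-walk series in place of real output from one representative element:

    import numpy as np
    from scipy import signal

    # Sketch of the reduction described above: in a homogeneous model, sample one
    # representative element's time series rather than every element, then apply
    # standard analyses.  The synthetic series below stands in for simulation output.
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=10**6))       # ~10^6 samples from one element

    # Power-spectral-density estimate of the sampled element.
    freqs, psd = signal.welch(series, nperseg=4096)

    # Shannon entropy of the binned amplitude distribution (a coarse complexity proxy).
    counts, _ = np.histogram(series, bins=64)
    p = counts[counts > 0] / counts.sum()
    entropy_bits = -np.sum(p * np.log2(p))

    print(f"peak PSD frequency: {freqs[np.argmax(psd)]:.5f} cycles/sample")
    print(f"amplitude entropy: {entropy_bits:.2f} bits")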

Causal analysis – Tractable analysis strategies yield coarse data with limited granularity of timescales, variables and spatial extents. Coarseness may reveal macroscopic behavior that cannot be explained from the data alone. For example, an unexpected collapse in the probability density function of job completion times in a computing grid was unexplainable without more detailed data and analysis. Multidimensional analysis can represent system state as a multidimensional space and depict system dynamics through various projections (e.g., slicing, aggregation, scaling). State-space analysis can segment system dynamics into an attractor-basin field and then monitor trajectories through that field.
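
One way to make the state-space idea concrete is sketched below; the three coarse variables, the quantile binning, and the randomly generated data are assumptions chosen purely for illustration.

    import numpy as np

    # Hedged sketch of the state-space view: each time step's vector of coarse
    # variables (three hypothetical ones here: utilization, loss rate, mean queue)
    # is a point in a multidimensional state space.  Coarse-graining that space
    # into cells and counting visits approximates an attractor-basin picture; the
    # cell-to-cell sequence is the trajectory to monitor.
    rng = np.random.default_rng(2)
    T = 50_000
    state = np.column_stack([
        rng.beta(8.0, 2.0, T),        # link utilization in [0, 1]
        rng.beta(1.0, 30.0, T),       # packet-loss rate, mostly near zero
        rng.gamma(2.0, 5.0, T),       # mean queue length
    ])

    # Coarse-grain each variable into 10 quantile bins, giving one discrete cell per step.
    edges = [np.quantile(state[:, j], np.linspace(0.1, 0.9, 9)) for j in range(3)]
    cells = np.stack([np.digitize(state[:, j], edges[j]) for j in range(3)], axis=1)
    labels = cells[:, 0] * 100 + cells[:, 1] * 10 + cells[:, 2]

    # Heavily occupied cells mark candidate attractor regions; the label sequence
    # is the system's trajectory through the attractor-basin field.
    unique, counts = np.unique(labels, return_counts=True)
    top_cells = unique[np.argsort(counts)[::-1][:5]]
    print("five most-occupied state-space cells:", top_cells)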

Controlling Behavior – Large distributed systems and networks cannot be subjected to centralized control regimes because they consist of too many elements, too many parameters, too much change, and too many policies. Can models and analysis methods be used to determine how well decentralized control regimes stimulate desirable system-wide behaviors? Use price feedback (e.g., auctions, present-value analysis or commodity markets) to modulate supply and demand for resources or services. Use biological processes to differentiate function based on environmental feedback, e.g., morphogen gradients, chemotaxis, local and lateral inhibition, polarity inversion, quorum sensing, energy exchange and reinforcement.
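
A minimal sketch of the price-feedback idea, assuming a single provider, linear client demand, and illustrative parameters (none taken from the project's EconoGrid model):

    # Hedged sketch of price feedback as a decentralized control regime: a resource
    # provider adjusts its posted price in proportion to excess demand, and
    # price-sensitive clients scale back requests as the price rises.
    CAPACITY = 100.0       # resource units available per round
    N_CLIENTS = 50
    BASE_DEMAND = 4.0      # each client's demand at price zero
    SENSITIVITY = 2.0      # how strongly a client's demand falls with price
    GAIN = 0.002           # price-adjustment gain

    price = 0.2
    for _ in range(200):
        # aggregate demand falls linearly with price (a deliberately simple model)
        demand = max(BASE_DEMAND - SENSITIVITY * price, 0.0) * N_CLIENTS
        # provider raises the price when demand exceeds capacity, lowers it otherwise
        price = max(price + GAIN * (demand - CAPACITY), 0.0)

    demand = max(BASE_DEMAND - SENSITIVITY * price, 0.0) * N_CLIENTS
    print(f"price settles near {price:.2f}; demand {demand:.1f} vs capacity {CAPACITY:.0f}")

In auction- or commodity-market regimes the posted price would instead be a clearing price computed from bids, but the feedback structure is the same.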

Related Publications

Related Software Tools

  • SLX software for simulated computing grid used in "Investigating Global Behavior in Computing Grids". 
    (see  http://www.wolverinesoftware.com/ for information on the SLX simulation environment)
  • Matlab MFiles used in "Simulating Timescale Dynamics of Network Traffic Using Homogeneous Modeling".
    (see http://www.mathworks.com/ for information on Matlab)
  • Matlab MFiles used in "Monitoring the Macroscopic Effect of DDoS Flooding Attacks".
  • Matlab MFiles used in "A Cross-Correlation Based Method for Spatial-Temporal Traffic Analysis".
  • Matlab MFiles used in "Macroscopic Dynamics in Large-Scale Data Networks".
  • Matlab MFiles used in "Exploring Collective Dynamics in Communication Networks".
  • MesoNet: a Medium-scale Simulation Model of a Router-Level Internet-like Network
  • EconoGrid: a detailed Simulation Model of a Standards-based Grid Compute Economy
  • Flexi-Cluster: a Simulator for a Single Compute Cluster
  • MesoNetHS: A Medium-scale Network Simulation with TCP Congestion-Control Algorithms for High-Speed Networks, including Compound TCP, FAST, H-TCP, HS-TCP and Scalable TCP

Related Presentations

Related Demonstrations

  • Visualization (10 Mbyte .avi) from a Simulation (May 23, 2007) of an Abilene-style Network
  • Visualization (14.4 Mbyte .avi) from a Simulation (July 31, 2007) of a Network Running CTCP