Three Questions with Marc Körner

Three questions with a new staff member!  

Marc Körner is the most recent addition to the ESnet Software Engineering – Orchestration and Core Data (OCD) team. He comes to us from Join Digital in San Jose, where he was Lead Engineer on their Network Services team.

Marc has a PhD in Computer Science from the Technical University of Berlin and has spent a number of years working as a researcher in both the Berlin and Berkeley areas. He’ll be working on the automation side of OCD, getting familiar with the network services orchestrator platform and helping us achieve our ESnet6 deliverables. Marc is onboarding virtually this week but resides full time in the San Jose area.

What brought you to ESnet?

I was always very passionate about computer networks. The idea of a global technology for exchanging data and knowledge always fascinated me. It started with the LAN sessions I had with my friends and led to the tremendous opportunity to build the first SDN research network in Europe. After my time as a research fellow in the UC Berkeley NetSys Lab and my startup experiences in the access network provider business, the open network automation position at ESnet was the ultimate opportunity to take it to the next level.

What is the most exciting thing going on in your field right now?

This question is not easy to answer; there are so many things going on in computer networks. I think one of the biggest innovations of the last decade is virtualization in general and the centralization of network management and control. However, one of the more recent trends that correlates with this development is edge computing, or the slightly more generalized concept of fog computing, and its seamless orchestration. It’s basically a fine-grained fusion of the compute and network control planes, which we also observed in cloud computing.

What book would you recommend?

It has been a while since I read a book. As an EECS guy, people probably expect to hear something like “The C Programming Language” by Brian Kernighan and Dennis Ritchie. However, if you are interested in science in general, I would probably recommend “The Universe in a Nutshell” by Stephen Hawking. The book provides some interesting insights into modern physics and has the potential to open up interesting views on the world around us.

40G Data Transfer Node (DTN) now Available for User Testing!

ESnet’s first 40 Gb/s public data transfer node (DTN) has been deployed and is now available for community testing. This new DTN is the first of a new generation of publicly available networking test units, provided by ESnet to the global research and engineering network community as part of promoting high-speed scientific data mobility. This 40G DTN will provide four times the speed of previous-generation DTN test units, as well as the opportunity to test a variety of network transfer tools and calibrated data sets.
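To make the “four times the speed” comparison concrete, here is a back-of-the-envelope calculation of ideal transfer time at line rate. The 1 TB dataset size is a hypothetical example, and real transfers will fall short of these numbers because of protocol overhead, storage throughput, and competing traffic.

```python
# Back-of-the-envelope only: ideal time at nominal line rate, ignoring
# TCP/IP overhead, storage speed, and competing traffic.

def ideal_transfer_seconds(dataset_bytes: float, link_gbps: float) -> float:
    """Seconds to move dataset_bytes over a link of link_gbps (10**9 bit/s)."""
    return dataset_bytes * 8 / (link_gbps * 1e9)


ONE_TB = 1e12  # hypothetical 1 TB (decimal) dataset

for gbps in (10, 40):
    secs = ideal_transfer_seconds(ONE_TB, gbps)
    print(f"{gbps} Gb/s: {secs:.0f} s ({secs / 60:.1f} min)")
# 10 Gb/s: 800 s (13.3 min)
# 40 Gb/s: 200 s (3.3 min)
```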

The 40G DTN server, located in El Paso, is based on an updated reference implementation of our Science DMZ architecture. This new DTN (and others that will soon follow in other locations) will allow our collaborators throughout the global research and engineering network community to test high-speed, large, demanding data transfers as part of improving their own network performance. The deployment provides a resource enabling the global science community to reach levels of data networking performance first demonstrated in 2017 as part of the ESnet Petascale DTN project.

The El Paso 40G DTN has Globus installed for GridFTP and parallel file transfer testing. Additional data transfer applications may be installed in the future. To help users evaluate their own network capabilities, ESnet Data Mobility Exhibition (DME) test data sets will be loaded on this new 40G DTN shortly.

All of ESnet’s public DTN servers can be found at https://app.globus.org/file-manager. ESnet will continue to support existing 10G DTNs located at Sunnyvale, Starlight, New York, and CERN.

ESnet's 40G DTN Reference Architecture Block Diagram

The full 40G DTN reference architecture and more information on the design of these new DTNs can be found here:

A second 40G DTN will be available in the next few weeks and will be deployed in Boston. It will feature BBR2, version 2 of Google’s Bottleneck Bandwidth and Round-trip propagation time (BBR) TCP congestion control algorithm, allowing improved round-trip-time measurement and giving users the ability to explore BBR2’s enhancements over standard TCP congestion control algorithms.
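The core idea behind BBR-family congestion control is to model the path with two estimates rather than reacting only to loss: the bottleneck bandwidth (the maximum recently observed delivery rate) and the round-trip propagation time (the minimum recently observed RTT). The minimal sketch below shows just those estimates and what they drive. The class name, sample-count windows, and default gains are illustrative assumptions, not the Linux kernel implementation, and BBR2's additional loss and ECN handling is left out.

```python
from collections import deque


class BbrLikeModel:
    """Toy model of the two path estimates at the heart of BBR.

    Simplifications (assumptions, not kernel behavior): windows are counted
    in samples rather than time, and there is no probing state machine and
    no use of loss or ECN signals, which BBR2 adds.
    """

    def __init__(self, bw_window: int = 10, rtt_window: int = 100) -> None:
        self.bw_samples = deque(maxlen=bw_window)    # bytes per second
        self.rtt_samples = deque(maxlen=rtt_window)  # seconds

    def on_ack(self, delivered_bytes: int, interval_s: float, rtt_s: float) -> None:
        """Record one delivery-rate sample and one RTT sample."""
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def btl_bw(self) -> float:
        # Bottleneck bandwidth: windowed MAX of the delivery rate.
        return max(self.bw_samples)

    def rt_prop(self) -> float:
        # Round-trip propagation time: windowed MIN of RTT (least queuing).
        return min(self.rtt_samples)

    def bdp_bytes(self) -> float:
        # Bandwidth-delay product: bytes in flight needed to fill the pipe.
        return self.btl_bw() * self.rt_prop()

    def pacing_rate(self, pacing_gain: float = 1.0) -> float:
        # Send at (a multiple of) the estimated bottleneck rate.
        return pacing_gain * self.btl_bw()

    def cwnd_bytes(self, cwnd_gain: float = 2.0) -> float:
        # Cap data in flight at (a multiple of) the BDP.
        return cwnd_gain * self.bdp_bytes()
```

As a worked example, a 40 Gb/s path (5 GB/s) with a 50 ms minimum RTT has a bandwidth-delay product of 5 GB/s × 0.05 s = 250 MB, one reason long-haul high-speed hosts need large TCP buffers.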

In an upcoming blog post, I will describe the Boston/BBR2-enabled 40G DTN and perfSONAR servers. In the meantime, ESnet and the deployment team hope that the new El Paso DTN will be of great use to the global research community!  

Three Questions with Jay Stewart

Three questions with a new staff member!

Jay was born in Cambridge (UK) but moved to the United States when he was four. He grew up on Long Island, in Brookhaven National Laboratory’s backyard, and attended Suffolk University in Boston, where he received a BS in Marketing with a minor in Information Systems. Jay became keenly interested in computers when a parent’s colleague gifted him a CD-ROM with a slew of MS-DOS-based games. It was through the immortal wisdom of the game “Ecco the Dolphin” that his wrists became transfixed to the computer desk, and they’ve been there ever since. In his spare time, he enjoys reading and learning to live without sleep, as he and his wife had their first child in October.

What brought you to ESnet?

I came to ESnet as a Network Engineer after three years at Pilot Fiber, a commercial ISP based in Manhattan. I joined the ranks of the service-provider lifestyle right out of college as a call center technician and googled every term to climb the ranks to Network Engineer. Nick Buraglio reached out to me about the position at ESnet. Knowing that I would be following in my father’s footsteps by helping to ensure the highest degree of scientific collaboration made the decision to join ESnet an easy one. I’ll be working onsite at Brookhaven National Laboratory, helping to ramp up their connections onto ESnet6 while transitioning into my role as site ambassador.

What is the most exciting thing going on in your field right now?

At the highest level, I think quantum communication is an exciting thing to read about and to try to understand. Seeing that quantum computing, in general, has the power to move us from Moore’s law to Neven’s law is thrilling. A more grounded excitement, at least to me, is the work being done with Segment Routing. Think of it like Waze for your packets. It allows granular traffic steering while providing a bird’s-eye view of your network. The steering instructions are encoded in the packet header and are read and removed at each node along the path, in a matryoshka-doll-esque fashion.
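The “matryoshka” behavior described above can be illustrated with a toy model of SR-MPLS-style forwarding, where the headend pushes an ordered list of segments and each segment is popped when the packet reaches the node it names. This is a simplified sketch with a hypothetical `Packet` type and topology; SRv6 differs in detail, keeping the segment list in the header and decrementing a Segments Left pointer instead of popping.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: str
    # Segment list carried in the header; index 0 is the active segment.
    segments: list[str] = field(default_factory=list)


def process_at(packet: Packet, node: str) -> str | None:
    """Handle `packet` at `node` and return the segment to steer toward next.

    Toy SR-MPLS-style model: if this node is the active segment, pop it
    (remove the outer "doll") so the next segment becomes active. Transit
    nodes that are not the active segment just forward toward it.
    Returns None once the segment list is exhausted.
    """
    if packet.segments and packet.segments[0] == node:
        packet.segments.pop(0)
    return packet.segments[0] if packet.segments else None


# Hypothetical topology: the headend steers A -> C -> D; B is a plain transit hop.
pkt = Packet("science data", ["A", "C", "D"])
for hop in ["A", "B", "C", "D"]:
    print(hop, "->", process_at(pkt, hop))
# A -> C
# B -> C
# C -> D
# D -> None
```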

What book would you recommend?

I’m a sucker for a futuristic, dystopian book and The Water Knife by Paolo Bacigalupi fits that need nicely.

Re-imagining perfSONAR to gain new network insights

Scientific discovery increasingly relies on the ability to perform large data transfers across networks operated by many different providers (including ESnet) around the globe. But what happens when a researcher initiates one of these large data transfers and data movement is slow? What does “slow” even mean? These can be surprisingly complex questions and it is important to have the right tools to help answer them. perfSONAR is an open source software tool designed to measure network performance and pinpoint issues that occur as data travels across many different networks on the way to a destination.

perfSONAR has been around for more than 15 years and is primarily maintained today by a collaboration of ESnet, GEANT, Indiana University, Internet2, and the University of Michigan. perfSONAR has an active community that extends well beyond the five core organizations that maintain the software, with more than 2000 public deployments spanning six continents and hundreds of organizations. perfSONAR deployments can schedule and run tests that calculate metrics including (but not limited to) how fast a transfer can be performed (throughput), whether packets reach the desired destination (packet loss), how long they take to get there (latency), and what path they take (traceroute). What is novel about perfSONAR is not just these metrics, but the set of tools for presenting them in dashboards built by multiple collaborating organizations. These dashboards aim to clearly identify patterns that signify potential issues and provide the means to drill down into graphs that give more information.

Example perfSONAR dashboard grid highlighting packet loss to an ANL test node (top). Example line graphs that further illustrate aspects of the problem (bottom).
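As an illustration of how two of these metrics fall out of raw measurements, the sketch below summarizes a hypothetical batch of ping-style round-trip probes into packet loss and latency. It is not perfSONAR code; the function name and input shape are assumptions, and perfSONAR's own tools also support one-way measurements that this sketch ignores.

```python
from statistics import median


def summarize_probes(sent: int, rtts_ms: list[float]) -> dict:
    """Summarize one batch of probes.

    sent: number of probe packets transmitted.
    rtts_ms: one round-trip time (ms) per reply that actually came back.
    """
    received = len(rtts_ms)
    return {
        # Packet loss: share of probes that never made it back.
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        # Latency: the minimum approximates the uncongested path; the median
        # is robust to a few delayed outliers.
        "min_ms": min(rtts_ms) if rtts_ms else None,
        "median_ms": median(rtts_ms) if rtts_ms else None,
    }


print(summarize_probes(10, [10.0, 12.0, 11.0, 30.0, 9.0, 10.5, 11.5, 10.0]))
# {'loss_pct': 20.0, 'min_ms': 9.0, 'median_ms': 10.75}
```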

While perfSONAR has had great success in providing the current set of capabilities, there is more that can be done. For example, perfSONAR is very good at correlating the metrics it collects with other perfSONAR metrics that share at least one endpoint. But what if we want to correlate metrics by location or intermediate network, or with statistics not collected by perfSONAR, such as flow statistics and interface counters? These are all key questions the perfSONAR project is looking to answer.

Building upon a strong foundation

perfSONAR has the ability to add analytics from other software tools using a plug-in framework. Recently, we have begun to use Elasticsearch via this framework to ingest log data and enable improved search and analytics on perfSONAR data.

For example, perfSONAR has traditionally viewed an individual measurement as something between a pair of IP addresses. But what do these IP addresses represent, and where are they located? Using the off-the-shelf tools Elasticsearch and Logstash, perfSONAR is able to answer questions like “What geographic areas are showing the most packet loss?”

Example map showing packet loss hotspots to different locations around the globe. It also contains a menu to filter results by intermediate network.
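The geographic question above boils down to two steps: enrich each result's IP address with a location, then aggregate loss by location. In the setup described here, Elasticsearch and Logstash handle that work; the plain-Python sketch below only illustrates the idea, with a hypothetical lookup table (documentation-only prefixes) standing in for a real GeoIP database.

```python
import ipaddress
from collections import defaultdict

# Hypothetical stand-in for a GeoIP database; documentation prefixes only.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "North America",
    ipaddress.ip_network("198.51.100.0/24"): "Europe",
    ipaddress.ip_network("203.0.113.0/24"): "Asia",
}


def region_of(ip: str) -> str:
    """Enrich a bare IP address with a region label."""
    addr = ipaddress.ip_address(ip)
    for net, region in GEO_TABLE.items():
        if addr in net:
            return region
    return "Unknown"


def loss_hotspots(results):
    """results: iterable of (src_ip, dst_ip, loss_pct) tuples.

    Groups by destination region and returns (region, mean loss) pairs,
    worst first.
    """
    sums = defaultdict(lambda: [0.0, 0])  # region -> [total loss, count]
    for _src, dst, loss in results:
        bucket = sums[region_of(dst)]
        bucket[0] += loss
        bucket[1] += 1
    means = {region: total / count for region, (total, count) in sums.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```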

Additionally, we can apply this same principle to traceroute (and similar tools), which yields a list of IP addresses indicating the path a measurement takes between source and destination. Each IP address is a key to more information about the path, including not only geographic information but also the organization at each point. This means you can ask questions such as “What is the throughput of all results that transit a given organization?” Previously, a user not only had to know the exact IP address, but that address also had to be the first (source) or last (destination) address in the path.
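Here is a minimal sketch of the “transits a given organization” query, assuming each throughput result has already been joined with the hop list of a matching traceroute. The prefix-to-organization table, field names, and data are hypothetical; the point is that the filter checks every hop, not only the first and last addresses.

```python
import ipaddress

# Hypothetical prefix-to-organization table; documentation prefixes only.
ORG_TABLE = {
    ipaddress.ip_network("192.0.2.0/25"): "Org A",
    ipaddress.ip_network("192.0.2.128/25"): "Org B",
    ipaddress.ip_network("198.51.100.0/24"): "Org C",
}


def org_of(ip: str):
    """Return the organization owning `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    return next((org for net, org in ORG_TABLE.items() if addr in net), None)


def throughput_via(org: str, measurements):
    """Throughput of every result whose traceroute path crosses `org` at ANY
    hop, not just the source or destination."""
    return [
        m["throughput_gbps"]
        for m in measurements
        if any(org_of(hop) == org for hop in m["path"])
    ]


measurements = [  # hypothetical results, each joined with its traceroute
    {"path": ["192.0.2.1", "192.0.2.130", "198.51.100.4"], "throughput_gbps": 9.1},
    {"path": ["192.0.2.1", "198.51.100.4"], "throughput_gbps": 4.2},
]
print(throughput_via("Org B", measurements))  # [9.1]
```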

Integration with non-perfSONAR data is another area the project is looking to expand. By putting perfSONAR data in a well-established data store like Elasticsearch, the door is open to leverage other off-the-shelf open source tools like Grafana to display results. What’s interesting about this platform is not only its ability to build new visualizations, but also the diverse set of backends it is capable of querying. If data such as host metrics, network interface counters, and flow statistics are kept in any of the supported data stores, then there is a means to present this information alongside perfSONAR data.

Example of perfSONAR statistics combined with host statistics from a completely different database being displayed in Grafana
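Interface counters are typically stored as cumulative octet counts, so presenting them alongside perfSONAR results first means converting them into rates and then lining them up with test timestamps. The sketch below shows that arithmetic under stated assumptions (time-sorted samples, no counter wraps, hypothetical data and function names); in a Grafana setup this work would normally be done by the data store's query language rather than by hand.

```python
import bisect


def counter_rates_gbps(samples):
    """Turn cumulative interface octet counters into average Gb/s per interval.

    samples: time-sorted list of (unix_ts, octet_counter) pairs.
    Returns (interval_end_ts, gbps) pairs. Counter wraps/resets are not handled.
    """
    return [
        (t1, (o1 - o0) * 8 / (t1 - t0) / 1e9)
        for (t0, o0), (t1, o1) in zip(samples, samples[1:])
    ]


def rate_at(rates, ts):
    """Rate of the first interval ending at or after `ts` (None if `ts` is
    past the last sample)."""
    ends = [t for t, _ in rates]
    i = bisect.bisect_left(ends, ts)
    return rates[i][1] if i < len(rates) else None


counters = [(0, 0), (60, 37_500_000_000), (120, 112_500_000_000)]  # hypothetical
rates = counter_rates_gbps(counters)
print(rates)               # [(60, 5.0), (120, 10.0)]
print(rate_at(rates, 90))  # 10.0: link load while a test ran at t=90
```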

These efforts are still very much in the early stages of development, but initial indicators are promising. Leveraging the perfSONAR architecture in conjunction with the wealth of off-the-shelf open source tools available today creates opportunities to gain new insights from the network, like those described above, that were not previously possible with the traditional perfSONAR tools.

Getting involved and learning more

The perfSONAR project will continue to provide updates as this work progresses. You can also see the perfSONAR web site for updates and more information on keeping in touch through our mailing lists. The perfSONAR project looks forward to working with the community to provide exciting new network measurement capabilities.

Three Questions with Derek Howard

Three questions with a new ESnet staff member!  

Derek Howard is a software developer from Columbia, MO. Prior to joining ESnet, Derek worked as an HPC system administrator for the University of Missouri. Derek also created Augur (https://github.com/chaoss/augur) which is part of the Linux Foundation’s CHAOSS group (https://chaoss.community/), a working group focused on measuring the health and sustainability of open source software. 


Derek is part of the Network Services Automation group under John MacAuley, where he will be working primarily on our internal ESnet Database (ESDB).

Question 1: What brought you to ESnet?

I worked with George Robb at the University of Missouri. He joined ESnet a while ago, and it seemed like a great place to work. I asked him if there were any positions at ESnet he thought might be a good fit for me, and he referred me to the position I am in now. I’m really happy I joined; it is as great as I expected!

Question 2: What is the most exciting thing going on in your field right now?

With so much work underway for ESnet6, exciting changes are happening every day. We are pushing to get features out for all of our software as fast as possible. Right now, I am working on a feature in ESDB to make it easier for network engineers to verify that hardware was installed correctly during router installs.

As far as the broader field goes, I am excited about DDR5 memory becoming commercially available soon. 

Question 3: What book would you recommend?

Randall Munroe’s “What If?” – It’s a wonderful collection of serious answers to silly questions by the creator of XKCD.

Zeek and stream asymmetry research at ESnet

In my previous post, we discussed use of the open-source Zeek software to support network security monitoring at ESnet.  In this post, I’ll talk a little about work underway to improve Zeek’s ability to support network traffic monitoring when faced with stream asymmetry.

This comes from recent work by two of my colleagues on the ESnet Security team.

Scott Campbell and Sam Oehlert presented ‘Running Zeek on the WAN: Experiences and solutions for large scale flow asymmetry’ at a workshop held last year at CERN in Geneva, explaining the phases and deployment of the Zeek-on-the-WAN (ZoW) pilot in detail.

Scott Campbell at CERN presenting ‘Running Zeek on the WAN’
The asymmetry problem on a WAN (example)

Some of the significant findings and results from this presentation are highlighted below:

  • Phase I: Initial Zeek Node Design Considerations 
    • Select locations that provide an interesting network vantage point – in the case of our ESnet network, we deployed Zeek nodes on our commodity internet peerings (eqx-sj, eqx-chi, eqx-ash) since they represent the interface to the vast majority of hostile traffic.
    • Identify easy traffic to test with, and use SPAN (port-mirroring) ports to forward traffic destined for the stub network on each of the routers used for collection.
  • Phase I: Initial lessons learned from testing and results
    • Some misconfigurations were found in the ACL prefix lists.
    • We increased visibility into our WAN side traffic through implementation of new background methods.
    • We established a new process for end-to-end testing, installing, and verifying Zeek system reporting.
  • Phase II: Prove there is more useful data to be seen
    • For Phase II, we moved from statistical sampling-based techniques toward collecting full peer connection records. We started running Zeek on traffic crossing the interfaces that connect ESnet network peers to the internet, from the AS (autonomous system) responsible for most notices.
    • To get high-fidelity connection information without being crushed by data volume, we defined a subset of interesting packets: zero-length control packets (SYN/SYN-ACK/FIN/RST) from peerings.
  • Phase II: Results
    • We discovered a lot of interesting activity, such as information leakage in syslogs and logins (and attempted logins) using poorly secured authentication protocols. Analyzing the volume of asymmetric traffic also gave us valuable insight into the asymmetric traffic problem.
  • Ongoing Phase III: Expanding the reach of traffic collection on WAN
    • We are currently deploying Zeek nodes at three more WAN locations to monitor commodity internet peering: PNWG (peering at Seattle, WA), AM-SIX (peering at Amsterdam), and LOND (peering at London).
Locations for the ZoW systems; pink shows the ongoing Phase III deployment
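The Phase II selection of “zero-length control packets” amounts to a simple per-packet predicate. The sketch below is a hypothetical illustration of that rule (the `TcpSegment` type is ours), not the pilot's actual capture configuration.

```python
from dataclasses import dataclass

# TCP header flag bits (RFC 9293).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


@dataclass
class TcpSegment:
    flags: int        # TCP flags byte
    payload_len: int  # bytes of application data


def is_zero_length_control(seg: TcpSegment) -> bool:
    """True for payload-free SYN, SYN-ACK, FIN, or RST segments.

    SYN-ACK matches because it carries the SYN bit. Data-bearing segments
    and pure ACKs are excluded, which is where the volume savings come from.
    """
    return seg.payload_len == 0 and bool(seg.flags & (SYN | FIN | RST))
```

In practice this kind of selection is usually pushed down into the capture layer; for instance, the pcap filter `tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0` selects on the same flags, although it does not check payload length.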

As our use of Zeek on the WAN side of ESnet continues to grow, the next phase of the ZoW pilot is currently being defined. We’re working to incorporate the lessons learned on handling traffic asymmetry into these next phases of the effort.

Some (not all) solutions being taken into consideration include: 

  • Aggregating traffic streams at a central location to make sense of the asymmetric packet streams, and then running Zeek on the aggregated traffic, or
  • Running Zeek on the individual asymmetric streams and then aggregating the resulting Zeek output by 5-tuple, which aggregates connection metadata rather than the connection stream itself.
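The second option hinges on a direction-independent 5-tuple key, so that the outbound half of a connection seen at one peering and the inbound half seen at another merge into one record. A minimal sketch, assuming hypothetical per-sensor records (these fields are not Zeek's conn.log schema, and the function names are ours):

```python
from collections import defaultdict


def canonical_key(proto, src, sport, dst, dport):
    """Direction-independent 5-tuple: both halves of a conversation map to
    the same key, whichever sensor saw them and in whichever direction.
    Endpoints are ordered by plain tuple comparison; the order only needs
    to be consistent, not numerically meaningful."""
    a, b = (src, sport), (dst, dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (proto, *lo, *hi)


def merge_half_connections(records):
    """Merge per-sensor connection metadata into whole-conversation records.

    records: dicts with proto, src, sport, dst, dport, bytes, sensor.
    """
    merged = defaultdict(lambda: {"bytes": 0, "sensors": set()})
    for r in records:
        key = canonical_key(r["proto"], r["src"], r["sport"], r["dst"], r["dport"])
        merged[key]["bytes"] += r["bytes"]
        merged[key]["sensors"].add(r["sensor"])
    return dict(merged)


# Hypothetical: the two directions of one TCP connection seen at different peerings.
records = [
    {"proto": "tcp", "src": "192.0.2.10", "sport": 51000,
     "dst": "198.51.100.20", "dport": 443, "bytes": 1200, "sensor": "eqx-sj"},
    {"proto": "tcp", "src": "198.51.100.20", "sport": 443,
     "dst": "192.0.2.10", "dport": 51000, "bytes": 48000, "sensor": "eqx-chi"},
]
print(merge_half_connections(records))
```

A real deployment would also need a time window when merging, since the same 5-tuple can be reused by later, unrelated connections.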

We are currently exploring these WAN approaches to provide better solutions to both ESnet and connected sites.

Three Questions with Chris Cummings

Three questions with a new ESnet staff member!  

Chris has joined ESnet as a network engineer and is supporting the ESnet6 project and day-to-day operations. He is based in Chicago, IL, and has many years of on-the-job experience designing, deploying, and managing networks.

Chris started his networking career working on wireless broadband internet at a Wireless Internet Service Provider (WISP) in Juneau, Alaska. He has worked in various network engineering roles, including in heavy industry and underground mining.

When not doing networking, you likely won’t be able to find Chris as he will be out in the woods camping and entirely off the grid.

Question 1: What brought you to ESnet?

I wanted to work at ESnet because I knew it would bring me an entirely different set of challenges than what I was used to and would place me in an environment with incredibly intelligent people who could help me sort through those challenges.

Question 2: What is the most exciting thing going on in your field right now?

I’d have to say that it is the explosion of resources and focus on network automation. Networking has typically lagged behind other IT disciplines in this regard, so I think it’s very exciting to see networking catch up and start to benefit from combining software development methodologies with traditional network engineering.

Question 3: What’s a book you recommend?

For networking specifically, I would highly recommend Computer Networking Problems and Solutions by Ethan Banks and Russ White. What I like about this book is that it closely follows the advice given in RFC 1925 rule 11, which states that “Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.” This high-level approach teaches you to think about networking problems in a more abstract manner so that when you are approached with a new problem you can apply a common framework to the solution rather than having to reinvent the wheel every time.

For something more leisurely, I would recommend reading The Dresden Files, which is a series of contemporary fantasy/detective/mystery books written from the perspective of a wizard who lives in modern-day Chicago.

Charting a resilient path for the future

The ESnet Site Coordinating Committee (ESCC) meeting on 22-23 October was attended by over 50 members representing all of the major DOE sites and projects supported by our team. This was the first ESCC meeting held via Zoom.

The meeting focused on network resiliency, both on lessons learned from adapting to working from home, as well as longer term plans for ESnet6. 

Highlights of the sessions included ESnet Director Inder Monga’s presentation on the ways we ensured operational continuity during the pandemic.

Inder Monga presents on ESnet's support for the DOE complex during the pandemic

The DOE ESnet program manager, Ben Brown, provided a vision for future research opportunities as well as future operational needs for the nation’s scientific complex.

DOE's Ben Brown presents on future objectives and priorities

Attendees identified several key activities for ESCC collaboration as part of advancing shared resilience goals. Foremost among these is the creation of a working group to develop improved metrics for ESnet resilience, to identify ways that resilience features can be better incorporated into infrastructure funding and planning, and to establish better ways to engage scientific programs into risk management processes.

ESnet thanks ESCC participants for attending, and we look forward to returning to in-person ESCC meetings in the future!

A sample of attendees – the first ESCC via Zoom

Zeekurity at ESnet

Zeek is open source network security monitoring software used extensively by ESnet. Zeek (formerly called Bro) was initially developed by researchers at Berkeley Lab. Zeek allows users to identify and manage cyber threats by monitoring network traffic. It acts as a passive network security monitor (NSM) that gives a holistic view of what is transpiring on the network and provides visibility into network traffic.

To better understand network behavior and provide flexible security services, we use Zeek as an important part of our data center security architecture and are experimenting with placing Zeek clusters at various high-value points on the WAN. This is providing technical insights as well as significant challenges.

In this post, we present some of our efforts to approach WAN security using Zeek for network monitoring, along with our successes, the challenges we hit during the process, and interesting things we learned.

Zeek on the ESnet LAN:

Monitoring local area and data center networks is a familiar and comparatively simple traffic monitoring design, and ESnet is no different. Traffic flowing through our LANs is currently monitored by two Zeek clusters: one at Brookhaven National Lab and another, for the west coast, at Berkeley Lab. We have implemented black hole routing (BHR) on our data center routers to block external actors that violate our established policies, based on Zeek detections over both the IPv4 and IPv6 protocol stacks.
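The details of ESnet's BHR integration aren't covered here, but the bookkeeping between a detection and a block can be sketched. The hypothetical `BlockList` below turns a detected address into a host route (/32 for IPv4, /128 for IPv6) with an expiry time, so both protocol stacks share one code path; the class and the expiry policy are assumptions for illustration.

```python
from __future__ import annotations

import ipaddress
import time


class BlockList:
    """Toy detection-driven block list covering both IPv4 and IPv6.

    Hypothetical glue only: a real black hole routing setup would push these
    host routes to the routers rather than keep them in a dict.
    """

    def __init__(self) -> None:
        self._expires = {}  # ip_network -> unix time at which the block lapses

    def block(self, ip: str, duration_s: float, now: float | None = None):
        """Block a detected address for duration_s seconds; return its route."""
        now = time.time() if now is None else now
        addr = ipaddress.ip_address(ip)
        # Host route: /32 for IPv4, /128 for IPv6.
        net = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        self._expires[net] = now + duration_s
        return net

    def is_blocked(self, ip: str, now: float | None = None) -> bool:
        """True if `ip` falls in a block that has not yet expired."""
        now = time.time() if now is None else now
        addr = ipaddress.ip_address(ip)
        # Mixed-family checks (IPv4 address vs IPv6 route) are simply False.
        return any(addr in net and expiry > now
                   for net, expiry in self._expires.items())
```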

Apart from network security monitoring using “standard” Zeek detection scripts, many enhancements and custom scripts written by ESnet Security team members serve a vital role in detecting various kinds of suspicious activity. Recently, a Zeek package, Zeek-Known-outbound, contributed by Michael “Dop” Dopheide, won first prize in Zeek Package Contest 2, held in May 2020. The package provides the ability to track and alert on outbound service usage to a list of ‘watched’ countries, and also adds country codes for the originating and responding hosts to conn.log, the Zeek log file that records all connection attempts seen on the network. The motivation for this work came from routine log analysis, which found a few systems contacting hosts in foreign countries for package updates and DNS services.
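The package itself runs inside Zeek; as an offline illustration of the same idea (not the package's code), the sketch below reads a Zeek TSV conn.log and flags connections from local hosts to hosts in a watched country. The local network, the IP-to-country table standing in for a GeoIP lookup, and the country codes are all hypothetical; `uid`, `id.orig_h`, `id.resp_h`, and `id.resp_p` are standard conn.log fields.

```python
import ipaddress

# Hypothetical inputs for illustration only.
LOCAL_NETS = [ipaddress.ip_network("192.0.2.0/24")]
COUNTRY_OF = {"198.51.100.7": "XA", "203.0.113.5": "XB"}  # stand-in for GeoIP
WATCHED = {"XA"}  # user-assigned ISO codes used as placeholders


def read_zeek_tsv(lines):
    """Yield one dict per record of a Zeek TSV log such as conn.log."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the tag
        elif not line or line.startswith("#"):
            continue  # other metadata headers (#separator, #types, ...)
        elif fields:
            yield dict(zip(fields, line.split("\t")))


def watched_outbound(records):
    """Yield (uid, resp_h, resp_p, country) for local hosts contacting a
    watched country."""
    for rec in records:
        orig = ipaddress.ip_address(rec["id.orig_h"])
        country = COUNTRY_OF.get(rec["id.resp_h"])
        if country in WATCHED and any(orig in net for net in LOCAL_NETS):
            yield rec["uid"], rec["id.resp_h"], rec["id.resp_p"], country


sample = [  # hypothetical excerpt of a conn.log
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto",
    "1600000000.0\tC1\t192.0.2.10\t51000\t198.51.100.7\t53\tudp",
    "1600000001.0\tC2\t192.0.2.11\t51001\t203.0.113.5\t443\ttcp",
    "1600000002.0\tC3\t198.51.100.7\t40000\t192.0.2.10\t22\ttcp",
]
print(list(watched_outbound(read_zeek_tsv(sample))))
# [('C1', '198.51.100.7', '53', 'XA')]
```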

Zeek on the ESnet WAN:

To augment our LAN efforts on a wider scale, we have been experimenting with using Zeek to monitor network traffic on the WAN side of the network, to gain more visibility and to provide improved security and network services. Most of this work is experimental, with iterative design changes as we apply what we learn from stage 1 through stage 3 and beyond.

  • Some notable differences and challenges compared with a typical LAN:
    • Data Volume: There are a large number of WAN links that run at 1-400Gb/s.
    • Data Encapsulation: Data with variable-length headers is problematic, so we have been employing a load balancer to address this problem.
    • Asymmetric Data Flows: This is a hard problem to solve, especially when the network is distributed across the country. When the packets of corresponding inbound and outbound flows between two network nodes follow different paths, it can be challenging to reconcile conversation activities as part of network monitoring.
    • Technical Integration: Coordinating activities between geographically distributed teams introduces challenges, which we are developing ways to overcome.

At ESnet, we strive to push boundaries and try innovative ways to address challenges, and Zeek on the WAN is an example of that. In my next article, I will discuss some of the ways we have been experimenting with to address the complex problems noted above, going into detail on the research being done to address asymmetric data flows on the WAN.

On the Path to ESnet6—Seeing the Light

ESnet6 Network

Three years ago, ESnet unveiled its plan to build ESnet6, its next-generation network dedicated to serving the Department of Energy (DOE) national lab complex and overseas collaborators. With a projected early finish in 2023, ESnet6 will feature an entirely new software-driven network design that enhances the ability to rapidly invent, test, and deploy new innovations. The design includes:

  • State-of-the-art optical, core and service edge equipment deployed on ESnet’s dedicated fiber optic cable backbone
  • A scalable switching core architecture coupled with a programmable services edge to facilitate high-speed data movement
  • 100–400Gbps optical channels, with up to eight times the potential capacity compared to ESnet5
  • Services that monitor and measure the network 24/7/365 to ensure it is operating at peak performance, and
  • Advanced cybersecurity capabilities to protect the network, assist its connected sites, and defend its devices in the event of a cyberattack

Later this month, ESnet staff will present an online update on ESnet6 to the ESnet Site Coordinating Committee (ESCC). Despite the challenges of deploying new equipment at over 300 distinct sites across the country and lighting up approximately 15,000 miles of dark fiber during a pandemic, the team is making great progress, according to ESnet6 Project Director Kate Mace.

“We’ve had some delays, but our first priority is making sure the work is being done safely,” Mace said. “We have a lot of subcontractors, and we are working closely with them to make sure they’re safe, they’re following local pandemic rules, and they’re getting the access they need for installs.”

“The bottom line is that we have a lot of pretty amazing people putting in a lot of hours and hard work to keep the project moving forward,” Mace said.

When completed in 2023, ESnet6 will provide the DOE science community with a dedicated backbone capable of carrying at least 400 Gigabits per second (Gbps), with some spans capable of carrying more than 1 Terabit per second.

The current network, known as ESnet5, comprises a series of interconnected backbone rings, each with 100Gbps or higher bandwidth. ESnet5 operates on a fiber footprint owned by and shared with Internet2. Once the switch is complete, Internet2 will take over ESnet’s share of the fiber spectrum to provide more bandwidth to the U.S. education community.

“We’re almost done with the optical layer, which is a big deal,” Mace said. “It’s been a major procurement of new optical line equipment from Infinera to light up the new optical footprint.”

Mapping the road to ESnet6 

Back in 2011, using Recovery Act funds for its Advanced Networking Initiative, ESnet secured the long-term rights to a pair of fibers on a national fiber network that had been built, but not yet used. Because there was a surplus of installed fiber cable at the time, ESnet was able to negotiate advantageous terms for the network.

As part of the ESnet6 project, ESnet and its subcontractors began installing optical equipment along the ESnet fiber footprint starting in November 2019. The optical network consists of seven large fiber rings east to west across the U.S., and smaller “metro” rings in the Chicago and San Francisco Bay areas.

At this point, Infinera has completed the installation of the equipment at all locations. The four large eastern-most rings have passed ESnet’s rigorous testing and verification process ensuring that they are configured and working as designed, and most ESnet services in these areas have been transitioned over to the new optical system.

Infinera has turned over the other three large rings and is working closely with ESnet staff to address a number of minor issues identified during testing.

ESnet and Infinera are collaborating on turning up, testing, and rolling services to the new network in the Chicago and Bay Area rings. The installation in these areas is more complex because it is re-using the ESnet5 fiber going into the DOE Laboratories.  

“The ESnet and Infinera teams have worked really well together to overcome all of the typical challenges we expected on a network build of this scale, as well as some unexpected obstacles,” said Joe Metzger, the ESnet6 Implementation Lead. 

The challenges ranged from the expected, such as installing thousands of perfectly clean (microscopically verified) fiber connections, to the unexpected, such as engineers driving for hours to reach a remote, isolated location to install the equipment only to find the access road drifted in with snow, or that somebody had changed the lock.

Most of the unexpected challenges were related to COVID-19.

“It was amazing to see how the facility providers, including the DOE Laboratories, ESnet and Infinera teams worked together to find safe, workable solutions to the COVID-19-related access constraints that we encountered during the installation,” said Metzger.  

The team expects the optical system build to be fully accepted and all services transitioned over to it by Oct. 1, completing what they are calling ESnet5.5, the first major step in the transition from ESnet5 to ESnet6.

To get to this point, ESnet’s network engineers needed extensive, hands-on training on the new Infinera equipment, so a specialized test lab was built at Berkeley Lab. Engineers take a weeklong session there learning how to configure, operate, and troubleshoot the equipment deployed in the field.

The next major step will be the installation of new routers for the packet layer, which is expected to begin in early 2021, Mace said.

And of course, this is all being carried out while ESnet keeps its production network and services in regular operation and with the undercurrent of stress from the COVID-19 pandemic. 

“We’ve got to keep the network running,” Mace said. “And we are hiring additional network engineers, software engineers, and technical project managers.”

ESnet is supported by DOE’s Office of Science.

Written by Jon Bashor