Libevent 2.0.x: Like Libevent 1.4.x, Only More So.

Tuesday, January 26, 2010 | 2:56 PM


Libevent 2.0.x is the new version of Libevent from the Tor Project. Libevent is a software library whose purpose is to provide consistent, fast interfaces to various operating systems' mutually incompatible fast networking facilities. It gives applications two basic interfaces to these networking layers: a low-level interface where the application is notified when an operation (like a network read or write) is ready to begin, and a higher-level interface where Libevent itself manages network operations and the application is notified when the network operations are completed.

It's amazing how complicated things can become when you set out to do a simple task pretty well.

Back in 2003, I found myself working on a project written in C that needed (among other things) to do fast nonblocking IO on a whole bunch of sockets at once. Our placeholder asynchronous IO implementation, based on the widely portable select() and poll() syscalls, wasn't working well on Windows and didn't scale at all.

What does it do?

I'll add a bit of background here for readers who haven't tried to do nonblocking IO from scratch before. When you write a program that needs to work with many active network connections at once, you have two basic choices. One choice is to call IO functions that wait (block) until they are done, and to give each connection its own thread or process so that one connection's blocking doesn't force all the connections to block. For many applications, this approach doesn't scale well.

A better choice is to use so-called "nonblocking" IO functions that make as much progress as they can, and return control to your program once they can make no more immediate progress. With this approach, your program needs some way to tell which connections are ready to do more IO, so that you don't waste CPU time polling connections that aren't ready yet. There are many kernel-provided APIs to do this, but the portable ones all scale badly (like select() and poll(), which are both O(N) in performance), and the fast ones are all non-portable (like kqueue() and epoll() and /dev/poll). This is where Libevent comes in. Libevent provides a consistent interface to wrap the common features of all these interfaces that tell you when connections are ready for IO, so that you can write clean code using nonblocking IO without having to re-code for every possible backend.
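To make that concrete, here is a minimal sketch of the low-level, notify-on-readiness interface as it looks in the Libevent 2.0 API (my own example; error handling is omitted, and fd is assumed to be a nonblocking socket you've already created):

    #include <event2/event.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /* Called by Libevent whenever fd becomes readable. */
    static void on_readable(evutil_socket_t fd, short events, void *arg)
    {
        char buf[1024];
        int n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
            printf("read %d bytes\n", n);
    }

    void watch_socket(evutil_socket_t fd)
    {
        struct event_base *base = event_base_new();
        /* EV_PERSIST keeps the event active after each callback fires. */
        struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                     on_readable, NULL);
        event_add(ev, NULL);          /* no timeout */
        event_base_dispatch(base);    /* run the loop until no events remain */
    }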

So, we started using Niels Provos's Libevent library around 2005. It was around version 1.0 back then, and it had... a few bugs. To get our application working I started submitting patches, and before too long Niels asked me if I'd like to co-maintain it.

Flash-forward to the present day. Libevent has grown an "extras" library to support commonly needed protocols like HTTP and DNS, and it has gained real users, including Tor, Memcached, and Chromium. With version 1.4, Libevent's core reached maturity: it did a pretty good job on most Unixish platforms and a not-too-awful job on Windows. But there were still some issues that made us decide to attempt a major development effort in 2.0. I'll touch on the big ones here, explain what progress we've made, and describe what we still have to do before we can call Libevent 2.0 finished.

Cleaner interfaces
Some of our older interfaces had grown crufty and difficult to use safely. For example, you sometimes needed to remember to follow up every call to event_set() with a corresponding call to event_base_set(), or you might get strange behavior. Some interfaces were a bad idea to begin with, like the ones that relied on global state.

We could have just changed all our APIs to be less error-prone, but that would have broken all the older programs written to use them. Still, leaving the old APIs exposed by default would make it hard for developers to make sure they were avoiding them. As a compromise, we decided to put the preferred APIs in new headers (such as event2/event.h), and the deprecated APIs in compatibility headers (such as event2/event_compat.h), while retaining the old headers (like the old event.h) as wrappers to include both the new and old APIs.
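To illustrate the difference, here is a sketch of the two styles side by side (my own example, not taken from the Libevent documentation):

    #include <event2/event.h>         /* the preferred 2.0 APIs */
    #include <event2/event_struct.h>  /* struct event layout, needed for old-style use */
    #include <event2/event_compat.h>  /* the deprecated 1.x APIs */

    static void on_readable(evutil_socket_t fd, short events, void *arg) {}

    void two_styles(struct event_base *base, evutil_socket_t fd)
    {
        /* Old 1.x style: forget the event_base_set() call and you
           get strange behavior. */
        static struct event ev;
        event_set(&ev, fd, EV_READ, on_readable, NULL);
        event_base_set(base, &ev);
        event_add(&ev, NULL);

        /* New 2.0 style: the event is bound to its base at creation. */
        struct event *evp = event_new(base, fd, EV_READ, on_readable, NULL);
        event_add(evp, NULL);
    }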


Windows and IOCP

Nearly every non-Windows OS does nonblocking network IO best with the system I described above, where the program first asks the kernel which sockets are ready to read or write and then asks the kernel to perform as much IO on them as could succeed right now without blocking. On Windows, though, the "tell me which sockets are ready" part is not so efficient or scalable. Windows's select() function (which Libevent 1.4 uses) is O(N), its WSAWaitForMultipleEvents() function doesn't support more than 64 sockets at a time, and so on.

Windows does have a good networking API, however. It just follows a different paradigm. Instead of asking the OS to tell you when an operation would be able to make progress, you tell the OS to begin an operation asynchronously, and ask the OS later (via an I/O completion port, or IOCP) to tell you which operations have finished.

It's quite hard to implement a notify-on-readiness interface using a notify-on-completion system like Windows provides. Fortunately, however, it's pretty easy (and frequently desirable!) to implement a notify-on-completion interface using a Unix-style notify-on-readiness API, so that's what we decided to do.

Supporting IOCP on Windows has been the most far-reaching challenge in Libevent 2.0, and has led to improvements throughout the rest of the (non-Windows-specific) codebase. I'll say more when I talk about individual improvements below.

Though most of the groundwork for an IOCP implementation is laid, the code itself is fairly immature. If you fetch the latest code from subversion, you can enable data transfer through an existing socket with IOCP. IOCP-enhanced connect and accept operations are not yet supported, nor is UDP. We hope to get these done pretty soon.


Buffered-IO improvements

Earlier Libevent versions had a "bufferevent" abstraction to handle the common case where you want to queue data received and data to be sent over a TCP connection, and receive notification as more data arrives or is sent. The interface here was a good match for IOCP, but the existing implementation was too inflexible and inefficient for serious use. Among other problems, its storage format relied on large circular buffers, it allowed only a single implementation for the interface, it behaved poorly on Windows, and it was just generally slow.


In Libevent 2.0, we've rewritten bufferevent from the ground up. The data storage format has changed from circular buffers to a linked list of buffers, reminiscent of the classic mbuf interface. We've added support for sending files with sendfile() and friends, and fixed up Windows support. The interface now supports multiple backends, OO-style, to allow multiple implementations of the "buffered IO" API. So far, we have five such backends: the basic socket implementation, a filtering wrapper implementation, an SSL-based implementation (see below), a socketpair-style implementation, and an IOCP-based implementation.
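For a sense of the interface, here is a rough sketch of using a socket-based bufferevent with the 2.0 API (error handling omitted; fd is assumed to be an already-connected nonblocking socket):

    #include <event2/event.h>
    #include <event2/bufferevent.h>

    /* Called when new data has arrived in the input buffer. */
    static void on_read(struct bufferevent *bev, void *ctx)
    {
        char buf[1024];
        size_t n;
        while ((n = bufferevent_read(bev, buf, sizeof(buf))) > 0) {
            /* ...process n bytes... */
        }
    }

    /* Called on EOF, error, or timeout. */
    static void on_event(struct bufferevent *bev, short what, void *ctx)
    {
        if (what & (BEV_EVENT_EOF | BEV_EVENT_ERROR))
            bufferevent_free(bev);
    }

    void buffer_socket(struct event_base *base, evutil_socket_t fd)
    {
        struct bufferevent *bev =
            bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
        bufferevent_setcb(bev, on_read, NULL, on_event, NULL);
        bufferevent_enable(bev, EV_READ | EV_WRITE);
        bufferevent_write(bev, "hello\n", 6);  /* queued, sent when possible */
    }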

Preliminary results around April 2009 indicated that the rewritten codebase had improved our large-payload HTTP benchmark performance by between 35 and 174 percent. We hope that as we profile and benchmark this code more aggressively, we can find more opportunities for improvement here.


Improved multithreading

Older versions of Libevent allowed one event loop structure per thread; adding or deleting events from another thread wasn't supported. This was not only an annoyance for writers of multithreaded applications; it made IOCP support nigh-impossible, since we needed to have the IOCP loop run in separate threads from the main event thread.

The basic work here is finally done; as of last week, there are no major non-locked parts of Libevent remaining. We may still need to do performance tuning here: different operating systems provide mutex implementations with wildly differing characteristics.
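Turning the new locking support on is a single call made before any event base is created. A sketch (on Unix, link against the event_pthreads library):

    #include <event2/thread.h>
    #include <event2/event.h>

    int main(void)
    {
        /* Set up pthreads-based locking before creating the base.
           Windows programs call evthread_use_windows_threads() instead. */
        evthread_use_pthreads();

        struct event_base *base = event_base_new();
        /* ...the base can now be safely manipulated from other threads... */
        event_base_free(base);
        return 0;
    }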


Our next big multithreading initiative will be better support for parallelizing event callbacks across multiple CPUs on demand. This will be trivial for Win32 IOCP callbacks, but will require actual coding for older Unix backends. To ease migration, we'll need to allow programs to enable this functionality one callback at a time, so that we don't force them to upgrade all of their code at once.


SSL support

As noted above, there's now a bufferevent backend that uses the OpenSSL library to encrypt traffic using the SSL/TLS family of protocols. This wasn't as simple as writing a regular data filter, since the protocol's support for session renegotiation allows a single virtual read/write operation to be implemented as multiple reads and writes to the network.


The OpenSSL cryptography library has support for doing TLS encryption directly to the network or over a user-provided IO object. We've taken both approaches, so that an OpenSSL bufferevent can either deliver data to a network socket or to another bufferevent. This way, Unix programs can have a bufferevent that speaks SSL directly to the network, and Windows programs can wrap an SSL layer over an IOCP-based backend.
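A sketch of what the socket flavor looks like (OpenSSL context setup and error handling omitted):

    #include <event2/bufferevent.h>
    #include <event2/bufferevent_ssl.h>
    #include <openssl/ssl.h>

    /* Create a bufferevent that speaks SSL directly over a socket.
       To layer SSL over another bufferevent instead (e.g. an
       IOCP-based one), use bufferevent_openssl_filter_new(). */
    struct bufferevent *ssl_client(struct event_base *base,
                                   SSL_CTX *ctx, evutil_socket_t fd)
    {
        SSL *ssl = SSL_new(ctx);
        return bufferevent_openssl_socket_new(base, fd, ssl,
                                              BUFFEREVENT_SSL_CONNECTING,
                                              BEV_OPT_CLOSE_ON_FREE);
    }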

We'd eventually like to add support for other free SSL libraries, such as GnuTLS and NSS, though we're not planning to work on this for Libevent 2.0.x unless somebody wants to contribute the code.

Better testing

As you might imagine, we've added and rewritten a lot of code between Libevent 1.4 and Libevent 2.0. Without good unit tests, we'd probably have introduced a fair number of bugs.


Unfortunately, our old unit tests weren't so good. Though the overall coverage was decent (about 72%), some parts of the code were hardly tested at all (the DNS code and the IO buffer code were both under 60% coverage). The test suite was fairly fragile, too: many tests relied on global state, which made it hard to write new ones, especially if they required inducing strange error conditions.


For Libevent 2.0, we put a major effort into our testing code. Tests now run in separate processes, so they can't affect one another no matter how badly they trash global state. This not only kept us from introducing new bugs, but also revealed a few longstanding ones. Thanks to this change, it's easy to test far more error conditions than we could before. This has gotten us up to around 80% test coverage overall, with the least-tested module at around 72%.

Before we release Libevent 2.0, I want our test coverage to be at least 80% for every module. I'd also like to do a manual coverage audit to look for any untested sections that seem particularly error-prone.



Next steps
In the coming months, we'll want to get the Libevent 2.0.x series finished. This will mainly involve fixing known bugs in the Sourceforge trackers, finishing the IOCP code, and testing the heck out of everything.

In parallel, I've been working on a short book to serve as an introduction to nonblocking networking in general, and to Libevent in particular. The latest draft is at http://www.wangafu.net/~nickm/libevent-book/.

After Libevent 2.0.x is stable and released, we'll be able to start looking at missing features for Libevent 2.1.x. These aren't at all final yet, but I'm hoping to get improved thread-pool support for IOCP and non-IOCP users alike, and better support for UDP-based protocols. We'll probably have many other changes to make based on user feedback from Libevent 2.0.

Thanks to Google for their support of the project.


By Nick Mathewson, Tor Project

Love for LuaJIT

Wednesday, January 20, 2010 | 2:41 PM


The last few years have been an exciting time for dynamic language implementations. The latest generation of JavaScript engines – Mozilla TraceMonkey, WebKit SquirrelFish Extreme, and Google's own V8 – are all based on just-in-time (JIT) compilation, which has led to dramatic speedups for web applications. The Unladen Swallow project is building a JIT for Python based on LLVM. But you may not have heard of the dynamic language Lua or the one-man LuaJIT project, which released a beta of its long-awaited 2.0 two months ago, along with some very impressive benchmarks.

LuaJIT is developed by Mike Pall, an open source developer located in Germany. Since its first release in 2005, LuaJIT has been at the forefront of dynamic language performance. In 2008, Mike announced that he was working on a complete rewrite based on trace compiler technology, which breaks with the long tradition of method-at-a-time JIT compilers and seems especially well suited to compiling dynamic languages. As LuaJIT shows, this approach yields performance that can rival even offline, static language compilers.

We use Lua internally at Google, and are very happy to be sponsoring the port of LuaJIT 2.0 to x86-64 (the initial release was for 32-bit x86 only). The full list of sponsors can be found on the LuaJIT sponsors page. The x86-64 port will be released under the MIT/X license, just as previous LuaJIT releases have been. Many thanks to Mike Pall for his excellent work on LuaJIT.

Linux.conf.au: We're in Wellington!

Friday, January 15, 2010 | 1:08 PM


This year's linux.conf.au is in Wellington, New Zealand. It's starting this weekend, Sunday January 17th, and runs all week, through Saturday, January 23rd. We'll have members of the Open Source Team at the conference all week. And we're especially excited about giving a talk next Saturday at Open Day!


Stop by to visit us at our Open Day table if you're in the area: it's free and open to the public (not just for conference-goers). Ask questions, meet people in the open source space, or just hang out and hack with us. We'll be giving a talk called "Open Source for Newbies" and we'd love to get your questions about the open source community, what you can do for open source, or how to get started in open source even if you don't know anything about it. It should be a fun day with activities for kids, lots of speakers and booths, and a hackfest going on. There's even going to be some electric cars there!

Check out the Open Day wiki on the Linux.conf.au website to learn more. It's being held at the Wellington Town Hall. Here are the details:

Where: Wellington Town Hall
Date: Saturday January 23, 2010

Time: Doors open at 11.00 am and close at 2.00 pm

For those of you attending the conference, members of our team Leslie Hawthorn, Cat Allman, and Jeremy Allison will all be giving talks on Thursday (schedule here). Come listen to them speak on a variety of topics relevant to today's open source community. Finally, we're having a miniconf on Google Wave™ on Monday, January 18, that is also open to all conference attendees.

Hope to see you there!

Third Annual LLVM Developers' Meeting

Thursday, January 14, 2010 | 5:00 PM




With the new year upon us, I thought I would take a minute to update everyone on the great progress made in 2009 by the LLVM Project. For those not familiar with the LLVM project, it's a cross-platform compiler infrastructure. We had two successful releases of LLVM, our first release of Clang, a compiler front end for C-family languages, and we held our third annual Developers' Meeting on October 2, 2009. The meeting is an opportunity for LLVM and Clang developers to meet face to face, exchange ideas, share their experiences, and work together on LLVM or Clang.

This Developers' Meeting was the largest to date! We had 170 attendees with a huge range of academic and company affiliations. The meeting was structured to include both general overviews of the major LLVM subsystems and applications of LLVM in various projects. The subsystem talks covered Clang, scalar evolution and loop optimization, the future of the LLVM register allocator, and a tutorial on building a backend for LLVM. We had many talks on applications of LLVM or Clang, such as OpenCL, Unladen Swallow (a Google-sponsored project), Rubinius, and much more! Community members gave a total of 17 technical presentations on LLVM and its applications, and you can view all the slides and videos from the talks on the Developers' Meeting website.

This event would not be possible without the support of our sponsors. Google generously helped fund several students and active members of the LLVM community to attend the meeting and present their LLVM-related work. I'd like to briefly summarize the work of some of these developers and students:

Anton Korobeynikov

Anton is a long-time developer for the LLVM project, LLVM's project administrator for Google Summer of Code™ (GSoC), and an LLVM code owner. For his day job, he is a Ph.D. student in applied statistics at Saint Petersburg State University, Russia.

Anton presented an invaluable tutorial, Building a Backend in 24 Hours. It surveys the various code generation phases, such as SelectionDAG, register allocation, and post-register-allocation passes, and goes into the different pieces of a backend that one needs to implement, such as the target, subtarget, lowering, register set, instruction selection, and calling convention. If you have ever wanted to write a new backend for LLVM, this is the talk you will want to see. (Slides, Video)

Bruno Cardoso Lopes

Bruno is a multi-year GSoC participant, an active LLVM contributor, and a Ph.D. student at the University of Campinas, Brazil. This year, he presented Object Code Emission and llvm-mc. His talk gave a high-level overview of the LLVM machine code emitter and focused on the emission of object files. The motivation behind direct object code emission is to bypass the external assembler and speed up compile time.

His work is a part of the LLVM Machine Code (MC) Toolkit project. The project aims to build better tools for dealing with machine code, object file formats, etc. The idea is to generate most of the target specific assemblers and disassemblers from existing LLVM target .td files and to build an infrastructure for reading and writing common object file formats. His talk goes into the details regarding the design, current implementation status, and future directions. (Slides, Video)

Duncan Sands

Duncan is also a long-time developer for the LLVM project and an LLVM code owner. His presentation was titled Reimplementing llvm-gcc as a gcc plugin. His project, DragonEgg, aims to replace gcc's optimizers and code generators with those in LLVM without modifying gcc at all, using mainline gcc's new ability to load additional logic and passes at runtime through a plugin mechanism.

The plugin is a shared library that is loaded by gcc-4.5 at runtime and is currently under development. If you are interested in helping with this project, please see the DragonEgg website for more information. Take a look at the slides and video from Duncan's presentation to see where you can get started.

Santosh Nagarakatte

Santosh is currently a Ph.D. student at the University of Pennsylvania. He presented SoftBound, one of his current research projects. SoftBound is a compile-time transformation for enforcing spatial safety in C programs. It works by recording base and bound information for every pointer as disjoint metadata. It is a software-only approach and performs metadata manipulation only when loading or storing pointer values. The advantage of this approach is that it provides spatial safety without requiring changes to C source code. His talk provides a brief description of the formal proof and the LLVM implementation. (Slides, Video)
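The idea is easier to see in a tiny sketch (my simplification of the concept, not SoftBound's actual runtime or generated code):

    #include <stdio.h>
    #include <stdlib.h>

    /* Disjoint metadata: base and bound tracked alongside each pointer. */
    struct bounds { char *base; char *bound; };

    /* Check a proposed access of `len` bytes at `p` against its metadata. */
    static void check(char *p, size_t len, struct bounds b)
    {
        if (p < b.base || p + len > b.bound) {
            fprintf(stderr, "spatial safety violation\n");
            abort();
        }
    }

    int main(void)
    {
        char *buf = malloc(16);
        struct bounds b = { buf, buf + 16 };  /* recorded at allocation */

        check(buf + 4, 1, b);   /* in bounds: ok */
        buf[4] = 'x';

        check(buf + 20, 1, b);  /* out of bounds: aborts before the store */
        buf[20] = 'x';          /* never reached */
        free(buf);
        return 0;
    }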

If you are interested in attending one of our future Developers' meetings, please join the LLVM-announce mailing list. Many thanks again to Google's Open Source Programs' Office for making this event possible!

The 2009 Semantic Robot Vision Challenge

Tuesday, January 5, 2010 | 10:49 AM


The Semantic Robot Vision Challenge (SRVC) is a robot scavenger hunt competition that is designed to push the state of the art in image understanding and automatic acquisition of knowledge from large unstructured databases of images (such as those generally found on the web). In this competition, fully autonomous robots receive a text list of objects that they are to find. They use the web to automatically find image examples of those objects in order to learn visual models. These visual models are then used to identify the objects in the robot's cameras.

The latest SRVC was hosted at the International Symposium on Visual Computing (ISVC) in Las Vegas, Nevada from Nov. 30 to Dec. 2, 2009. Five teams competed this year; three of them brought robots and participated in both the robot and software leagues, while the other two participated only in the software league.


The arena was set up with four chairs, three round tables, two tables with drawers, and a small set of stairs for displaying objects. All of the furniture had at least one object for the robots to discover on it, but not all of the objects in the environment were on the list of items for the robots to find.



The crowd was very interested in watching the different robots moving around the environment during their runs. Unfortunately, the robot teams themselves were plagued with various hardware and software integration troubles and only one team was able to find any objects. However, the robot teams that did not perform well demonstrated that their software was very capable of doing the work in a stand-alone mode. The visual classification results from the software league were very impressive.

The official list of objects consisted of:
  1. pumpkin
  2. orange
  3. red ping pong paddle
  4. white soccer ball
  5. laptop
  6. dinosaur
  7. bottle
  8. toy car
  9. frying pan
  10. book "I am a Strange Loop" by Douglas Hofstadter
  11. book "Fugitive from the Cubicle Police"
  12. book "Photoshop in a Nutshell"
  13. CD "And Winter Came" by Enya
  14. CD "The Essential Collection" by Karl Jenkins and Adiemus
  15. DVD "Hitchhiker's Guide to the Galaxy" widescreen
  16. game "Call of Duty 4" box
  17. toy Domo-kun
  18. Lay's Classic Potato Chips
  19. Pepperidge Farm Goldfish Baked Snack Crackers
  20. Pepperidge Farm Milano Distinctive Cookies
Objects 5-9 were part of a list of generic objects that were given in advance to the teams. This was in response to a suggestion from previous years to give the teams a chance to build classifiers capable of recognizing a generic class of objects rather than a very specific one. You can find a full analysis of the results on the SRVC site.

The US Naval Academy entered a robot based on an iRobot Create platform, which used a Hokuyo URG LIDAR for navigation and a camera mounted on a mast for detecting the objects. This robot was by far the least expensive of the competitors but was still capable of carrying a laptop as well as the other hardware. Under this load, the robot rapidly drained its batteries, but it was still able to capture a few images of objects and label them correctly.

Kansas State University entered with a robot based on a MobileRobots Inc. Pioneer 3 platform. They also had a Hokuyo URG LIDAR for navigation and a camera on a mast used for identifying the objects in the environment. This robot was able to traverse most of the environment successfully. Unfortunately, the robot was not able to aim its camera at enough objects to get a chance to correctly identify them.

The University of British Columbia (UBC) robot had by far the most complex setup of all of the robot competitors. They used a MobileRobots Inc. Powerbot that carried four laptops, multiple cameras (including a monocular Canon PowerShot camera and a Point Grey Bumblebee2 stereo camera), and multiple LIDARs for both navigation and object extraction. The team demonstrated several impressive non-scored runs both before and after the event. However, during their officially scored run, the process that ran the primary object detection camera failed, so they were unable to identify any objects.

For more detailed descriptions of the robots, the software, and the computer vision techniques used by these teams, please refer to the team presentations. Each team's workshop presentation has been posted to that page. Links to their source code will also be posted.

As this contest continues to grow and evolve, the organizers are quite pleased by the progress of the computer vision research that is being demonstrated at these events. This was shown quite handily by the very high scores in the software-only league. However, the organizers would also like to remind the community that this is a robotics competition, and thus want to see advances in active vision techniques, intelligent mapping and exploration, and reasoning about where objects are likely to be found (e.g. the "semantics" of the objects). In previous years, most of the robotics competitors took a random-walk approach to exploring the environment, hoping to cover all of the space and get enough images to see the objects in question. This year, however, the organizers were quite pleased to see the previous reigning champions from the University of British Columbia take the robotic exploration aspect of the competition to the next level. The organizers would like to highlight some of the significant aspects of the UBC team's approach to controlling their robot.

The UBC team approached the contest in two distinct phases: a mapping phase and an object identification phase. Their strategy this year was first to navigate the environment and map it using the SICK LIDAR and a SLAM (Simultaneous Localization and Mapping) algorithm. The robot would then revisit the obstacles in the room and scan them with the Hokuyo LIDAR. Flat horizontal surfaces were detected in the scans by finding a few consistent surface normals and then verifying the hypothesis of a planar surface. Regions that point out of the plane are interpreted as objects, and 3D bounding boxes are computed from their convex hulls (see figure below). This gives a set of candidate locations for the objects. The robot would then revisit these locations to take snapshots and run its object recognition algorithms on them. Three object recognition methods were implemented: SIFT matching, contour matching, and Deformable Parts Models (DPM). A fourth, using spherical harmonics to recognize 3D data, was turned off because it was not quite ready. The DPM approach was trained on the objects known in advance, but could not be used for Internet images, as it was slightly too slow for that even though it had been rewritten in C.
The organizers were very impressed by the fact that the robot would first identify the specific locations where objects should be found, e.g. the tops of tables and chairs, and then go back and use the 3D sensors to explicitly segment out the locations of the objects to find them. This is exactly the kind of active robotics vision research that we feel will help to push forward the state of the art in real-time computer vision on physical robots and we hope to see more of this kind of approach on future competitors.

To sum up, the research being performed by the teams interested in this competition is extremely impressive. The teams are definitely rising to the challenge put forth by the organizers. Congratulations to all that participated!

Merry Music: MusicBrainz's Latest Summit and 10th Anniversary

Tuesday, December 22, 2009 | 1:17 PM


The yearly MusicBrainz summit serves an important function in building our community: we talk about issues facing MusicBrainz and we plan the road map for MusicBrainz projects. The summits are usually scheduled to allow as many people as possible to attend, and this year we chose Nürnberg, Germany as our location. MusicBrainz contributor Nikolai "Pronik" Prokoschenko lives in Nürnberg and was our local contact, and he ended up planning most of the summit.

Pronik found us a conference room that we rented for the entire day, complete with open WiFi, which is important if you plan to have a room full of geeks. He also found us a cheap Gasthof that provided lodgings slightly better than a hostel for a mere 20€ per person per night — a really good deal for Europe. The evening before the summit we all sat in the Gasthof and were treated to some confusing German/Greek cuisine with some of the rudest service any of us have ever encountered. But our group is used to dealing with the crude Internet public, so we managed to laugh off the horrible service and still have a great time.

As luck would have it, there was a grocery store right next door to our Gasthof, and we commenced another successful crowd-sourced breakfast. Four people were each given 20€ with instructions to buy food and drinks that they would like to have for breakfast and lunch. No collusion was allowed between people! Once the shopping was complete we walked to the conference room, settled in, and dove into the masses of food we'd collected. Many tasty bread rolls with jam, nutella, cold cuts and cheese were consumed. Of course we had fun things like a case of Bionade, juices, tea, gummy bears and chocolate. Crowd-sourcing breakfast takes a potentially frustrating chore and makes it fun for everyone.


Plus, Pronik and his mate Kira brought a MusicBrainz decorated cake to celebrate 10 years of MusicBrainz!


As people were eating, we started to collect an unconference-like agenda of what people wanted to talk about. We decided to have a detailed state of the project talk including recent developments from meeting our customers in Europe. We also talked about current development processes and some of the problems associated with these processes. Oliver Charles, a 2008 Google Summer of Code™ student, gave an introduction on how to hack on the MusicBrainz server, based on his work from the last year.

Most of the time was spent discussing new features for after the release of our much-anticipated Next Generation Schema. At times we managed to get into deep philosophical discussions about what MusicBrainz is and what it should be. At other times we discussed lighthearted topics with lots of joking. These summits do wonders for building our community and getting people on the same page. We manage to explore many topics and reach consensus on many points in one day instead of spending weeks on the same discussions online.

Finally, in the evening we cleaned up our space and retired to a local beer hall where we continued the discussion in a less formal manner. If you're interested, we posted all the session notes from the summit on our wiki. All in all, this event was fun and not much effort to put on — thanks to Pronik! On another happy note, 1/3 of the people in attendance were women, which is much better than most tech summits I've attended.

In total we spent about $1500, including all the food, drinks, lodgings and one person's travel costs. For a summit with 12 people, I think we did rather well! I call that Google's support well spent — thanks again for supporting MusicBrainz, Google!

London Open Source Jam 15

Friday, December 18, 2009 | 12:49 PM


On the 3rd of December we held the latest (and greatest) Google London Open Source Jam at our offices near Victoria. The Jam is a way to get like-minded Open Source contributors and users together and give them a chance to give a 5 minute talk on something dear to their hearts, all the while availing themselves of free beer and pizza!

This time's topic was the somewhat catch-all "the Web." As always, the topic is more of a guide than a rule, so we had some pretty diverse talks.

Our very own Jon Skeet set the evening off to a good start by telling us all about Noda Time — a new Open Source library for handling dates and times in .NET, based on the Joda Time library for Java.

Simon Phillips on Google Wave

Simon Phillips is a consultant to the film business and gave a great presentation on how he uses Google Wave to help him work closely with directors, script writers, set designers and the like. He showed some great ideas for using Wave in this way and was canvassing for help in developing Open Source Wave robots to help this process.

Simon Stewart gave a rallying cry for making the web more accessible to the blind and deaf, especially in this modern era of HTML canvas and video tags. By ensuring your sites are accessible, you open them up to more users, and as a useful side effect you also make them more testable.

HTTP has started to show its age, and maybe it's time for a leaner, meaner protocol to come along. I took a brief break from my hosting duties to present a summary of SPDY, a chromium.org project to develop a replacement protocol which will deliver data to our browsers faster.

Glyn Wintle Gets Comfortable

If you run a web site, you may have come to fear the "Slashdot effect" where you are linked from a popular website and get a spike of traffic. Glyn Wintle from the Open Rights Group (ORG) informed us that this is nothing compared to having a bunch of knitting forums link to you! His was a tale of Open Sourcing of knitting patterns and DMCA take-down notices. He also brought us up to speed on the latest from the ORG.

Sam Mbale gave us an update on his work bringing open source to Africa and told us all about BarCamp Lusaka which he'll be attending. We look forward to hearing how it went at another Jam.

Robert Rees gave us an experience report on using Velocity templates to divide responsibilities between engineers and web designers. It seems to work pretty well; contracts are enforced by unit tests, and designers know exactly what primitives they can use when laying out web pages.

Matt Savage on RESTful Acceptance Tests

Finally, Matt Savage talked about his ideas for RESTful acceptance tests, and Steven Goodwin gave us an update on his project to build a "Wallace and Gromit" house.

You can find more pictures of the event on Picasa Web Albums. To find out more about the Google London Open Source Jam, visit http://osjam.appspot.com/. If you'd like to receive regular updates about future jams, sign up for our mailing list. We hope to see you at future jams!

Rocking the Grid: The Globus Alliance's Second Google Summer of Code

Tuesday, December 15, 2009 | 4:20 PM


The Globus Alliance is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. We first participated in Google Summer of Code™ in 2008 and found the experience extremely productive for both the Globus Alliance and the individual mentors, so we wanted to confirm the value of the program for the students who took part. We contacted our eight students from last year to find out what impact Google Summer of Code had on their lives and careers. While many of our students still remembered the experience fondly and said it was valued highly by prospective employers, two students had particularly remarkable stories.

AliEn Grid Site Dynamic Deployment and Working at CERN

Last year, Artem Harutyunyan, mentored by Tim Freeman, developed a set of scripts on top of Globus Nimbus to dynamically deploy an entire AliEn Grid site (AliEn is the Grid infrastructure used by scientists participating in the ALICE experiment at CERN). His collaboration with the CERN and Globus Nimbus folks went beyond his Google Summer of Code work and resulted in a new framework, called CernVM Co-Pilot, for executing 'pilot' Grid jobs on cloud resources. His work is currently used in production to run Grid jobs from CERN's ALICE experiment, and there are plans to extend it for the execution of ATLAS and LHCb jobs. Artem also co-authored two papers on his work: "Dynamic AliEn Grid Sites on Nimbus with CernVM" was presented at the 17th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2009) in Prague, and "Building a Volunteer Cloud", which includes a description of CernVM Co-Pilot, was presented at the Latin American Conference on High Performance Computing in Mérida, Venezuela.

Holder-of-Key Single Sign-On

Joana M. F. Trindade, mentored by Tom Scavo, spent last summer implementing a Holder-of-Key Single Sign-On profile handler for the Shibboleth Identity Provider in Globus GridShib. And since then, things have just been getting better for her. Thanks to her outstanding summer work, she was offered an appointment as a Visiting Scholar at UIUC, where she researched fault injection in virtual machines with Professor Ravi Iyer. After six months in that position, Joana was offered admission into the master's program at UIUC, where she is currently working with Professor Marianne Winslett. More importantly, Joana tells us that participating in Google Summer of Code gave her a renewed sense of confidence in her research abilities, having previously thought that her academic background was insufficient to gain admission into a top-tier university in the US. As she puts it: "After Google Summer of Code, I regained that hope, and I must say I'm really happy to have found a topic in Globus to which I could contribute, and that in turn opened so many doors for me."

Congratulations Artem and Joana for all you have achieved!

Lessons Learned

Our first Google Summer of Code last year also had its fair share of challenges, including two students who didn't make it through the program, but it gave us the opportunity to learn a lot about how to mentor and manage summer students. We were fortunate to be selected again this year as a Google Summer of Code mentoring organization, which allowed us to apply everything we learned. First of all, we required students to provide more information about their background and the project they were proposing. Last year our student application form was essentially a blank form saying "Tell us about your project here," so this year we presented prospective students with more specific questions. We also decided to check in with our students more often which, at least in one case, allowed us to identify a problem between a student and a mentor early on, giving us time to deal with it constructively before the midterm.

In the end, applying what we learned during last year's Google Summer of Code and at the Mentor Summit had a noticeable effect. We were fortunate to be given ten students to mentor, and all ten students passed. Furthermore, our mentors report that practically all the code written by the students has either already been released or will be released soon. Overall, we felt that this year's students rocked. Here's a summary of their summer work.

Going Beyond a Single Cluster

The Globus Nimbus cloud toolkit allows you to turn your cluster into an Infrastructure-as-a-Service (IaaS) cloud. However, it was mainly geared towards managing a single cluster. Not any more! Adam Bishop, mentored by Ian Gable, worked hard over the summer to add new components enabling multiple cluster support for Nimbus. He developed a series of production-quality plugins, which have already been committed to the Nimbus source repository, that publish the state of a Nimbus cluster back to a Globus MDS registry. This allows the availability of cloud resources across multiple Nimbus clusters to be gathered into a single registry, which is the first step toward adding cross-cluster support to Nimbus.

Spilling Over Multiple Clusters

Another student, Jan-Philip Gehrcke, mentored by Kate Keahey, also spent the summer with his head in the clouds, but in a good way: he developed the Clobi project, a job scheduling system supporting virtual machines (VMs) in multiple IaaS clouds, with support for Globus Nimbus and Amazon EC2 clouds. In a nutshell, there are many scientific applications that are typically run as "jobs" on a compute cluster. Jan-Philip's project allows these jobs to be submitted to a cloud instead of to a traditional compute cluster. The most interesting use case is when a site operates a Globus Nimbus cloud and, during peaks in demand for computational capacity, extends its capacity momentarily by spilling the jobs over to a second (or third, or fourth, ...) cloud such as Amazon EC2. Although Clobi is not tied to any particular application (its design is generic and should be useful whenever it’s convenient to distribute jobs across different clouds), the motivating application for Clobi is ATLAS Computing (for the LHC's ATLAS experiment at CERN). In fact, by the end of the summer, Jan-Philip was able to run a common ATLAS Computing application (the so-called “full chain”) successfully with Clobi. If you want more details about Clobi, check out this blog post written by Jan-Philip.

Incremental GridFTP Transfers

Enough about clouds, let's move on to the exciting topic of data. Globus GridFTP is a high-performance, secure, reliable data transfer protocol that is pretty good at moving data. Fast. Of course, there's always someone who wants to go even faster, like Shruti Jain, mentored by Michael Link. Shruti took globus-url-copy, the GridFTP client, and added a 'sync' feature that allows a local and remote file to be synchronized by sending only the changed sections of the file. This results in more effective bandwidth utilization by avoiding redundant data transfers.
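The underlying idea resembles rsync-style delta transfer: divide the file into fixed-size blocks, checksum each block on both sides, and resend only blocks whose checksums differ. A toy sketch of the comparison step (purely illustrative, not the actual globus-url-copy implementation):

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096

    /* Toy per-block checksum; a real tool would use a stronger hash. */
    static uint32_t block_checksum(const unsigned char *buf, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + buf[i];
        return sum;
    }

    /* Report which blocks of `path` differ from the remote checksum list. */
    void find_changed_blocks(const char *path,
                             const uint32_t *remote_sums, size_t nblocks)
    {
        FILE *f = fopen(path, "rb");
        unsigned char buf[BLOCK_SIZE];
        size_t n, i = 0;
        if (!f)
            return;
        while (i < nblocks && (n = fread(buf, 1, BLOCK_SIZE, f)) > 0) {
            if (block_checksum(buf, n) != remote_sums[i])
                printf("block %zu changed: retransmit\n", i);
            i++;
        }
        fclose(f);
    }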

Checksummed GridFTP Transfers

Remember Mattias Lidman? We certainly do. In last year's Google Summer of Code, he developed a compression driver for the Globus XIO input/output library (which GridFTP depends on) to compress/uncompress data as it passes through it. However, although moving data faster is all good and well, it's not worth much if it somehow gets corrupted in-flight. So this year, Mattias, mentored by Joseph Bester, continued to work on Globus XIO and developed a Checksum Driver. Mattias's driver checksums GridFTP data streams allowing both ends of a GridFTP transfer to verify the integrity of the data.

CQL Queries Builder

You know one really cool thing grids are used for? Cancer research. The Cancer Biomedical Informatics Grid, or caBIG®, is an information network enabling all constituencies in the cancer community – researchers, physicians, and patients – to share data and knowledge. caGrid is the underlying service-oriented infrastructure that supports caBIG, and it relies heavily on the Globus Toolkit. Some of the data services in this architecture use a query language called CQL that is, well... complicated. To make life easier for scientists, Monika Machunik, mentored by Wei Tan, wrote a plug-in for Taverna (an open source tool used by scientists to design and execute workflows) for constructing CQL queries, allowing scientists to focus on their work rather than on the intricacies of the CQL language.

GridWay-Google Maps Mashup

Grids require coordinating resources across multiple organizations, and the Globus GridWay meta-scheduler is a great tool to do just that. However, coordinating hundreds or even thousands of machines across dozens of sites can get a bit messy using the console-based tools included with GridWay. Carlos Martín, mentored by Alejandro Lorca, tackled this problem by creating an interactive GridWay-Google Maps mashup, allowing the administrators and users of a GridWay installation to get a quick snapshot of the status of multiple sites and the jobs running in them, as shown in this screenshot:



Carlos used the Google Web Toolkit to develop this application, which is totally decoupled from GridWay, making it easy to install it alongside existing installations of GridWay. In fact, you can download the GridWay+Google Maps application and check out its documentation, including more screenshots, at the application's page on the GridWay site.

GridWay GUI

Srinivasan Natarajan, mentored by Jose Luis Vazquez-Poletti, worked on a more administration-oriented GUI for GridWay, allowing users to compose, manage and control their jobs instead of using the command line interface. This GUI includes a host of other features, such as host and user monitoring, filtering account statistics and execution history information, and support for processing DAGMan workflows, including visualizing dependencies between jobs in the workflow.



Both of the GridWay projects were presented in several sessions, including one on nuclear fusion, at the EGEE'09 conference in Barcelona, Spain back in September.

GridFTP Benchmarking

How about we get back to the subject of data management? The recent addition of UDT (UDP-based Data Transfer) support to GridFTP has made even faster transfer speeds possible. You guessed it: here's another student who couldn't resist the need for speed this summer. Jamie Schwettmann, mentored by Raj Kettimuthu, sought to characterize the performance of GridFTP over 10Gb/s networks, specifically to measure the speed increase given by UDT as compared to TCP transfers, along with a number of other considerations such as CPU and memory overhead at both ends of the transfer. In doing so, they decided to develop an automated GridFTP benchmarking and throughput optimization utility called globus-transfer-test. It takes URL pairs from a list or on the command line; allows for varying input parameters such as parallelism level, transfer type (memory-to-memory, disk-to-disk, etc.), TCP buffer sizes, MTU sizes, and all other standard globus-url-copy options (except multicasting); and, when possible, compares the results with other performance and throughput utilities such as iperf or scp. Designed for general use by users and administrators as well as for carrying out the performance characterization, globus-transfer-test aims to provide enough information to optimize GridFTP options for maximizing throughput between grid sites. This common need has led to collaboration with many other projects and organizations in the course of development and testing, including the US ATLAS Project, TeraGrid, and OSCER. Jamie even presented a poster on her project at the 2009 Oklahoma Supercomputing Symposium.

AJAX Framework for Globus Web Services

Many of the components in Globus are web services, which are not exactly human-readable creatures. Fugang Wang, mentored by Tom Howe, developed a JavaScript API that enables access to Globus services from a web client using AJAX. Fugang's framework, which includes a backend service that mediates requests to the Globus Toolkit and an AJAX web client to access these services, makes life easier for Globus developers and users by allowing them to interact with Globus services from the comfort of their web browsers.

Secure Cloud Communications

And we'll end with the ever-popular subject of security. Melissa Weaver, mentored by John Bresnahan, developed a PSK (pre-shared key) driver for Globus XIO. She first developed a program that uses the OpenSSL libraries to encrypt and decrypt data with a stream or block cipher of the user's choice; it let her experiment with different key and initialization vector lengths and different file sizes to make performance measurements. She then developed the XIO PSK driver itself, which drew on those results to implement an RC2 block cipher that ensures any communication between computers, once a connection has been set up, is secure.
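As a rough illustration of the kind of building block involved (a generic OpenSSL EVP sketch of RC2-CBC encryption under a fixed pre-shared key, not Melissa's actual XIO driver):

    #include <openssl/evp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Pre-shared key and IV; in real use these would be derived
           or negotiated, never hard-coded. */
        unsigned char key[16] = "0123456789abcdef";
        unsigned char iv[8]   = "initvect";
        unsigned char plain[] = "hello over the grid";
        unsigned char cipher[sizeof(plain) + 16];
        int clen = 0, tail = 0;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(ctx, EVP_rc2_cbc(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, cipher, &clen, plain, sizeof(plain));
        EVP_EncryptFinal_ex(ctx, cipher + clen, &tail);
        EVP_CIPHER_CTX_free(ctx);

        printf("%d plaintext bytes -> %d ciphertext bytes\n",
               (int)sizeof(plain), clen + tail);
        return 0;
    }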

High energy physics experiments at CERN! Cancer research! Nuclear fusion! Cloud computing! Fast data transfers! Oh my! Oodles of congratulations to our mentors and students for all their hard work and for making this such an awesome Google Summer of Code for the Globus Alliance!

Introducing namebench

Friday, December 11, 2009 | 3:08 PM


Slow DNS servers can make for a terrible web browsing experience, but knowing which one to use isn't easy. namebench is a new open source tool that helps take the guesswork out of the DNS server selection process. namebench benchmarks available DNS services and provides a personalized comparison to show you which name servers perform best. As a System Administrator at Google, I was curious about measuring how BGP route selection affected the performance of Google Public DNS. This curiosity resulted in a small benchmarking script, which I further developed during my 20% time into a full-featured application for Windows, Linux, and Mac OS X.



namebench is covered by the Apache 2.0 license, and was made possible by using several other great open-source tools including Python, Tkinter, PyObjC, dnspython, jinja2 and graphy. It also makes use of the Google Chart API to visualize the results:



In order to provide the most relevant results, namebench employs a number of interesting techniques. First, it personalizes the benchmark by using your browser history to decide which hosts to benchmark with. It also determines cache-sharing relationships between different IPs and removes the slowest of these servers, to avoid improperly benchmarking them solely on cached results. namebench will also report on DNS misbehavior such as DNS hijacking and censorship.

namebench 1.0 is available for download now. If you would like to discuss namebench or have any questions, please join the namebench mailing list. Happy hacking!

Joomla! Google Summer of Code™ 2009: Lots to Shout About

Thursday, December 3, 2009 | 2:21 PM


The Joomla! project was thrilled to sponsor 18 Google Summer of Code students for 2009, and we are pleased to report that 16 (89%) successfully completed their projects. Most of the projects were based on ideas generated by the Joomla! community, and our community seems to be very excited about the results.

Our two primary goals for Google Summer of Code 2009 were to (1) develop relationships with student developers that will encourage them to continue working on the project; and (2) add features and functionality to the Joomla! CMS. Our participation in Google Summer of Code 2009 was very successful on both fronts.

Relationship to the Project

Several of our students this year were already contributing to Joomla! prior to participating in the program, and the Google Summer of Code experience has only strengthened that relationship. For example, one of our students, in addition to completing his project, is now a leader in the release of the next Joomla! version. At least two students (so far) have officially joined project working groups, and several others have contributed to the project over and above their Google Summer of Code projects. Many other students have also expressed interest in continuing the development of their code beyond the program timeframe.

This year, at the end of the term, we gave each student the opportunity to present a webinar demonstrating their project to the community. Even though it was a lot of extra work, more than half the students did so. The results were excellent, and the students gave really good, concise, focused presentations. We recorded and linked to the webinars on our site so that anyone in the community who is interested in the Google Summer of Code work can simply watch a short webinar to see an actual demonstration of a project.

Using the Code

There are three ways the code from Google Summer of Code projects can be used within the Joomla! CMS. In some cases, some or all of the code will be incorporated directly into the core codebase for the upcoming Joomla! version 1.6. In other cases, the code has been published as an extension that can be downloaded and used by any Joomla! user on their website. The third method is that the code will be used as a basis for further work.

Some students have combined two of the methods above, for example, producing an extension for the current version 1.5 and making the code available for the core in our version 1.6.

More Information

We invite you to visit our Joomla! Community site for more information about the different projects and what was accomplished, and to download the code.