
code_swarm: A Design Study in Organic Software Visualization

In May of 2008, we published online a series of software visualization videos using a method called code_swarm. Shortly thereafter, we made the code open source and its popularity took off. This paper is a study of our code_swarm application, comprising its design, results and public response. We share our design methodology, including why we chose the organic information visualization technique, how we designed for both developers and a casual audience, and what lessons we learned from our experiment. We validate the results produced by code_swarm through a qualitative analysis and by gathering online user comments. Furthermore, we successfully released the code as open source, and the software community used it to visualize their own projects and shared their results as well. In the end, we believe code_swarm has positive implications for the future of organic information design and open source information visualization practice.



Throughout the world, there are perhaps millions of developers working on hundreds of thousands of software projects. They work together, in the common goal of making software better for everyone, though they may be in different timezones, in different parts of the globe. And they work on an enormous — and growing — amount of code. According to these source code hosting websites, has 230,000 projects, Google Code has over 50,000 projects, and has 10,638. That much code is certainly impressive, but even more so is the human effort and dedication required to create and maintain it. What goes on inside these projects? How can we bring to light the effort and dedication required to create open source software?

We created code_swarm to explore this data using the technique of organic information visualization. Unlike most software visualization applications, which give a structured, quantitative view of the data, code_swarm aims for a qualitative one. The qualitative approach allows us to bring the visualization of these software projects to casual viewers. It is not meant to be a replacement for traditional software visualization techniques, but a different, complementary view.

Code_swarm started as a simple experimental program that underwent many iterations. It evolved into an open source project that captured the attention of developers and casual viewers alike. Now, in this paper, we share our design decisions, results and observations with the information visualization community.

Some of the major points that we convey in this paper are:

  • We have designed a successful visualization of software projects that is accessible by a wide audience. Our application uses ideas from organic information design [8] to present an aesthetic animation of a project's history that is understandable by both developers and casual [19] viewers. We share the reasoning behind our design and the lessons learned from this experiment. (Sections 3 and 5)

  • We show that code_swarm accurately characterizes a software project's history through a qualitative discussion of the result videos. The combination of animation and organic information design allows a more human-centric visualization of a project that is not present in traditional timelines or code-based views. (Section 4.1)

  • We gather user feedback in the form of blog posts and comments made by the public. There were many: in March 2009, a Google search for "codeswarm" returned about 36,000 web pages. We discuss the more interesting and relevant comments in Section 4.2.

  • We show how the software community adopted code_swarm. After our application was released as open source, hundreds of users downloaded it and ran it with their own software project data. There are now well over 150 code_swarm videos on the web, and the number is growing. In addition, several people have extended code_swarm's capabilities in their own projects. (Section 4.3)

The contribution of this work is a study in the design of, results for, and response to code_swarm. It chronicles how organic information design was applied to visualizing software project histories. We have just described and motivated the problem of visualizing software development in this section. Our inspiration, derived from related work in software visualization, generative art and organic information visualization, is discussed in Section 2. The data requirements, design justifications and implementation details are given in Section 3. The results of code_swarm, including video analysis and commentary, community response, and community adoption, are presented in Section 4. Finally, the paper concludes with lessons learned and suggestions for future work in Section 5.


Related Work

The general methodology of this paper is influenced by Pousman et al.'s [19] and Wattenberg's [30] work. They established the concept of casual information visualization as a complementary branch of traditional information visualization techniques with its own concerns and challenges. Among those is the challenge of evaluating systems for casual users. Later on in this paper, we take the approach of collecting spontaneous online user postings, as Wattenberg did in his "Baby Names" paper.

The design of code_swarm touches on three areas of research and art. The first one, software visualization, aids in the understanding of software systems and is our primary area of contribution. In this section, we review previous examples of software history visualizations and contrast them with our approach for code_swarm. In general, current techniques are well-suited to quantitative analyses of software projects while our objective is a qualitative one. The second area, generative art, uses computer graphics and mathematical algorithms to create aesthetic images. We take inspiration from these works to fulfill our design requirement for an animated, captivating visualization. The last related area we discuss is organic information design, which can be considered an intersection between information visualization and generative art. It is a dynamic, eye-catching way to display information and the basis for our technique.

2.1 Software Visualization

In his book, Stephan Diehl separates software visualization into three aspects: structure, behavior and evolution [4]. Our work relates to the evolution aspect because it is concerned with the software development process, rather than the software itself. Furthermore, it relates to systems in which the history of developer activity (distinct from the source code evolution) is visualized. Therefore, we will concentrate our discussion on systems that focus on developers or include them as a major component. For further reading on other types of software visualization, Diehl's book [4] is a good resource. Also, Storey et al.'s survey paper [23] gives a good overview of software evolution visualizations.

There are software visualization systems that feature developer-related data but are more analytic in nature than code_swarm. They are worth mentioning here because our application can complement their quantitative displays with qualitative information about software projects. Eick et al.'s SeeSoft [6] is a line-oriented visualization of source code. Each line in code is represented as a pixel-thin, color-coded line in a file's column. One possible mapping from software attribute to line color is by developer. Using this mapping, one can see the areas of the code each developer has worked on. In code_swarm, the primary focus is the people and code structure is secondary.

Other systems which visualize developer-related data are Augur [7] by Froelich et al. and Advizor [5] by Eick et al. Both are collections of "standard" information visualization views, such as matrices, bar charts and line charts which show different aspects of software. Augur's interface is based on SeeSoft and is linked to several other views. Its main contribution is an online visualization of software data. Advizor also uses multiple views to examine software data. Most notable among these are 2D and 2.5D matrix views that have developer as an axis. The three above systems offer many ways to quantitatively view the software data. This is excellent for software analysis, but we want to design for casual users as well.

The Visual Code Navigator [13], [26][28] is a suite of visualizations, of which CVSscan and CVSgrab are parts. CVSscan uses a visual mapping similar to History Flow [25] to show changes in code over time. CVSgrab uses stacked, colored bars to show changes occurring in files over time. Both of these mappings can assign a hue for each developer. Like the examples above, these tools are useful for software developers but we are designing for casual viewers.

CodeSaw [9] and StarGate [17] aesthetically visualize the relationship between code contributions and developer communication and focus on the developers. CodeSaw uses stacked timelines with contributions on top and communication on the bottom of the axes. In contrast, StarGate has a circular layout with the communication network in the center and the repository on the outside. Both of these projects feature a static overview of the data while code_swarm's emphasis is on animation.

The Bloom Diagram [12] offers perhaps the most qualitative view of developers. Software developers are classified as coders, commenters, or some degree of both, using analyzed data from source code and mailing lists. They are then arranged in a circle in two rings, based on what role they have. It is then easy to see which people code the most, comment the most, and so on. Code_swarm uses a different approach, which is to infer roles by the file types people commit. And in contrast to all of these projects, it arranges the data in an unstructured way.

There are numerous other systems that use data from software repositories (e.g. [1][3], [18], [32]). However, their goal is distinct from code_swarm's, in that they display quantitative information for the purpose of project analysis. Code_swarm was not designed to be a "replacement" or "competition" for these systems. Though they represent an interesting area of software visualization, a survey of these systems is beyond the scope of this paper.

2.2 Generative Art

Inspiration for the aesthetics of code_swarm came from artists who use Processing to create amazing computer-generated artwork. Jared Tarbell writes small programs that generate intricate patterns [24]. For example, his "Substrate" sketch draws a city-like pattern from a simple rule of perpendicular growth. Robert Hodgin generates videos, often using music as the data source [11]. His "Solar, with Lyrics" and "Weird Fishes: Arpeggi" are examples of how he uses songs to drive particle engines. Glenn Marshall also uses music as data in his "Zeno Music Visualiser," from sources such as Swan Lake and Radiohead [14].

The works of these three artists are given as examples of aesthetic data visualization. But they are not, by any means, the only ones who have proved that beautiful animations could be made using Processing. Their algorithms either followed simple physical rules or were driven by external data like music. Though the purpose of their work is to entertain rather than inform, those two properties form the basis of the next related area: organic information visualization.

2.3 Organic Information Visualization

In his master's thesis, Ben Fry makes a case for using organic simulations for visualizing complex, dynamic data [8]. That is, rather than mold the data into a static structure (the prevalent method in current information visualization practice, such as treemaps, parallel coordinates, etc.), more freedom can be given to the entities representing the data, such that they can interact and form emergent structures. This idea makes sense when dealing with multivariate, frequently changing data, especially when qualitative knowledge about the data is desired. Such is the case of code_swarm and visualizing software project histories from repository logs.

An application by Fry for his thesis is Anemone. It is an organic visualization of website traffic. As visitors explore a site, their paths are traced out in a tree structure representing the site hierarchy. Each page is a node in the tree, and nodes grow larger as they are visited more. All this activity has a certain lifespan: visited pages first appear, grow larger, and decay if they are left alone. This cycle of birth, growth and decay is also a part of code_swarm's design. But while Anemone visualizes site structure, code_swarm does not use directory structure.

Two other organic visualizations of website traffic, by Erlend Simonsen, are glTail and glTrail [22]. glTail shows, in real-time, requests to a webserver as different-sized circles pouring forth from the left- and right-hand sides of the screen. Circle size indicates the byte size of the request and color represents which site initiated the request. The circles fall to the bottom of the screen, where they collide, pool, and drain. glTrail, on the other hand, is similar to Anemone because it shows the website structure as a dynamic graph. A node in the graph is a webpage, whose area is proportional to its popularity. They are connected by links and move freely in a physical simulation. Visual similarity between glTrail and code_swarm can be seen and, though we do not claim novelty of our technique, the two applications were developed separately. Just the same, code_swarm focuses on the human actors while glTrail focuses on the artifacts being acted upon.

We Feel Fine [10], by Harris and Kamvar, extracts sentences containing "feelings" from blogs, processes them and displays the results in an aesthetic way. One movement of their work, Madness, has organic qualities to it. Each of the hundreds of feelings is represented as a particle colored to match its mood. The particles fly around the open space, bumping into each other and swarming around the cursor. The movement of particles is purely aesthetic, in contrast to code_swarm.

Andrew Vande Moere's information flocking boids technique [15] is related to organic infovis as it uses simulated, autonomous objects [21] to represent dynamic data. However, the result is a static image rather than an animation, so it loses some of the liveliness associated with animated processes.


Designing Code Swarm

A high-level description of our task is: To create a visualization of software project histories that will appeal to developers and casual viewers. There are many factors in that one sentence that must be considered. What follows is an account of the data requirements, design decisions, and implementation details that arose during the project. We also discuss how the resulting videos were prepared for consumption by the masses.

3.1 Software Project Data

We wanted code_swarm to be applicable to all open source software projects. Therefore, we chose to use data from a source they all have in common: source control repositories. Project history data is located in the repository system's logs. During normal use, developers make changes to the source code or other files and commit them to the central repository. The system keeps a log of this activity which is easily accessible to people working on the project. The open source projects that we profile use CVS or Subversion, and it is from these systems that we obtain our information. Specifically, we extract each revision event (i.e. file commit) over the project's lifetime. Within those events, there is data on:

  • the time at which they were committed,

  • which files were committed,

  • and which person committed them.

By studying these logs, one can reconstruct the timeline of the project. However, it is difficult to sense patterns and characterize the project activity from a purely textual view. Visualization aids tremendously in this task.
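As an illustration of the three fields above, the following minimal Python sketch parses revision events and sorts them into a timeline. The pipe-separated line format, the `RevisionEvent` and `parse_events` names, and the sample entries are all assumptions made for this example; they are not the CVS or Subversion log format, nor the format code_swarm itself reads.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class RevisionEvent:
    """One file commit: when it happened, which file, and who committed it."""
    timestamp: datetime
    path: str
    author: str


def parse_events(lines: List[str]) -> List[RevisionEvent]:
    """Parse 'ISO-timestamp|author|path' lines (a hypothetical format)."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, author, path = line.split("|", 2)
        events.append(RevisionEvent(datetime.fromisoformat(stamp), path, author))
    # Sorting by commit time reconstructs the project timeline.
    return sorted(events, key=lambda e: e.timestamp)


# Made-up sample entries, deliberately out of order.
sample = [
    "2001-03-02T10:15:00|alice|src/main.c",
    "2001-03-01T09:00:00|bob|doc/manual.txt",
    "2001-03-02T10:15:00|alice|src/util.c",
]
timeline = parse_events(sample)
```

Even in this tiny sample, the sorted list is only a textual timeline; patterns such as who works with whom remain hard to read from it.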

The four well-known open source projects that we profile are Apache, Eclipse, PostgreSQL and Python. The backgrounds of these projects as well as their code_swarm results are discussed in Section 4.1. The data used to create the four project videos was obtained from CVS and Subversion revision control systems. They are two popular systems, but dozens of others exist. In Section 4.3 we note how the open source community has extended code_swarm's data import capabilities to include other revision control systems, like Perforce and Mercurial. There is even support for non-software data like MediaWiki history.

3.2 Design Decisions

Aesthetics: The look of our visualization was a big factor in its design. We wanted to engage the viewer from the beginning and hold their attention for as long as possible. Organic information design was chosen as the overall paradigm to give the impression that the software projects are living, breathing organisms.

Color: Our use of color was deliberate and refined through experimental iteration. We chose to map hue to file type (e.g. source code, document, etc.), as it is a nominal variable. Less desirable alternatives, which we rejected, were saturation, shape and texture. However, hue must be carefully used because it is difficult to keep track of more than five to ten specific ones [29]. We avoided this pitfall by limiting the number of separate hues to four or fewer. This reduces cognitive load and allows the user to concentrate on other parts of the visualization. The transparency of a particle indicates how recently it was active. When a commit happens, the developer and affected files will be opaque. But they will fade over time, unless another commit affects them. Saturation was used in a binary sense: A file sprite would lose saturation (i.e. flash towards white) when it is committed. Otherwise it would appear its normal color.
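The color mapping described above can be sketched roughly as follows. This is a minimal Python illustration, not code_swarm's implementation: the hue values, the 30-day fade window, the linear fade, the reduced saturation of 0.2 and the `file_color` name are all assumptions chosen for the example.

```python
import colorsys

# Hypothetical hue assignments in degrees; at most four distinct hues.
HUE_BY_TYPE = {"source": 0.0, "document": 240.0, "image": 180.0, "other": 60.0}
FADE_DAYS = 30.0  # assumed time until an untouched sprite is fully transparent


def file_color(file_type, days_since_commit, just_committed=False):
    """Return (r, g, b, alpha), each in [0, 1], for a file sprite."""
    hue = HUE_BY_TYPE.get(file_type, HUE_BY_TYPE["other"]) / 360.0
    # Binary saturation: the sprite flashes toward white when committed.
    saturation = 0.2 if just_committed else 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    # Transparency encodes recency: opaque at commit time, then a linear fade.
    alpha = max(0.0, 1.0 - days_since_commit / FADE_DAYS)
    return (r, g, b, alpha)
```

Under these assumptions, a source file committed 15 days ago is drawn in full red at half opacity, and a new commit both resets its opacity and briefly washes out its color.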

Medium: We chose video as the medium, rather than an interactive interface, to be easily accessible to casual viewers. We felt that being able to interact with the elements on the screen would add little to understanding and much to confusion. The video medium also has a consistent timeline which fits people's intuition of a project timeline.

Video Length: Since we are depicting compressed time in a video, we had the freedom to choose the time mapping. An observation guided us to a practical conclusion: Too short a video and nothing could be learned; too long and it would be difficult for anyone to finish. Thus there should be a medium length that can provide a good balance. We used popular music videos (usually around four to five minutes in length) as our model because they have demonstrated for decades that they are able to hold people's attention. Since software projects have different timespans, each project's time mapping can be calculated so that the resulting video comes close to four or five minutes.
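The time mapping can be written as a one-line calculation. The sketch below is an assumption-laden illustration: the 4.5-minute target, the 24 frames-per-second rate and the `days_per_frame` name are ours for this example, and the project dates are made up.

```python
from datetime import datetime, timedelta


def days_per_frame(start, end, target_minutes=4.5, fps=24):
    """Project days each video frame must cover to reach the target length."""
    total_frames = target_minutes * 60 * fps
    project_days = (end - start).total_seconds() / 86400.0
    return project_days / total_frames


# A made-up project lasting 3240 days (a little under nine years).
start = datetime(2000, 1, 1)
end = start + timedelta(days=3240)
rate = days_per_frame(start, end)  # 0.5: each frame covers half a day
```

A longer-lived project simply gets a larger value, so every video ends up near the same running time regardless of the project's age.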

Performance: Because the visualization is non-interactive, time performance was not a concern. We therefore chose to render each frame in the animation to an image file and encode them as a video when the process was finished. Rendering a project's entire history took time on the order of hours to finish. Ultimately, the particular physical simulator we used proved to be the performance bottleneck. But after open sourcing code_swarm, faster algorithms were contributed which sped up the offline rendering time.

Data Impression: Our decision not to draw lines visually connecting developers to their files was a deliberate one. Organic information visualization is an inherently fuzzy method of data display: there are no exact quantities or relationships being shown. However, it is an acceptable level of ambiguity because our goal is not hard analytics. As long as people get the impression of what is happening in the project, exact quantities are not needed. We therefore did not draw the edges because they contribute to visual clutter while not significantly increasing data clarity for casual viewers. That said, we left the edge-drawing feature in our open source code, so that others may activate it if they wish. A notable video which does have edges in it is the Obama WikiSwarm (Fig. 5, middle).

3.3 Initial Prototypes

In our first experimental applications, we only considered the relationship between files in the repository. Each file was represented as a movable node, such as a circle or rectangle, whose full name is displayed within. Connections between files were inferred from simultaneous commits. That is, if two or more files were committed at the same time, they are considered linked in the physical simulation. As time moves forward, the committed files appear and linked files are pulled towards each other.
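The inference of links from simultaneous commits can be sketched as below. This is an illustrative reconstruction rather than the prototype's code: the tuple layout, the `co_commit_links` name and the rule that "same timestamp and same author" identifies one commit are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def co_commit_links(events):
    """Link files that share a commit; events are (timestamp, author, path)."""
    commits = defaultdict(set)
    for timestamp, author, path in events:
        # Assumption: same author and same timestamp means one commit.
        commits[(timestamp, author)].add(path)
    links = defaultdict(int)
    for paths in commits.values():
        for a, b in combinations(sorted(paths), 2):
            links[(a, b)] += 1  # weight counts how often the pair co-occurs
    return dict(links)
```

Each resulting pair would become a spring in the physical simulation, and a file committed on its own produces no link at all.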

The results were aesthetically pleasing, as the file bubbles popped into existence, danced around each other and slowly faded away. However, the visualization lacked a certain feeling that it meant anything to the end user. To put it plainly, it was not interesting. We quickly realized that the humanity of the software development process was missing because we did not include the people involved with the project. As a corollary, the experiment taught us that, in this domain the files themselves mattered less than the people. These observations subsequently influenced our design decisions.

We also noticed that more historical context would be needed besides the date display. Thus we added a simple histogram of previous commit sizes to the bottom-left of the display area. However, the histogram is not sufficient on its own; it shows commits but not collaboration.
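A commit histogram of this kind amounts to binning committed files by day. The following minimal sketch assumes daily bins and a per-type breakdown; the `commit_histogram` name and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def commit_histogram(events):
    """Count committed files per day and per file type.

    events: iterable of (day, file_type) pairs, one per committed file.
    Returns {day: Counter({file_type: count})}.
    """
    bins = defaultdict(Counter)
    for day, file_type in events:
        bins[day][file_type] += 1
    return dict(bins)


# Made-up sample: four committed files over two days.
events = [
    (date(2001, 1, 8), "source"),
    (date(2001, 1, 8), "source"),
    (date(2001, 1, 8), "document"),
    (date(2001, 1, 9), "source"),
]
histogram = commit_histogram(events)
```

The bar height for a day is the sum of its counts, and days with no commits are simply absent, which is how gaps in activity would show up.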

3.4 Implementation

We used the Processing programming language and environment to build code_swarm. It was chosen for its capacity for rapid graphics prototyping and built-in animation capability. In this regard, Processing complements organic information design quite well. Processing also has a proven track record with generative artists who have used it to produce impressive works (Section 2.2).

To create a system with developers and files moving around in an organic-feeling way, we used a simple spring embedder algorithm. In the language of graph layout, developers and files are nodes in a dynamic bipartite graph. This structure was also used by Weißgerber et al., though their visualization approach used monthly graph snapshots rather than animation [32]. When a developer commits a file, an edge is created between developer and file nodes and they attract each other. As time passes, the attractive force of the edge weakens and they move apart. Files also repulse other files so that there is not so much overlap between unrelated ones. Developers, however, neither attract nor repel other developers. This means that they are positioned by the files alone. Therefore, two spatially close developers work on the same files. See Fig. 1 for a diagram of the code_swarm layout.
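The force rules in the preceding paragraph can be sketched as a single simulation step. This is a simplified Python illustration under our own assumptions, not the Processing code used in code_swarm: the linear spring, the inverse-square repulsion, the constants and the `step` function name are all chosen for the example.

```python
import math


def step(devs, files, edges, attract=0.05, repel=50.0, decay=0.99):
    """Advance the layout by one timestep, mutating the arguments in place.

    devs, files: dict mapping name -> [x, y] position.
    edges: dict mapping (dev, file) -> attraction strength.
    All constants are illustrative, not code_swarm's actual settings.
    """
    dev_force = {d: [0.0, 0.0] for d in devs}
    file_force = {f: [0.0, 0.0] for f in files}

    # Each commit edge pulls its developer and file toward each other.
    for (d, f), strength in edges.items():
        dx = files[f][0] - devs[d][0]
        dy = files[f][1] - devs[d][1]
        dev_force[d][0] += attract * strength * dx
        dev_force[d][1] += attract * strength * dy
        file_force[f][0] -= attract * strength * dx
        file_force[f][1] -= attract * strength * dy

    # Files repel other files; developers exert no force on one another.
    names = list(files)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx = files[a][0] - files[b][0]
            dy = files[a][1] - files[b][1]
            dist = math.hypot(dx, dy) or 1e-6  # avoid division by zero
            push = repel / (dist * dist)
            file_force[a][0] += push * dx / dist
            file_force[a][1] += push * dy / dist
            file_force[b][0] -= push * dx / dist
            file_force[b][1] -= push * dy / dist

    for d, (fx, fy) in dev_force.items():
        devs[d][0] += fx
        devs[d][1] += fy
    for f, (fx, fy) in file_force.items():
        files[f][0] += fx
        files[f][1] += fy

    # Edge strength decays each step, so the pull of old commits fades.
    for key in edges:
        edges[key] *= decay
```

Because developers receive force only through their edges, a developer with no recent commits stays where they are, while two developers committing the same files are drawn toward the same region.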

We anticipated that we would need to experiment with the code_swarm parameters in order to create a decent video. There were visual settings, like file colors, fonts and blending; physics model settings, like repulsive and attractive forces and decay rate; and animation settings like timestep length per video frame. We designed the software so that these parameters can be easily changed.

Fig. 1. A simplified diagram of the code_swarm layout. (A) Colored labels indicate the file type. (B) Document files, as blue circles, have been committed by "documenter." The dark color means they were committed close to the current time. (C) Source code files, as red circles, have been committed by "programmer" and "helper." Some circles are lighter, which means they were committed earlier than the darker ones. (D) A histogram tracks the amount and type of commits over time, from right (newer) to left (older). (E) The date display provides temporal context during the animation.

3.5 Ready For the Masses

We knew the output videos, though dynamic, would not be able to hold people's attention on their own: particles representing people and files move around but there is no context for what is happening. Therefore, we added postproduction finishing touches to enhance the viewing experience. Each video was given subtitles to explain the visualization elements, such as the meaning of color and size. Project information, such as background, major developers and significant events, was also given textually. Finally, an upbeat soundtrack was added to keep viewers' attention.

After the videos were completed, they needed to be accessible to the public. We understood that the user's viewing experience is a critical factor in the success of a visualization and we wanted to maximize the experience and the distribution. A few years ago, before the advent of streaming video services, we would have hosted the videos on a university server, available as a non-streaming download. That would be detrimental to the viewing experience, as the videos would take several minutes to download before they could be watched. We chose Vimeo as the video streaming service due to their high resolution playback, because viewers need to be able to discern the names of developers inside code_swarm. As of this writing, YouTube recently became another option for high quality video hosting.


Results and User Response

To study the validity and user response to code_swarm, we use three methods. First, we analyze four project videos to see whether our technique gives an accurate portrayal of each project's history. Next, we examine and characterize viewer comments by collecting them from various sources on the web. Finally, we take a look at how our code_swarm application has been used and adapted by the software developer community.

4.1 Analyzing the Videos

To address the question of whether code_swarm is a faithful visualization of a project's history, we analyze and comment on each of the videos.

When run against four well-known open source projects, code_swarm was able to show different, characteristic development patterns in each. They range from Python's classic one-person startup to Eclipse's fast-paced jumpoff from an established product. In the following sections we will analyze the videos and point out each of their distinctions. We invite readers to view each video at [16] before or while reading this section. Much of the information about the project development comes from Jack Repenning's CollabNet blog [20], used with permission.

Fig. 2. Frames from the Python code_swarm video. Top: November, 1992. Most of the commits are done by Guido van Rossum. Bottom: November, 2000. The popularity of Python takes off and more people join the project.

4.1.1 Python

The Python scripting language has exploded in popularity in recent years. It was created by Guido van Rossum in the late 1980's and released to the public in 1991. From the beginning of the video, we see Guido mostly working alone on the code. A year later he is joined by two developers, but they stay on the periphery of the swarm, indicating that they work on small, specialized sections of the code (Fig. 2, top). In a few more years the project is joined by a few more developers, including Jack Jansen, who created MacPython. His contribution pattern stands out because his sphere of influence periodically converges with and diverges from Guido's. Fred Drake also appears and commits mostly blue document files. Indeed, he will become the lead documenter of Python. This pattern of Guido being the clear central developer with specialists on the periphery continues. In the year 2000, the popularity of Python takes off (Fig. 2, bottom). We see many new developers coming in as the project activity increases dramatically. The project stays busy for the rest of the video.

4.1.2 Eclipse

Eclipse is a popular software development environment. It was initially developed by IBM as a closed source project, but was released as open source in 2001. The Eclipse Foundation was formed in 2004 as a cooperative between technology companies, including IBM, to serve the financial and legal needs of the project. This industry support is apparent in the code_swarm video (Fig. 3). From the video's beginning, we see a flurry of activity: many people working on many files at a nearly constant pace. Those people already had experience working on Eclipse as part of their job.

Fig. 3. Frame from the Eclipse code_swarm video. January, 2001. There are many developers working on many files. Red files are source code, teal files are images. The histogram shows periodic weekend breaks and a longer break a few weeks earlier.

We also see that, even amid the chaos, the Eclipse project is made of many components. There are many developers clustering around the center, yet they each have their own set of files. In other words, there is not one large set of files being worked on by everybody, but many small sets of files being worked on by individuals. This can be attributed to Eclipse's Rich Client Platform which facilitates modular design.

Observing the commit histogram, we see something not apparent in the Python video: weekends and holidays. Weekends are shown as the periodic gaps in the histogram occurring every seven days. The holidays can be seen at the end of December throughout the video. These breaks further reinforce the idea that Eclipse is developed with corporate sponsorship.

4.1.3 Apache HTTPD

Next we look at the popular webserver, Apache HTTPD (henceforth, simply referred to as Apache). Our video shows the development of version 2 of Apache. Version 1 had been developed several years earlier by Rob McCool, but the software needed to be rewritten to accommodate numerous features. Therefore, the Apache group planned for the development of version 2.0 in 1996. This is where the CVS commit history for the project and our video begins.

At the start of the video, the striking thing is that all the active files are blue, indicating documentation (Fig. 4, top). There are no code commits for several years while the group discusses the new architecture. In July of 1999, once the design specifications are ready, the first pieces of source code are committed. The commits from then on are both code and documents (Fig. 4, bottom). The core developers work together, in the middle, while some specialists, like documenters and module developers, work at the periphery of the swarm. Towards the end of the video we see the documenters overtake the developers in activity, then settle down again.

4.1.4 PostgreSQL

Finally, the PostgreSQL video shows a pattern of highly interconnected components and developers. This relational database project started at the University of California, Berkeley under Michael Stonebraker in the 1980's, and remained more or less in closed academic development until 1996. It was then made open source with version 6.0.

Fig. 4. Frames from the Apache code_swarm video. Top: May, 1997. Only documents (in blue) are committed for nearly two years. Bottom: December, 1999. The design phase is over and some of the first code is committed (in red/yellow).

As the video shows, the group of contributors is quite stable with only a few consistent developers. This is unlike Python, whose participants grew considerably over time, and Eclipse, whose developer list is consistently large. We believe the reason PostgreSQL has a small group of consistent developers is the amount of specialized knowledge needed to maintain relational database software as well as the interdependence of its components. The interdependence of components is apparent in the video as the core developers often overlap each other by committing the same group of files.

4.2 Response From the Public

To get a feel for the public's response to code_swarm, we followed the same approach as [30] and [31]. That is, we examined spontaneous user feedback found on the web. We looked at the comments applied directly to the code_swarm videos on Vimeo as well as searched the internet for blog posts and news articles relating to code_swarm, of which there were surprisingly many. For instance, a Google search for "codeswarm" in March, 2009, returns about 36,000 results. These web comments should not be considered a scientific sample, but they are used to illustrate the possibilities of code_swarm's use and its reception by the public.

4.2.1 Compliments

The vast majority of the comments approved of code_swarm in some way. Approval ranged from simple words like "cool" and "wow" to personal insights about human effort and cooperation. Of the positive sentiments we found, we classify the common themes as amazement, understanding and sharing.

Amazement: There was certainly a "wow factor" represented in people's comments.

The movie for the Eclipse project is incredibly beautiful. It's amazing to see so many people working so hard and so consistently on something1

Some responses were unexpectedly emotional:

almost made me cry, it's beautiful2

The geek in me really rejoiced at the tipping point [of Python's popularity]. Visualizing that amount of human endeavor, coordination and achievement gets me kinda misty… sniff3

The beauty they saw was already in the data, buried within the unexceptional logs. code_swarm brought that beauty to the surface.

Understanding Development: On the more pragmatic side, many comments demonstrated how code_swarm enlightened people to the process of software development.

I watched them and gained a better understanding about the [sheer] effort people make with their projects. Watching the days tick over and seeing no rest in activity was quite an eye opener. Also, see that often the success of open-source projects are the result of a very few people making a very big [effort]3

There were project-specific observations, as in this comment about Apache:

The Apache project started with a bunch of documentation before they started coding. I also like how they evolved into a group of coders and documenters with several doing both4

But people seemed to be most impressed with Guido's work on Python, as in these two examples:

It shows in a really straightforward way how at the beginning Guido developed Python completely alone, how different people tend to work in different kinds of files, how the prominence of developers changes or how the project grew spectacularly in the year 2000. All these facts are completely invisible accessing to the raw commit history!5

This has so many good lessons in it. First off don't expect anything to take off without years of hard work. Second it takes about 10 years for things to really catch on in software, standards, technology. Third, many times it is only the creator or a few people keeping large languages and frameworks alive. Fourth, when you get hired at Google you are officially a rockstar if they use the language you created 10 years prior. Go Guido go…this visualization is intensely inspiring…1

These comments show that people gained insight when they viewed the videos. This leads into the next sentiment: sharing their experience with others.

Sharing: Some found code_swarm so compelling that they wanted to show it to other people. One person found it valuable as an educational tool:

I will be using this in my Introduction to Python Class. What is python? Let me show you. Awesome work6

Two others, presumably in the software industry, thought code_swarm would make for a good motivational presentation:

I think this would be a great idea to show the team on release dates!7

I would love to get this … Would be a great internal present to send to the engineering team1

4.2.2 Criticism

Not all comments were positive, and this section highlights the critiques of code_swarm. Some were displeased with our choice to map particle size to the number of file commits:

A nitpick: in the video, you say that files grow in size every time they are committed. However, in general, it is possible for a file to shrink; code or text may have been removed7

We were able to respond directly to this comment with our reasoning on the same page:

I had that thought as well. I wanted the particles to grow as a measure of progress and popularity among developers. When you consider it that way, then a reduction in file size may also mean a cleanup towards legibility or refactoring. It's not an exact mapping, but I think it works7

In the next comment, the metric we used for developer activity was called into question:

Lots of commits isn't really a measure of developer productivity or worth. … More seasoned programmers will tend to make fewer, but larger commits8

While we certainly did not intend for the commit frequency of a developer to imply their worth, some hold that interpretation. Another person extended this line of thinking to postulate that code_swarm will have a negative effect on developer behavior:

Of course, it's flashy and cool, but I worry that this will only encourage people to make more commits instead of actually using their brains8

They are referring to the observer effect: knowing that one is being observed affects one's behavior. An unscrupulous person might exploit this in the short term. But with open source's philosophy of having many eyes review the source code, such a person is likely to be noticed and dealt with. Note that people can already observe developer behavior through the repository logs and software hosting sites. For example, some software collaboration sites show a bar chart of a developer's commits to a project over time. It is the same data with a different visual mapping. In general, information visualization involving human behavior is subject to the observer effect, and this could be an area of future study.
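To make the "same data, different visual mapping" point concrete, the following is a minimal sketch of how file-commit events could be aggregated into the per-developer, per-month counts that such a bar chart would plot. The event tuple layout, the function name and the sample data are assumptions made for illustration; they are not taken from code_swarm or from any hosting site.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_commit_counts(events):
    """Count file commits per (author, 'YYYY-MM').

    `events` is an iterable of (unix_seconds, author, path) tuples,
    a hypothetical layout chosen for this sketch.
    """
    counts = Counter()
    for timestamp, author, _path in events:
        month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
        counts[(author, month)] += 1
    return counts


# Hypothetical sample data, not drawn from any real repository.
events = [
    (946684800, "alice", "src/main.c"),  # 2000-01-01 UTC
    (946771200, "alice", "doc/readme"),  # 2000-01-02 UTC
    (949363200, "bob", "src/main.c"),    # 2000-02-01 UTC
]
counts = monthly_commit_counts(events)
```

The same event list could equally drive an animation or a bar chart; only the mapping from events to visual marks differs.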

As to whether file commits are a good measure of developer activity, someone responded on the same discussion thread:

Software engineering as a discipline has been working for decades to come up with a heuristic to evaluate programmer productivity, and we're still nowhere close, although there are literally hundreds of formulas in use8

On the visual aspect of code_swarm, one person found the overlapping developer names too cluttered:

One thing I didn't like was that the more important collaborators were impossible to read by the end, while the peripheral ones were easier to read, because they weren't jumbled up in the middle1

We also considered this problem when testing code_swarm. In the end, we used a uniform font size for developer names. Enlarging the names of the frequent committers cluttered the visualization even more and also detracted from the sense of team effort. It could also lead to false impressions, as in the case of an automated build account which updates files nightly.

4.2.3 What People Wanted

There was no shortage of requests to add features and produce more project videos. But by far the most requested action was to make the code open source.

I'd really like you to make this an open source project. I think it can greatly help a team to see how well modularized or loosely coupled is a project. It's also very fun. :-)7

And some even offered their help:

I don't have the authority to speak on behalf of all of us [Slashdotters], but I will anyway. Give it to us and we'll clean it up for you8

We also received dozens of emails of the same nature, asking for the code or offering to contribute. When we first published the videos, we did not anticipate that the call to release the source code would come so quickly. This response reassured us that open sourcing code_swarm would be a positive use of resources. We are happy to report that, after a week spent cleaning up the prototype code, we released it on June 16, 2008.

4.3 Adoption by the Community

After code_swarm was released as open source, contributors added support for most of the major version control systems (such as SVN, CVS, Git, Mercurial and Perforce), and some less common ones. This meant that users could create their own visualizations from their own software project repositories.

Within a week of the source code release, user-created code_swarm videos were appearing on the web. Some of the first projects out were Blender9, Meneame10, and Django11.

And the production of user generated videos continued. Every so often, the posting of a famous project's video would introduce code_swarm to a new segment of the software developer community, those developers would make their own videos, and the pace of video releases would increase for a short time. Some notable code_swarms of open source projects are Subversion12, SciPy13, and Plone14.

A few closed source projects have used code_swarm and shared their results online. They include: Vimeo15, the video sharing website that we used to host our results; Flickr16, a popular photo sharing site (that also hosts videos); and LittleBigPlanet17, a highly anticipated video game for the PlayStation 3. The LittleBigPlanet video is notable in that it shows large-scale commercial development where many of the developers are game artists and designers (Fig. 5).

A few people have even used different types of data sources as input to code_swarm. A Flickr developer used their bug tracker data to show issue reports being created, accepted and handled [video no longer publicly available]. Wanting to show how online communities are formed, Jamie Wilkinson created a Wikipedia history parser to turn article edits into code_swarm commit events. He then posted a video of edits to the article on Barack Obama, which leads up to the historic November, 2008 election18.
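The idea of treating article edits as commits can be sketched as follows. This is an illustration only, assuming a generic commit-style event with a timestamp, an author and a file name; the class, field names, timestamp format and sample revisions are hypothetical and do not reproduce the Wikiswarm parser or the input format of code_swarm.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CommitEvent:
    """Generic commit-style event; the field names are illustrative assumptions."""
    timestamp_ms: int
    author: str
    filename: str


def revisions_to_events(article_title, revisions):
    """Map (iso_timestamp, editor) revision records to events, oldest first.

    The article plays the role of the file and each editor plays the
    role of a developer committing to it.
    """
    events = []
    for iso_time, editor in revisions:
        moment = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        events.append(CommitEvent(int(moment.timestamp() * 1000), editor, article_title))
    return sorted(events, key=lambda event: event.timestamp_ms)


# Hypothetical revision records, deliberately given out of order.
revisions = [
    ("2008-11-05T04:00:00Z", "EditorB"),
    ("2008-11-04T12:00:00Z", "EditorA"),
]
wiki_events = revisions_to_events("Example article", revisions)
```

Once edits are expressed as events of this shape, any tool that animates commit histories could in principle consume them.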

People have also extended code_swarm with interesting features. We've already mentioned Jamie Wilkinson's Wikiswarm parser, which changes a Wikipedia page history into a format readable by code_swarm19. Arne Babenhauserheide wrote "Shared Codeswarm" to take data from multiple projects and visualize them in the same video space20. Using the software, he created a video containing the open source version control projects Mercurial, Git and Bazaar. It shows that, even though they are competing projects, there is a lot of cross-pollination happening between them. Finally, Peter Burns added many enhancements in his fork of the code_swarm project, some of which were folded into the main one21.


Conclusion and Future Work

We believe that a handful of design decisions led to the success of code_swarm. These decisions all affect the accessibility of the videos in some way. First, we kept the videos relatively short, around five minutes in length. Second, we used a soundtrack to enhance the viewing experience. Third, we showed the names of individual developers, so that people involved with the project, and those who know them, could feel they were part of the visualization. Fourth, we used a video streaming service to host the videos, so that viewers could have immediate access. Finally, we made the project open source so that the software community would be able to use it to create their own visualizations.

We strongly feel that making code_swarm open source was a good decision with implications for the future of information visualization practice. Our system was improved greatly by the generosity of the software community. And in return, the software community could participate in programming visualization techniques. In general it would benefit everyone if more information visualization software were open source.

Code_swarm also has positive implications for the future of organic information visualization. The comments in Section 4.2 showed enthusiasm, interest and engagement by the public. This means organic design has a place in visualization and we may see more systems using the technique in the future.

For future work, we do not yet have a scientific answer to how long a video should last. We believe they should be as long as or shorter than popular music videos (about four minutes), since those are able to hold people's attention. But there are other factors to consider, such as the involvement of the viewer in the data and the experience they have with software development. For example, an active Python developer would be more engaged in a Python code_swarm and could watch it for a longer period than a lay person. So what is the limit of attention paid to videos showing software evolution? And how much extra time, if any, do pretty visuals and background music gain?

A feature that people have asked for, but would be difficult to implement with the current code_swarm technique, is having a real-time dashboard for project events. An aesthetic organic visualization could be set up as an ambient display, either on a personal computer desktop or on a monitor in an office, that will show the state of the software repository and commit activity. With code_swarm's emphasis on casual information visualization this seems like a perfect fit. However, the commits that drive the animation are instantaneous events and in a real-time setting they come minutes apart, if not hours or days. Thus, using code_swarm as it is currently implemented would not be interesting to watch in real-time.
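A rough calculation shows how strongly the videos compress time, and therefore why real-time playback would look empty. The sketch below assumes a ten-year project history, a five-minute video and 30 frames per second; the five-minute length follows the figure given in this paper, while the history span, the frame rate and the function name are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def real_time_per_frame(first_commit, last_commit, video_seconds, fps):
    """Return the span of project history that a single video frame must cover."""
    total_frames = video_seconds * fps
    return (last_commit - first_commit) / total_frames


# Assumed values: ten years of history, a five-minute video, 30 frames per second.
span = real_time_per_frame(
    datetime(1999, 1, 1), datetime(2009, 1, 1), video_seconds=300, fps=30
)
print(span)  # history covered by one frame, as a timedelta
```

Under these assumptions each frame stands for several hours of project history, so a commit stream that looks dense in the video would, in real time, leave the display idle for most of the day.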

Finally, we would like to incorporate more of the data in software repositories to create a more comprehensive visualization. We deliberately did not visualize the organization of the project files. We felt that the common ways to visualize such a tree structure (e.g. tree diagrams and treemaps) are too rigid and would not fit the organic aesthetic. There are also commit messages present in the logs that would give some context to what is happening in the project's evolution. We see this as a text stream visualization challenge to be researched in the future. Lastly, the evolving source code could somehow be incorporated into the visualization. This opens up many new avenues of data, such as the lines which were changed, the names of functions, code comments, and code complexity metrics. Perhaps some of this data can be squeezed into the current visual paradigm of code_swarm. But it is more likely, and more exciting, that a new system could be designed, building on the lessons learned from this experiment in organic software visualization.

Fig. 5. Frames from code_swarm videos made by others, examples of community adoption. Left: LittleBigBang: The Evolution of LittleBigPlanet17 by Alex Evans. This video gives a glimpse at closed-source video game development. Middle: Obama WikiSwarm18 by Jamie Wilkinson. Visualizes edits on Wikipedia, a non-software data source. Right: Git vs. Mercurial vs. Bazaar20 by Arne Babenhauserheide. It visualizes three similar software projects in the same space. Interestingly, there is a lot of cross-pollination of developers. Git = red, Mercurial = blue, Bazaar = green.


We would like to thank Premkumar Devanbu, Christian Bird and Alex Gourley for preparing the data used in the initial code_swarm experiments. Also, thanks to the contributors to the open source code_swarm project, including Chris Galvan, Desmond Daignault, Sebastien Rombauts, Arjen Wiersma, Peter Burns, and patch submitters too numerous to mention. And finally, thanks to all the software enthusiasts who have created their own code_swarms and shared them with the world.


1. Clustering software artifacts based on frequent common changes.

D. Beyer and A. Noack.

In International Workshop on Program Comprehension, pages 259–268. IEEE, 2005.

2. A system for graph-based visualization of the evolution of software.

C. Collberg, S. Kobourov, J. Nagra, J. Pitts and K. Wampler

In SOFTVIS, pages 77–86. ACM, 2003.

3. The evolution radar: visualizing integrated logical coupling information.

M. D'Ambros, M. Lanza and M. Lungu.

In Mining Software Repositories, pages 26–32. ACM, 2006.

4. Software Visualization: Visualizing the Structure, Behaviour, and Evolution of Software.

S. Diehl.

Springer-Verlag New York, Inc., 2007.

5. Visualizing software changes.

S. G. Eick, T. L. Graves, A. F. Karr, A. Mockus and P. Schuster.

Transactions on Software Engineering, 28 (4): 396–412, 2002.

6. Seesoft-a tool for visualizing line oriented software statistics.

S. G. Eick, J. L. Steffen and E. E. Sumner Jr.

Transactions on Software Engineering, 18 (11): 957–968, 1992.

7. Unifying artifacts and activities in a visual tool for distributed software development teams.

J. Froehlich and P. Dourish.

In ICSE, pages 387–396. IEEE, 2004.

8. Organic information design.

B. J. Fry.

Master's thesis, School of Architecture and Planning, Massachusetts Institute of Technology.

9. CodeSaw: A social visualization of distributed software development.

E. Gilbert and K. Karahalios.

INTERACT: Human-Computer Interaction, pages 303–316, 2007.

10. We feel fine.

J. Harris and S. Kamvar

11. Flight404.

R. Hodgin

12. Growing bloom: design of a visualization of project evolution.

B. Kerr, L.-T. Cheng and T. Sweeney

In CHI extended abstracts, pages 93–98. ACM, 2006.

13. The visual code navigator: An interactive toolset for source code investigation.

G. Lommerse, F. Nossin, L. Voinea and A. Telea

In Symposium on Information Visualization, pages 24–31. IEEE, 2005.

14. Personal blog.

G. Marshall

15. Time-varying data visualization using information flocking boids.

A. V. Moere

In Symposium on Information Visualization, pages 97–104. IEEE, 2004.

16. An experiment in organic software visualization.

M. Ogawa


17. StarGate: A unified, interactive visualization of software projects.

M. Ogawa and K.-L. Ma

In Pacific Visualization Symposium, pages 191–198. IEEE, 2008.

18. What dynamic network metrics can tell us about developer roles.

M. Pohl and S. Diehl.

In CHASE, pages 81–84. ACM, 2008.

19. Casual information visualization: Depictions of data in everyday life.

Z. Pousman, J. Stasko and M. Mateas

Transactions on Visualization and Computer Graphics, 13 (6): 1145–1152, 2007.

20. Contribution patterns in open source. Blog entry.

J. Repenning.

21. Flocks, herds and schools: A distributed behavioral model.

C. W. Reynolds

In SIGGRAPH, pages 25–34. ACM, 1987.

22. glTail and glTrail.

E. Simonsen

23. On the use of visualization to support awareness of human activities in software development: a survey and a framework.

M.-A. D. Storey, D. Čubranić and D. M. German

In SOFTVIS, pages 193–202. ACM, 2005.

24. Gallery of computation.

J. Tarbell

25. Studying cooperation and conflict between authors with history flow visualizations.

F. B. Viégas, M. Wattenberg and K. Dave.

In CHI, pages 575–582. ACM, 2004.

26. Visual assessment of software evolution.

L. Voinea, J. Lukkien and A. Telea.

Sci. Comput. Program., 65 (3): 222–248, 2007.

27. CVSgrab: Mining the history of large software projects.

L. Voinea and A. Telea

In EuroVis, pages 187–194, 2006.

28. CVSscan: visualization of code evolution.

L. Voinea, A. Telea and J. J. van Wijk

In SOFTVIS, pages 47–56. ACM, 2005.

29. Information Visualization: Perception for Design.

C. Ware

Morgan Kaufmann Publishers Inc., 2004.

30. Baby names, visualization, and social data analysis.

M. Wattenberg

In Symposium on Information Visualization, page 1. IEEE, 2005.

31. The word tree, an interactive visual concordance.

M. Wattenberg and F. B. Viégas

Transactions on Visualization and Computer Graphics, 14 (6): 1221–1228, 2008.

32. Visual data mining in software archives to detect how developers work together.

P. Weißgerber, M. Pohl and M. Burch

In Mining Software Repositories, pages 9–16. IEEE, 2007.



Michael Ogawa

Student Member, IEEE

Kwan-Liu Ma

Senior Member, IEEE

© Copyright 2011 IEEE – All Rights Reserved