
On Program Comprehension

by Benjamin Kaminski — 27 February 2023
Topics: interview

Could you introduce yourself and your research area?

Sure, I am Sven Apel ;-), Professor of Computer Science at Saarland University, leading the Software Engineering group. Generally, I am interested in all aspects related to programming and software engineering. In the past, I have looked a lot into software product lines and configurable software systems, software project analytics, and the human factor in software engineering. To give you an idea: How can one know that all of the myriad valid variants of the Linux kernel are actually correct, or which of them offer acceptable performance in a given setting? Are there indicators that tell us whether such large-scale open-source projects, with thousands of developers and without a mandated process, are on track and healthy? In general, what factors help programmers understand programs in the small and in the large (e.g., programming mechanisms, education, and so on)?

What are the most exciting current advances in your field?

There are quite a few advances in the area of software engineering. Let me mention some that are close to my research and, more importantly, that are at a methodological level: they are not about particular results, but about methods and standards for obtaining sound results.

First, in the past decade, we have seen quite a drastic shift towards establishing and enforcing rigorous methodological standards in software engineering research. Given that software engineering relies to a good extent on empirical methods for studying programmers and how they develop software, this movement has been long overdue. This particularly includes quantitative methods (e.g., mining software repositories, sound statistical analysis) and qualitative methods (e.g., think-aloud protocols, interview studies).

Second, one trend in software engineering research that I very much appreciate is the use of methods from other disciplines to enrich our methodological arsenal. Most notably, researchers started applying social network analysis techniques to obtain a better understanding of social aspects of software projects, machine learning and optimization techniques for automating and analyzing various aspects of software systems, and cognitive modeling and neuroimaging methods to better understand one of our main study subjects: the software developer.

Finally, the software engineering community, too, is not unaffected by the recent advances in machine learning. In my own research on modeling software performance (with Norbert Siegmund, Professor at Leipzig University), we have been using machine learning techniques very successfully for over a decade, but the impressive developments of recent months and years will transform the field entirely.
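
To give a flavor of what this kind of performance modeling involves, here is a minimal, hypothetical sketch (my illustration, not the actual tooling from this line of research): each configuration of a system is encoded as a vector of binary options, and a simple linear regression estimates how much each option, and one pairwise interaction, influences the measured runtime. The option names and measurements are made up.

    # Hypothetical sketch: learn a performance model of a configurable system.
    # Each row is one configuration (1 = option enabled); the target is the
    # observed runtime in seconds for that configuration.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    options = ["encryption", "compression", "caching"]

    configs = np.array([
        [0, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 1],
    ])
    runtime = np.array([10.0, 14.1, 12.9, 9.2, 19.5, 13.3, 12.0, 18.7])

    # Add an interaction term for encryption * compression, since options of
    # configurable systems often interact (the "feature interaction" problem).
    interaction = (configs[:, 0] * configs[:, 1]).reshape(-1, 1)
    X = np.hstack([configs, interaction])

    model = LinearRegression().fit(X, runtime)
    for name, coef in zip(options + ["encryption*compression"], model.coef_):
        print(f"{name:24s} {coef:+.2f} s")
    print(f"{'base runtime':24s} {model.intercept_:+.2f} s")

Real performance-influence models rely on far more careful sampling and learning strategies, but the core idea of attributing performance to individual options and their interactions is the same.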

Can you tell us a bit about your recently acquired ERC Advanced Grant?

In 2022, I was awarded an ERC Advanced Grant. As you can imagine, this really changed my professional life. It is, of course, a great honor, but more importantly, it gives us the freedom to pursue our admittedly very fundamental research with the necessary rigor and diligence. The project, called Brains On Code, aims at better understanding the programmer and their ability to understand programs. The rationale is that research on program comprehension has a fundamental limitation: program comprehension is a cognitive process that cannot be observed directly, which leaves lots of room for misinterpretation, uncertainty, and confounders. In Brains On Code, we are developing a neuroscientific foundation of program comprehension. Instead of merely observing whether there is a difference regarding program comprehension (e.g., between two programming methods), we aim at precisely and reliably determining the key factors that cause the difference. For this purpose, we leverage established methods from cognitive neuroscience (e.g., fMRI and EEG) to obtain insights into the underlying processes and influential factors of program comprehension. This way, Brains On Code lays the foundations for measuring and modeling program comprehension and offers substantial feedback for programming methodology, language design, and education. Answering long-standing foundational questions such as “How can we reliably measure program comprehension?”, “What makes a program difficult to understand?”, and “What skills should programmers have?” comes into reach.

How did you come up with that topic?

During my PhD studies, I was quite into novel programming languages and mechanisms (mostly for modularity and composition). At the time, researchers were proposing novel programming languages and language mechanisms at an almost inflationary rate. I noticed quite early that people often motivated or justified their work with the human factor, stating, for example, that this or that mechanism or language was easier for a programmer to understand or to use. While, in principle, this line of reasoning is valid, there was almost never any evidence for those claims. In the late 2000s, we started leveraging, applying, and promoting empirical methods in our research. While we gained lots of insights, at some point we realized that just asking or observing programmers while they are programming is not sufficient to really pin down the factors that influence program comprehension. Our interest in and connections to neuroscience led us to give neuroimaging methods a try. In particular, together with Janet Siegmund (today Professor at TU Chemnitz), André Brechmann (Leibniz Institute for Neurobiology), and further students and colleagues, we started a very fruitful collaboration that continues to this day (see our recent CACM article for a historical perspective).

Apart from your ERC project, what do you expect will be the next big thing/challenge in your field?

Apart from individual research challenges, the biggest overarching challenge for the software engineering community is to stake its claim in the AI revolution that is rolling over research and society. We, as a community, should be much more confident in what we can bring to the table. Rather than being afraid that the activity of programming and developing software may soon be obsolete, we should bring in our methods and experience to shape the future. One thing is clear: systems and applications that leverage artificial intelligence and machine learning are still systems (not models) that need to fulfill quality requirements and that are created and used by human beings. The latter point, especially, relates closely to my ERC project. It is people who develop and interact with systems, so we need to help them understand and gain trust in the systems’ reliability and fairness. Regardless of how powerful AI systems become, unless they reach human-like intelligence, there will always be humans involved in designing, maintaining, overseeing, and testing software systems of a certain complexity. Maybe they won’t use today’s programming languages, but they will need to understand what the system is supposed to do and what it actually does.

Following up on AI, can you elaborate a bit more on the potential impact that machine learning and artificial intelligence will have on software engineering?

As in most areas, artificial intelligence in general, and machine learning in particular, will have a transformative effect on the research landscape and practice of software engineering. As said previously, we have seen successful and remarkable applications of machine learning to software engineering questions and problems in the past. However, very recent developments in generative language models, such as those behind ChatGPT and GitHub Copilot, will be game changers in how we think about programming and software engineering. Personally, I would not go so far as to state that programming will become obsolete soon, but the way we program (or tell computers what to do) will change. Besides the general problem that all of machine learning faces, namely to what extent the solutions (in our case, programs) are correct and trustworthy, there are challenges that require established software engineering wisdom and folklore. The key observation is that humans use and interact with applications, not models. That is, just learning a model (e.g., a classifier) is only a small part of the story. For example, already today, Baidu’s self-driving car system Apollo contains 18 machine-learned components, integrated with a large number of classical software components (most control logic is handwritten). So, techniques for testing and verifying neural networks (e.g., for robustness) are highly valuable, but insufficient by themselves. We need to take a whole-system perspective, as software engineers usually do! As a consequence, the design problem shifts from individual models and components to entire systems (with all kinds of interesting implications, such as a revival of the good old feature interaction problem). What does correctness even mean in a setting where key components do not even have a precise specification (e.g., an image classifier) and where uncertainty is your constant companion? My colleague and friend Christian Kästner (CMU) has a great lecture and material repository on these and further questions.
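
To illustrate the whole-system point with a toy example (entirely hypothetical, not taken from Apollo or any real system): the learned classifier below is just one component, and it is the surrounding handwritten logic that must deal with the classifier's uncertainty and decide what the system as a whole does.

    # Hypothetical illustration of a system built around a learned component.
    # The classifier, thresholds, and actions are made up; the point is that
    # uncertainty handling and the overall behavior are system-level design
    # decisions, not properties of the model itself.
    from dataclasses import dataclass

    @dataclass
    class Prediction:
        label: str         # e.g., "pedestrian" or "clear"
        confidence: float   # in [0, 1], as reported by the learned component

    def classify_frame(frame) -> Prediction:
        # Stand-in for a machine-learned component (e.g., an image classifier).
        # A real system would invoke the trained model here.
        return Prediction(label="pedestrian", confidence=0.65)

    def plan_action(frame) -> str:
        """Classical, handwritten control logic around the learned component."""
        pred = classify_frame(frame)
        if pred.confidence < 0.8:
            # Low confidence: fall back to a conservative, specified behavior.
            return "slow down and request another sensor reading"
        if pred.label == "pedestrian":
            return "brake"
        return "continue"

    print(plan_action(frame=None))

Whether this composition of learned and handwritten components behaves correctly is a property of the whole system, which is exactly where the classical software engineering questions (specification, testing, interaction of components) come back in.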

Finally, who would you like to see interviewed on the ETAPS blog?

Christian Kästner (see reasons above).

And which paper in your field would you recommend everyone to read?

Not a paper, but Christian Kästner’s recent talk on rethinking the role of software engineering for machine learning.