#186 - 4 Jan 2025

Northwestern's experience with success stories. Plus: Standups; Lockwood on life in industry; Best forking practices; Energy debugging; Genomic data cybersecurity and privacy from NIST; and Slinky for slurm on k8s

Happy New Year, everyone!

The newsletter fell off in the fall, as I was juggling too many other activities. I think everything’s back under control, and I’m looking forward to resuming the regular schedule throughout this new year!

Kicking off 2025, Christina Maimone, Associate Director of Research Data Services at Northwestern IT Research Computing and Data Services, agreed to do an asynchronous interview; Christina’s group has started writing up case studies for projects the group has worked on, and was willing to talk about it:

RCT: Tell us about your department, and what you wanted to achieve with putting together these case studies and testimonials.

Christina Maimone (CM): Our team is part of the central IT organization at the university that supports faculty, students, and staff across all schools and departments. We offer a broad range of services and support spanning HPC and other computing resources, data management and storage, data science, visualization, and statistics.

Last year we put together a portfolio with short descriptions of our data science, statistics, and visualization projects. That portfolio has been useful for showing researchers the range of our expertise in that area and to give them ideas for how we can collaborate with them, but it covers the research products resulting from just one of our service areas and doesn’t include any testimonials. With these new case studies, we wanted to highlight the broad range of our services and focus on the researchers’ perspectives and experiences. These cases allow us to showcase support that happens at all stages of the research process and that may be difficult to tie to a specific publication.

RCT: How did you choose people to interview? Was it hard to get people to agree to do it?

CM: We asked our team to brainstorm which researchers and projects they thought would be good to highlight, then we tried to pick examples from those candidates that spoke to different service areas and types of support we provide, as well as different research fields across the university. Many of the researchers we started with are people we’ve supported over several years or with multiple services, so we had strong relationships to work from. All of the researchers we’ve asked so far were happy to help. We tried to keep what we were asking them to do minimal, and we offered them options for responding via email or on a Zoom call. We had the members of our team who supported each researcher make the initial ask for their participation, and they then facilitated a connection to our communications team who followed up on the details.

RCT: Does having some published now make it easier to get new people to agree to do the interview?

CM: It hasn’t been difficult to get people to agree, but I do think it will be helpful to point to these published cases when reaching out to researchers moving forward. Personally, being able to share what we have so far makes me less anxious to ask additional people to participate because it’s easy to show them what we’re trying to achieve.

RCT: What was the process of doing the interviews and writeups like? Anything harder or easier than you expected?

CM: When you’ve been closely involved with a project, it can be difficult to let go of the details and distill what might have been months of work into a clear and succinct story. It was a tremendous help to work with our IT communications team on these because they were able to pick out the key elements and put them together in a focused way. They spoke with the members of our team who supported each project to understand the context and our contributions, and then they reached out to the researchers to get their perspectives. They also developed a template that works well for a wide variety of cases. Having someone who wasn’t directly involved in providing the support draft each case study helped ensure that the result would be relevant and understandable to a wide audience. They made it easy on us!

RCT: Those are some amazing pull quotes you have in those articles. Did you have to try hard to get people to say something so positive and quotable?

CM: This is another area where involving our communications team in the process was really helpful. The sentiment and core of the quotes come straight from the researchers. But our communication team’s experience hearing what the researchers were saying and then capturing that in a form that also works well as a quote makes the end result successful and impactful.

RCT: I know it’s early days yet, but what has the reaction been to them so far - any reactions from decision-makers or other researchers who have seen them?

CM: So far, they’ve been helpful when talking to new researchers about their needs because they show that not only can we provide the support they’re looking for, but also we have done similar work successfully in the past.

What I didn’t expect was how great these would be to read ourselves. The process of helping researchers can often be complex and take a while to resolve. Seeing the work brought together through a clear story and hearing about the impact from the researchers themselves is a good reminder of why we do what we do.

RCT: How are you planning to use these materials?

CM: While the primary audience is researchers who might use our services, we think they’ll be useful for helping our partners in other IT groups understand what we do as well. For those working in distributed school and departmental IT groups, these stories are a much more memorable way to share the range of services we offer than listings in a service catalog or bullet points on a slide. Hopefully they can see the faculty and students they work with on a daily basis in these cases. Similarly, our colleagues who run and support the infrastructure that our services rely on don’t interact directly with researchers on a regular basis. Their work is essential for the research that is done at Northwestern University, but they don’t always get to see the results of their work. This is one way we can share the impact of their work with them.

RCT: Given what you’ve learned so far, what are you thinking as follow ups?

CM: We have a few more case studies in process and then our next challenge is incorporating them with our service information. We focused on faculty-led projects for these full cases, but it would also be beneficial for students and postdocs to hear from their peers about how and why they can use our services. We have some feedback processes for workshops and consultations that can be adapted to support getting shorter testimonials or endorsements. One of the lessons is to make asking for feedback that we can share with others part of our routine—it feels easier to do now that we’ve started.

Thanks so much, Christina, for sharing your experience with the case studies. As you will all know, this is something I’m a big fan of! (Success stories are the best advocacy, #179, and a write-up on how to do them). It’s a great way to communicate the importance of the work we do, and I was particularly heartened by Christina’s comment about how great it was for the team to read these.

Are there things your group is doing that you’d like to share? Or would you like to talk about something you’re working on? Email me at jonathan@researchcomputingteams.org, or schedule a time to chat. I’d love to hear from you!

And now, on to the roundup:


Managing Individuals, Teams, and Organizations

On the other side of the block over at Manager, Ph.D., in issue #176 I covered how Scientific Systems are Uncertain, while Human Systems are Ambiguous. Also covered in the roundup were:

  • Feedback is for behaviour change, not self-expression
  • Trust lets you see reality
  • Change is risky, and makes things worse before improving them
  • How to write usefully

PS - I’ve moved MPHD to a new location - most of the material copied over smoothly but there’s still bits and pieces which need fixing. If you see anything wrong, please let me know! Just hit reply.


Technical Leadership

Standups: Individual → Teammate - Kent Beck

I’ve argued in the past (#137) that retrospectives are to managing teams what one-on-ones are to managing individuals - that they are the key meeting that builds trust and maintains open lines of communications.

Beck makes a good case that for some teams, standups may actually play that role:

That’s what the standup meeting is there for—to give everyone a moment to say, “Oh, yeah, okay, these people & their needs are going to be important to me for the next 9 hours.” Take a breath. Get to work.

and that it may be more necessary for remote teams:

If the team is co-located & used to showing up at around the same time & pairs switch frequently & folks gather around the espresso maker most mornings, then maybe folks don’t need a ritual to manage the transition from person to worker. If the team is fully remote & in different timezones & just getting used to each other then likely yes, the need for ritual supporting the transition is greater [..]


Managing Your Own Career

How has Life after Leaving the Labs Been Going? - Glenn Lockwood

Lockwood, well known in the community for being an incredibly capable and influential DOE HPC storage architect, has been at Microsoft and has thoughts! He has a great overview of the differences he’s seen, and what the labs (or really, research support teams in academia) struggle with in a way industry doesn’t. Two I think are particularly worth highlighting:

  • Accountability: “Teams coordinate dependent work with each other, trades horses on what the priority of each request is, and at the end of planning, have committed agreements about what work will be done in the next semester. […] The DOE Labs [and DRI teams - LJD] operate much more loosely in my experience. There, people tend to work on whatever pet projects they want until they lose interest. “
  • Pace and decision making: “Because managers and leaders are accountable, I’ve also found them to be much more empowered to just do what they feel is the right thing to do. Whereas no big decision in the DOE Labs [or academia - LJD] can be made without reviews, panels, strategic offsites, more reviews, and presentations to headquarters–all of which could add months or years to a project–the direction can move on a dime because all it takes is one executive to sign off and accept full responsibility for the consequences of their decision. “

The “Accountability” (and really, “Relevance”) sections touch on something deeper - there’s clear overall goals in mind. So just doing individually worthy things isn’t enough, they have to coherently add up to something.

The best PI-led large groups understand this, but it’s something a lot of digital research support teams wrestle with. It touches on “Strategy” (#168, #169), but also a coherent programme of action with something like a logic model (#163) and the flywheel of success for our teams (#176). I’ll write more about this next week.

Everything Lockwood writes lately is deeply thoughtful and worth sharing - his thoughts in his recent ISC recap, for instance, are extremely clear and helpful.


Research Software Development

How to fork: Best practices and guide - Joaquim Rocha

Rocha describes a sanity-saving process for maintaining a downstream fork of an (active) upstream effort.

Most research software management handles the easy-mode case of collaboration, where the code is really only contributed to by one group, so we can get a little fast and loose with our git history.

This is absolutely not the way to go when we’re tracking an active upstream and want to get some of our changes (ideally, all of them) merged!

Rocha covers some basic hygiene in your own fork:

  • Use atomic commits
  • Identify your fixes and non-fixes
  • No evil merges (merges that introduce changes not properly from either parent)
  • Rebase early, rebase often

And then how to handle the rebasing:

  • Straighten git history
  • Minimize downstream changes
  • Squash downstream commits
  • Keep upstreamable commits you’d like merged at the beginning
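To make the rebase-early-rebase-often loop concrete, here’s a minimal runnable sketch. The repository layout, file names, and commit messages are all invented for illustration; it simulates an upstream project and a downstream fork in a temporary directory so the whole cycle can be run end-to-end.

```shell
set -e
tmp=$(mktemp -d)
G="git -c user.name=demo -c user.email=demo@example.com"

# "upstream": the active project we're forking
$G init -q -b main "$tmp/upstream"
( cd "$tmp/upstream" && echo v1 > app.c && $G add app.c \
  && $G commit -qm "upstream: initial" )

# our downstream fork, with small, atomic commits
$G clone -q "$tmp/upstream" "$tmp/fork"
( cd "$tmp/fork" && echo site-fix >> app.c && $G add app.c \
  && $G commit -qm "downstream: site-local fix" )

# meanwhile, upstream moves on...
( cd "$tmp/upstream" && echo v2 > lib.c && $G add lib.c \
  && $G commit -qm "upstream: new feature" )

# ...so we rebase early and often: history stays a straight line,
# with our upstreamable commits sitting cleanly on top, no merge commits
( cd "$tmp/fork" && $G fetch -q origin && $G rebase -q origin/main )

git -C "$tmp/fork" log --oneline
```

After the rebase, the fork’s log is upstream’s two commits followed by the downstream fix, which is exactly the shape that makes an eventual upstream pull request painless.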

Unveiling the Energy Vampires: A Methodology for Debugging Software Energy Consumption - Roque, Cruz, and Durieux, arXiv:2412.10063

In research computing, as elsewhere, people are starting to pay more attention to energy use and not just wall-clock time performance.

This is still a relatively new interest. We have hundreds of tools for identifying where time is being spent, but few for identifying where power is being consumed.

But not none! The authors tell us that on the CPU, Intel offers an interface, RAPL (Running Average Power Limit), which exposes the relevant counters, and perf, PowerTOP, and Powerstat can access that information. AMD has its own partially-compatible version of this interface with more fine-grained data. From there, (literal!) hot spots can be identified, and energy microbenchmarks can zoom in on particular code that might be a culprit.
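As a sketch of what a manual measurement looks like: on Linux, the same RAPL counters are also exposed through the powercap sysfs interface as monotonically increasing microjoule counters that wrap at a documented maximum. The zone path below is the standard one for CPU package 0, but whether it exists (and is readable, which often requires root) depends on your hardware; the arithmetic helper is our own, not from the paper.

```shell
# Joules consumed between two RAPL energy_uj readings, allowing for (at most
# one) counter wraparound. Pure arithmetic, so it runs on any machine.
rapl_delta_j() {  # usage: rapl_delta_j BEFORE_UJ AFTER_UJ MAX_RANGE_UJ
  awk -v a="$1" -v b="$2" -v m="$3" \
      'BEGIN { d = b - a; if (d < 0) d += m; printf "%.6f\n", d / 1e6 }'
}

# Wrap a command with a package-energy measurement (needs a RAPL-capable CPU
# and read access to the powercap sysfs files):
measure_energy() {
  zone=/sys/class/powercap/intel-rapl:0       # CPU package 0
  max=$(cat "$zone/max_energy_range_uj")
  before=$(cat "$zone/energy_uj")
  "$@"
  after=$(cat "$zone/energy_uj")
  echo "package energy: $(rapl_delta_j "$before" "$after" "$max") J"
}
```

Something like `measure_energy redis-server --version` would then report package joules for the wrapped command; comparing that number for the same workload across distributions is the kind of per-platform delta the paper chases down.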

In this paper (with code), the authors investigate Redis, which has notably different power consumption on different Linux distributions. They walk through their methodology and identify memcpy (and different memory allocators) in Alpine vs Ubuntu (musl vs GNU libc) as the culprits.

It’s the systematic methodology that is of broadest interest, and the code and notebooks they supply, which walk you through that methodology, are very useful.


GitHub copilot now has a free tier, presumably feeling some competition from products like Cursor.


Interesting paper describing automated OpenMP mutation testing for performance optimization


Research Data Management and Analysis

Genomic Data Cybersecurity and Privacy Frameworks Community Profile - Pulivarti et al, NIST

NIST has been doing great cybersecurity work lately, and this document is the second public draft of their Genomic Data Cybersecurity and Privacy Frameworks community profile - their cybersecurity and privacy frameworks applied specifically to genomic data.

The document is quite clear and straightforward reading (don’t be put off by its 174 pages, most of it is essentially appendices), and lays out not just objectives for this type of data but gives a sense for how the broader cybersecurity and privacy frameworks are applied to a specific case.

This is a good document to have a careful look through for any group that’s starting to handle more sensitive data.


Research Computing Systems

Slinky: Bridging Slurm And Kubernetes - SchedMD

Well, 2024 came and went, and as far as I can tell, CoreWeave did not in fact open-source SUNK, their (by all accounts extremely successful) Slurm-on-k8s implementation. (Though containers seem to be available, maybe?)

However, SchedMD hasn’t been standing still. They have implemented their own, and with a much better name - Slinky.

I haven’t used Slinky yet, so I can’t speak for the implementation, but it seems to me that something like this is almost certainly the direction things are going to be moving. Slurm (or similar) is a fantastic tool for what it does, but it’s super-optimized for a very narrow use case. More complex workloads - or even simpler-but-different workloads, like standing up a single long-running service that might require a varying amount of resources over time - are very difficult to shoehorn into Slurm’s mental model of a task being a single short-lived rectangular thing taking N nodes x M hours.
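That rectangle is baked right into the batch script format - a typical Slurm job (script contents hypothetical) declares exactly N nodes for M hours up front:

```shell
#!/bin/bash
#SBATCH --nodes=4            # N: fixed width for the whole job
#SBATCH --time=04:00:00      # M: fixed height; the job is killed at the limit
#SBATCH --job-name=sim

srun ./my_simulation         # hypothetical executable, run across all 4 nodes
```

A service whose resource needs grow and shrink over its lifetime has no natural expression in this model, which is the gap a general substrate like k8s underneath can fill.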

Whereas k8s’s super-general model is really hard to grasp for people who aren’t full-time Kubernetes admins, and so the simple case takes a lot of seemingly mystifying boilerplate.

Running something like a batch scheduler on top of a general purpose “cluster operating system” like k8s seems a natural progression, and tbh seems a lot better motivated than trying to build a next-generation “does everything conceivable” framework like Flux.


Random

93% of Paint Splatters are Valid Perl Programs, because of course they are.

Nice visualization of piecewise linear neural networks (e.g. with ReLU activations). I’ve used this to do handwavy explanations of the universal approximation theorem.

SQL-Studio, a nice Sql database explorer.

In-DB diffs of different large datasets - reladiff

Heck, why not use SQL for Advent of Code

Nice blog post from earlier in the year about BB(5) being proved to be 47,176,870 (i.e. the largest number of steps a terminating 5-state Turing machine can take is ~47M). What I find interesting is the growing use of Coq/Rocq or Lean for not just checking but communicating proofs.

When I was in grad school, we counted mathematical operations and tried to reduce the number of them to get high performance. Now it’s all about sophisticated memory access. Here’s an example of counting bytes as fast as possible.

And here’s optimizing binary search using SIMD.

And here’s 20M particles at 20fps on CPU in javascript.

Curses TUIs in bash w/ bashsimplecurses.

Scientific programming in Lean, a theorem prover.

Look at what they’ve taken from us, part 1 - one of the last users of floppy disks, the SF Muni, is replacing the system and it’ll use some fancy new-fangled storage system instead

part 2 - the IDEs of 30 years ago - Borland C++ or Turbo Pascal had the single best IDE I’ve ever used.

Find your new favourite UUID right here - everyUUID.com


That’s it…

And that’s it for another week. If any of the above was interesting or helpful, feel free to share it wherever you think it’d be useful! And let me know what you thought, or if you have anything you’d like to share about the newsletter or stewarding and leading our teams. Just email me, or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers. All original material shared in this newsletter is licensed under CC BY-NC 4.0. Others’ material referred to in the newsletter is copyright the respective owners.


Jobs Leading Research Computing Teams

Given the long break, there’s a tonne of amazing jobs in the new-listing highlights below in the email edition; the full listing of 339 jobs is, as ever, available on the job board.