#101 - 19 Nov 2021

Broad research computing teams; Giving and getting better feedback; Real-time alerts from Fast Radio Bursts; Gaining mroe 'technical wealth'; Simulation lab managers

Hi!

A reader from a U15 research institution [Canadian research-intensive universities; think Russell Group in the UK, or R1 universities in the US] writes in to describe expanding their research computing team with unconventional roles:

First, a grant advisor, specifically to assist PIs in writing sane tech inclusions in their grants. You may have reviewed grant proposals where the medical science, particle physics, quantum chemistry, etc. is very clear, and then the explanation of the computational aspects and the equipment justification sounds like Dilbert’s boss wrote it. That is precisely what this position is intended to improve, but also sitting on internal panels that judge Innovation Fund proposal maturities before they’re allowed to apply, etc. Second, a Communities of Interest Coordinator, who will foster and support research communities of like-minded graduate students, PDFs, etc. around research fields making use of computation—bioinformatics, AI, digital humanities—or around digital research tools—R, Julia, MATLAB, Gaussian, etc. By supporting communities of interest, these groups can become shared knowledge hubs, where newbies can find guidance or “the ropes” and experienced but stuck researchers can find inspiration or “an ear” that might enable them to unstick. Both positions have been filled internally and start in December. More traditional ARC job descriptions are being written up now as part of a further expansion.

I love this! It’s a long-standing tenet of this newsletter that research computing is much more than just technology. It’s teams, it’s communities, it’s product management - it’s people. Connecting researchers and their more directly to the computation, software, and data resources that can advance their work — whether that means in grant writing or capacity building within a practitioner community — is very much part of the our broad remit in research computing and data.

One of the things I wrestle with in this newsletter is how to make things easier for readers. With the discipline defined by the community of readers being so broad, the range (and volume) of material that gets covered here every week is… well, it’s a lot. On the other hand, I don’t know how to distill or partition things further without losing out on these very important dimensions of our profession. As always, if you have any ideas or suggestions, or want to share your own teams accomplishments and stories - or even just want to talk about research computing teams - hit “reply” or email me at jonathan@researchcomputingteams.org.

For now, the roundup!

Managing Teams

Don’t Soften Feedback - Lara Hogan

Reader, I’m not proud to say that I’m actually pretty rubbish at this. I tend to very much want to soften negative feedback, which is easier for me but is in the long term worse for the team member and the team as a whole.

What’s worse, people are not uniformly affected by this. Women, Black, Asian, and Hispanic team members tend to get softer and less-actionable feedback, especially but not only from male managers, which holds back their growth - how can they grow effectively if they aren’t being told what to work on?

Hogan here tells us things we’ve talked about before, but we - at least I - need periodic reminders of. Make the feedback easier and more constructive to give by linking it to desirable outcomes for the team, make the feedback succinct and to the point, and distinguish facts from assumptions. There are also cautions here about peer feedback and potential bias.


Managing Your Own Career

The Best Leaders are Feedback Magnets — Here’s How to Become One - Shivani Berry

Relatedly, if we want to grow, we need good, actionable, feedback. In our industry, a lot of our directors are pretty hands off, which certainly has advantages but means we don’t get the guidance we’d benefit from. Berry has two broad categories of recommendations for how to get more feedback and accelerate your growth:

  • Learn how to accept feedback well - manage your knee-jerk reaction, think of it as an opportunity to grow, ask meaningful questions to learn more, and reflect. This will make it more comfortable for people to give you the feedback you’re looking for; and
  • Help people give you better feedback by asking specific questions (not “how am I doing”), asking for the more neutral “advice” rather than “feedback” (people love giving advice), and demonstrate that you act on the feedback

Product Management and Working with Research Communities

Building confidence in a decision - Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Michael Lindon, and Colin McFarland, Netflix Technology Blog

I honestly believe that having a science background can be a huge advantage for leaders and managers, if we engage that part of our training in making management decisions. Data collection, experimentation, understanding that we don’t know everything, accepting that an approach has been disproven - these are all pretty fundamental skills, but it’s sometimes easy to compartmentalize them, to be things we only use when studying something as part of our work but not for studying how we work.

This Netflix tech blog describes making product decisions at Netflix, using a data- and experimentation-based approach that should be extremely familiar to those of us in the sciences. For that community, we have had these skills drilled into us for years, practiced and honed them - our problem is not that we are too much still scientists but often too little, and don’t take the same rigorous professional approach to managing teams and productsthat we did in our academic career.


Cool Research Computing Projects

Real-time alert system heralds new era in fast radio burst research - McGill Science Blog

Some of the most exciting research computing and data projects today really demonstrate the breadth of RCD. They tie together software, systems, and data, in a way that doesn’t make any sense to break up into silos.

CHIME, a really cool radio telescope, constantly scans the sky (well, the sky constantly scans over it) and it maps 21cm emission from hydrogen gas from the early Universe to better understand cosmological structure.

But of course if you’re looking at large swaths of the sky in radio, you’re going to see a lot of other things too. One of them is fast radio bursts, transient (very transient - a few milliseconds) powerful radio bursts, mostly of seemingly extragalactic but unknown origin, first detected in 2006. Some pulse, most apparently don’t. 500+ have been detected, but they’re still a mystery.

CHIME has “seen” a number of these events, but to actually identify what cause them, you’d need more than just the single radio signal - you’d want to point a number of other kinds of telescopes at the event as fast as possible to see if you can see other signatures of a big transient event happening there. They’re thought to possibly be merging black holes or neutron stars, supernovae, or even more exotic events (Dark-matter induced collapse of pulsars! Decays of axion clusters! Absent data, the mind will wander to all sorts of weird and wonderful possibilities).

The problem? CHIMEs data pipeline, designed for cosmology, wasn’t really targetting looking for these things. It could detect them, sure enough - months after the fact, way too slow to arrange follow-up observations.

This press release from McGill describes the very cool work of a team, including Andrew Zwaniga and Emily Petroff, in building the CHIME/FRB VOEvent Service, which issues alerts in a standard format (VOEvent) within a minute of the event to subscribers who can arrange real-time followup observation. Since CHIME is now expected to see ~1000 of these a year, there’s an excellent chance that an event can be soon caught “in the act”, helping understand what is causing this phenomenon.

Needless to say, to go from months to a minute requires aligning the system, data processing, and alerting software carefully. It’s a super cool project, and hats off to the team.

The VOEvent service pipeline, image from Andrew Zwaniga’s master’s thesis


Research Software Development

The one code review method to rule them all - Jonathan Hall

Hall describes the pros and cons of pair programming vs pull requests for code review - either serve the basic needs of knowledge transfer and another pair of eyes, but the benefits and costs are different; PRs allow scaling to more pairs of eyes, leave documentation automatically, and don’t require synchronization, while pair programming is real-time, fast, and provides more natural opportunities for mentoring.

Which is better? The frustrating thing for technical experts for us is that there’s no objectively best answer, it depends entirely on the needs of the people system that is your team and organization. Either is great! Teams and orgs succeed using either of them - and it doesn’t even have to be 100% one or the other. The important thing is choosing which meets your teams needs, setting clear expectations, and revisiting periodically.


Reframing tech debt - Leemay Nassery, Increment
A Rubric for Evaluating Team Members’ Contributions to a Maintainable Code Base - Chelsea Troy

Once a software product is high enough on the technical readiness ladder - once it’s actually being used by communities - technical debt becomes an issue. The problem isn’t awareness - we all know code should be maintainable and well documented, etc. - the issue is the people systems to support individual developers in deciding to put time into activities that support that.

Nassery describes a rebranding that might help a lot in some circumstances research software teams find themselves in - rather than talking about reducing or eliminating tech debt, talking about building tech wealth, and what that new asset allows - faster development, fewer bugs, etc.

This sounds like a goofy rhetorical game, but I think there’s some value in this. Particularly in our line of work, a lot of people you’d be pitching the wealth-building to were involved in the tech debt in the first place either technically or managerially. By not talking about “debt”, which sounds like fixing bad stuff that happened in the past, it makes it easier to support efforts.

If Nassery’s article is more about getting buy in, Troy’s article is more about implementing activities to get it done. The argument is simple and hard to disagree with - if you want the code to be increasingly maintainable and documented, you need to build those code stewardship priorities into the incentive structure. That means reviewing code based on criteria which advance the maintainable code - flexibility, documentation, discoverability and transfer of knowledge and context, but also in evaluating and giving feedback to the developers on those criteria.


A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond - Ivan Yashchuk, Ralf Gommers, Quantsight Labs

The Quantsight Labs team has been doing a lot of great work on data analysis software, particularly but not exclusively in the python data ecosystem.

This article is a nice example of a architecture design document for a substantial proposed change to an ecosystem - enough concrete detail to get feedback from stakeholders/sponsors (such as, in this case, AMD) and a sense of feasibility from developers, as well as being a good quick overview of the current state of array handling in the python data ecosystem. The proposal is to augment SciPy with a backend and dispatch system, so that it isn’t relying exclusively on numpy for its work.


Research Data Management and Analysis

Best Public Datasets for Public Health Data Science Projects - Andrea Hobby

There’s obviously a lot more interest and motivation for doing work with public health data these days. Whether it’s for doing data work or for training courses, Hobby provides links to some links of health data resources that are available.


HPC and the Lab Manager - Carlo Graziani

We normally talk about data management in the context of experimental or observational data, but the generation of large simulation data sets also require diligent management of the “data” (simulation outputs) and the metadata about the simulations. You see some of this happening explicitly in large climate simulation data set collaborations, for instance, but it’s less often called out in other areas.

Graziani writes about his experience (at a centre I worked at a couple years before he arrived - go team FLASH) being part of large simulation campaigns, and the need for some kind of Operation Manager type role, the equivalent of a data czar or lab manager in observational or experimental labs.


AWS Data Wrangler - AWS

It’s been a long time, but the importance of effective glue code connecting data between services seems like it’s finally becoming more apparent to a wider community. This week I learned that AWS has an open source tool Data Wrangler which seems like it might be useful more broadly than just within AWS, and that connects pandas data frames with any of a number of data sources. I’m a bit skeptical about the product management of the Apache Arrow project, but it is that effort that powers tools like this.

From the github page:

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).


Downloading Satellite Images Made “Easy” - Aaron Geller, Northwestern Univ. Research Computing Services blog

An RSE from Northwestern, Geller walks us through the process of downloading satellite data from Earth Engine using the python API.

Interesting if you’re getting started in GIS, but also a reminder that in research computing we’re constantly teaching ourselves to do things that there’s not great documentation for, and simply putting that up on your team’s blog once you’ve done it is a pretty simple and effective way to contribute to the broader community (and to make sure you remember how to do it the next time!)


Research Computing Systems

Using the Slurm REST API to integrate with distributed architectures on AWS - Josiah Bjorgaard, AWS HPC Blog

AWS’s parallel cluster 3 (like other providers) now has a REST API, more readily allowing you to control clusters programmatically, which is cool - but Slurm has a REST API too, allowing you to create, run, and control jobs within the cluster. Slurm’s REST API is a little nascent and not meant for external use, but with some extra components this lets you spin up clusters, fire off jobs, and monitor the jobs, all with REST APIs.

I’ve got this under Systems because I think a lot of teams could make interesting tooling based on Slurm’s rest APIs, or similar capabilities, but I haven’t seen very much other than some monitoring capabilties. On the other hand, I’ve been out of this space for some years. What teams out there have built cool job submission, or job notification tooling on top of functionality like Slurm’s API?


Emerging Technologies and Practices

Security scanners for Python and Docker: from code to dependencies - Itamar Turner-Trauring

With yet more malicious code being found in packages - npm last issue, pypi this one - making sure your software is secure from the container image down to all the dependencies becomes vital. Here Turner-Trauring walks us through a number of scanners - Bandit, Safety, Jake and Trivy - for scanning our code, our dependencies, and our docker base images. It’s sad that it’s come to this, but here we are.


KubeCon + CloudNative North America 2021 videos - CNCF

All 229(!) videos from this event earlier this year are up - many of them will have some relevance to this community, whether it’s on general topics like monitoring, observability, and security, or a few (like this one) on specifically HPC topics.


Calls for Participation

UK National GPU Hackathon - 28 Feb, 7-9 Mar; Application Deadline 10 Jan. Free to accepted participants; online

Looks like a fun event - mentored hackathon with NVIDIA and OpenACC participating.


Events: Conferences, Training

CNCF Research End User Group: HPC/HTC End User Landscape - 1 Dec, Free

The cloud native computing foundation Researcher end-user group has an overview of HPC and HTC use of cloud native technologies like Kubernetes.


US-RSE Annual General Meeting 2021 - 3 Dec, 1-3PM ET, Free for US-RSE members

For all US-RSE members, the organization’s AGM featuring reports on the activities of the society.


Random

A one-liner in Python that creates an infinitely nested dictionary, and what that tells us about how Python handles assignment.

Using systemd to set up automatic backup to an external disk when it’s plugged in.

When a breakpoint is set to a function, debuggers stop after the prologue to the function, not at the start of the function itself. Here’s why.

Bluetooth for distance estimation.

Debugging story - stack corruption in a Windows game.

Multiple kinds of inheritance in perl, if you in turn have inherited some perl.

OAuth2 on a static website, using workers/lambdas.

CLI autocomplete (and, eventually, more) for iTerm, Terminal.app, Hyper, VSCode with Fig.

In tech we all love a good story about someone else’s catastrophe. This one won’t disappoint. Cascade of doom: JIT, and how a Postgres update led to a 70% failure on a critical national service.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers, team leads, or those interested in taking on leadership the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.