#185 - 20 Oct 2024

Research Judgement Must Direct Research Resources. Plus: Good software engineer team members; Free tier as a business model; STRUDEL and UX; Python 3.13; Writing code for humans; Next-level HPC support; Rackspace and OpenStack

Hi, all! Sorry for the pause in newsletter issues; I’m trying to figure out how to better work the newsletter writing into my weeks. You may see shorter round-up writeups for a while, which I think is going to be better all around; thanks for your patience, and all feedback is always welcome!


In #182, Scientific Judgement Is Our Job, I argued that applying scientific judgement to the work we do is just part of the job:

What I keep hearing is that the team’s job is to support all research and researchers equally. That (a) is very much not what the team’s job is; (b) it wouldn’t be possible if it was […] It is not our teams’ jobs to just mindlessly churn out generic widgets for (or provide widget access to) anyone who asks. We are hired for our judgement. In fact, we’re hired explicitly for our (rare! precious!) combinations of scientific and technical judgement. We apply that judgement in making decisions about who to support, how much effort to put into each effort, and what to go after next.

I got some good push-back on how I stated this, so I want to add some nuance to this and expand it a bit.

It’s worth spending more time on because it’s pretty fundamental to the main tenets of this newsletter: that our job is to advance research and scholarship as far as we can, given the constraints we face; and that this is an important, challenging, and rewarding job, worth doing with professionalism.

Research resources — funding, equipment time, lab space, the professional time of technical experts like those of our teams who support research work — are far too scarce and precious to just assign them “first come, first served”.

There’s a reason why grant funding requires so much justification and is so competitive, why telescopes or other major scientific facilities have allocation committees, why time on major supercomputers is competitively assigned. There’s too much work to be done, and worryingly few resources to do it all.

So as a community, research has as a foundational principle that research judgement must direct those scarce research resources to where they can do the most good.

Like anything involving human judgement and predictions of future outcomes, it’s a messy, flawed process. Disappointingly, there’s been a lot of recent work showing that these processes are still far too inequitable in a number of predictable ways, both in who the funding goes to and in whose problems are being researched.

But “just give the resources to whoever asks for them” isn’t an alternative, particularly since the loudest voices and most assertive requests tend to be disproportionately from groups that are doing pretty darn well under the current allocation of resources.

So judgement has to play a role. Does it have to be ours, though? We just want to support research; making these kinds of judgement calls seems like (is!) an enormous responsibility to take on.

Well, it has to be someone’s. There are a few ways it can happen.

One is that funding goes straight to the researcher, who then decides how to spend that funding — maybe with our team, maybe on a postdoc salary or a piece of equipment for their lab. I’m a BIG fan of this model. Here, there are a couple of layers of research judgement being applied: some funding agency ranked the proposed project as one of the (too few) to receive funding, and then the researcher themselves decided that, in the execution of the project, the most cost-effective way forward was to use the services we offered, which were worth paying for (#178). In this case, we don’t really need to decide “is this effort worth doing”; that question has been asked and answered a couple of times. Here we just need to make use of our scientific and technical expertise to execute the project well and effectively.

This approach has much to commend it. For one, it means that teams that are focussed on the needs of their research community can thrive, and it can enable the existence of really specialized facilities. This is one of the reasons that this approach is the default in biomedical cores.

There are some real downsides that people point out in objection. One is that in many jurisdictions, the rules by which teams are allowed to operate under this model within a university are extremely onerous and restrictive.

Another is that not all resources (though more than you might think!) can be funded through a number of individual researchers’ purchasing decisions; some need some sort of collective action.

A third is that it depends on a fairly broken and biased research funding model. That’s a real problem, but one best addressed at the root (get on funding committees; make sure research papers about problems with the current funding system get widely read and understood) rather than by everyone trying to do ad-hoc workarounds individually.

On the other hand, an objection which isn’t at all compelling is, “but researchers would never pay their own money for these services”. If it’s true that researchers in the community would never see the services as valuable enough to spend their own research funding on them, what exactly is the justification for allocating some other pool of all-too-scarce research funding to providing them? And why would anyone want to spend a good part of their career offering services that researchers wouldn’t bother to pay for if they had to?

So that’s one proven and successful model that works in a number of contexts.

A second model is that funding goes to some centre or group which has resources, and there’s a competitive allocation process for allocating those resources. Here, the allocation committee is playing the role of a funding agency, “funding” the research project with access to expertise or equipment.

(I’ve had a few people in a few different contexts take issue with this comparison between the allocation committee and a funding agency, which baffles me. To my mind this isn’t controversial at all. It’s allocating funding, but the funding is in the form of basically gift cards you can only use at this facility - pre-paid credits for compute or software services or data science work or bioinformatics services or the like.)

Again, the research judgement directing research resources question is asked and answered by the allocation committee; we can accept that decision and simply apply our judgement to doing the work.

This is a perfectly fine and time-tested approach. It works best with large, one-of-a-kind resources, like the large telescopes or other major scientific equipment facilities, which is where this method began.

But I find it’s applied more widely than is really optimal.

  • There’s the fundamental issue that these committees are separate, and in particular usually disconnected from grant funding, so you can have researchers in the nonsensical position of one committee having given them funding to hire a postdoc to do a project, but another not having given them the compute time/software support/data support necessary to do it, or vice versa.
  • Relatedly, the most interesting research projects these days often require more than one type of technical research resource - data support and software development, or software development and computing time - and these committees are almost always siloed resource-by-resource, meaning they might have the (say) compute time but not the software support needed to do the work effectively.
  • The resources being allocated are often local quasi-monopolies — the only “seller” of such resources available in the institution or community — even though analogous resources with different capabilities are available elsewhere. (And remember, those other facilities are colleagues, not competitors - #142.) Even though one of those other resources might be a better match for the research in question, these “allocation gift cards” are all that is offered, and taking them elsewhere isn’t an option. Having these allocations happen at the national or regional level, aggregated across multiple resources, helps a great deal.

Finally, there’s the model where there’s no such superstructure deciding which and how much research resources to apply where, and so it comes down to us.

The default situation in this case is just to avoid making such decisions, and passively be order-takers, sitting in our offices, waiting for requests, and doing whatever anyone asks us to do. To be clear, this model can work when (say) a team is just starting out and isn’t yet sure where it can have real impact.

But as demand and experience grow, this turns into my least favourite model, where no such research judgement about what to do is being applied in any consistent way - there’s no discussion of where and how to best apply effort, which kinds of activities matter, or what the impact of anything is. So everyone on the team just sort of plays it by ear, which typically means doing what they’ve always done instead of focussing on the highest bang-for-buck work (#177), and continuing to provide services indefinitely for the same people, or for those who ask most loudly.

I see this fairly often when the headline resource is access to equipment like computing time, and on top of that there’s also a lot of people time available for support. Here the hardware is almost always allocated through a formal committee; on the other hand, how much of people’s time, expertise, and support should be offered, and to whom, is an afterthought. (The centres that do this almost always have text on some webpage or another saying “people are our most important resource”, buried in the middle of 50 pages of talking about the computers.)

However it comes to be, this ad-hoc order-taking model isn’t effective. It’s not a responsible stewarding of the research resources we’ve been given. It’s not taking seriously the charge our community has given to us, to advance research and scholarship within our community as best we can given the constraints we face. It’s abdicating our responsibility to our staff, who have the right to see their efforts consistently have the highest impact possible.

I also find it a bit baffling. The inevitable justification for having such services delivered locally is that “the local team understands the institution and knows the research group”. What’s the point of a team having such rich local context and knowledge if it’s not to inform decisions?

So as a community I’d like us to move away from this model (lack of a model?) and towards something more systematic and active. When we do find ourselves in this situation, I’d like to see more of our teams move away from a passive, “take a number and we’ll be right with you” approach towards taking an active role in pursuing high-bang-for-buck projects and reducing our time spent on lower-impact work. Where we are starting to have intuition of how we can have impact, we can start sketching crude logic models (#163), thinking in terms of positioning, prioritization, planning, and stakeholder engagement (#168), and generating success stories (#179). That’s how we get the research support flywheel going (#176).

And with that, on to the roundup!


Managing Teams

On the other side of the internet, over at Manager, Ph.D., in the current issue I talked about wrestling with ambiguity, when our early scientific training really only taught us to deal with uncertainty.

Round-up articles were on:

  • Feedback is for behaviour change, not self-expression
  • Feedback on subjective topics (hint: almost all feedback is subjective)
  • Trust is necessary for us to know what’s going on
  • Change is risky, and makes things worse before improving them
  • If people aren’t reading our emails, that’s on us.

Technical Leadership

On Good Software Engineers - Candost Dagdeviren

Dagdeviren gives a quick definition of what he looks for and tries to develop in technical staff

A good engineer is one whom I, as a manager or peer, can trust to progress a project, knowing that they will deliver a solution by working with the team and producing good quality, again and again.

and gives some examples of what that looks like (in much more detail than I’ve summarized below)

  • know how to influence others and the organization to deliver a solution as a team
  • understand the processes they are operating in while taking a project from idea to solution
  • spend extra time learning the environment they’re in so they can independently drive the work
  • take a proactive approach and embed quality elements into their deliverables to improve consistency and velocity
  • understand the stakeholder’s need and fine-tune their approach accordingly
  • constantly work on reducing complexity
  • are reliable in consistently changing organizations
  • are team players regardless of personality type

The argument is that this can be applied at all levels of seniority.

I really like this, and I think these sorts of skills are what we should be aiming to develop in our staff. The scope for each expands as you go up in seniority, but the basic ideas - delivering as a team, and creating the environment where the team can deliver - are what we’re aiming for.

In our line of work, we and our staff have grown up in environments where the focus tends to be on individual smarts, individual expertise, and individual performance. But in our teams, we’re delivering together (and often with other teams), and someone toiling brilliantly alone at their desk is only doing part of the job.


Reaffirming our commitment to free - Nitin Rao, Liam Reese, James Allworth, Cloudflare blog

This article, like Tailscale’s article on “How our free plan stays free”, very clearly outlines where the free tiers fit into the business model of their respective companies, and reiterates their ongoing support for those free tiers. (Disclaimer - I happily use free tiers of both Cloudflare for this newsletter and Tailscale just for messing around with my various computers, but don’t have any other relationship of any kind with either of them).

I wrote in #170 about risk management and gratis offerings. In this post-ZIRP environment (the zero-interest-rate era ran roughly 2008-2023), tech companies are massively cutting back on spending, cutting back hiring (and having massive layoffs), and focussing on profitability rather than growth and goodwill. That means free-tier offerings are being dropped left and right, and we should be very careful about taking free tiers of anything and having them be key parts of our delivery mechanism for services.

That doesn’t mean we should steer well clear of all such offerings - how could we? It just means being careful and using basic risk management approaches.

Part of that caution extends to understanding, roughly, the business model of the company offering the free thing, and how it plays into their operations. In both of these cases, the authors explain why free-tier users add very little marginal cost for them, while providing real value not just in terms of goodwill but as key parts of their go-to-market and customer-feedback-and-improvement activities.

Being able to think a little bit like a business person and understand cost-benefit tradeoffs is useful not just internally for our own operations (after all, we’re professional services firms of a sort, #127) but also externally in dealing with vendors.


2M users but no money in the bank. Tough times - Jeremy Walker

Maybe relatedly, and certainly soberingly, the partial wind-down in September of Exercism (a programming tutorial community) reminds us rather forcefully how hard it is to be in our kind of operation, a professional technical services delivery operation run on a nonprofit business model (#164). Providing a general good where everyone benefits somewhat but no individual or community benefits enough to be willing to pay for it is just a savage place to find yourself when money starts getting tight:

I’ve lost faith in the nonprofit business model working in a way that allows Exercism to reach any of its potential. Keeping something free for everyone relies on either the user being the product, or on significant donations, and without either, it’s very hard to grow.


Product Management and Working with Research Communities

The Journey to STRUDEL: How We Came to Embrace User Experience in Scientific Ecosystems - Lavanya Ramakrishnan

Ramakrishnan gives a good talk about an effort towards understanding what UX means in our ecosystem, and the STRUDEL project provides some basic frameworks others can use for thinking about and improving user experience.

This is important and worth thinking about. Researchers’ time matters, too (#102). If we want to power a research flywheel (#176), we want as little friction as possible in scientific workflows.

And since interesting research projects increasingly cross data/software/systems practice boundaries, we don’t want to break up into digital research practice silos, even if we specialize (#114); so user research and user experience have to be considered across the entire user journey, from data and software to systems and job submission or deployment.


Research Software Development

Everything you need to know about Python 3.13 – JIT and GIL went up the hill - Drew Silcock

Good overview of the optional free-threaded build and the first pass at a JIT that are part of October’s CPython 3.13 release.
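
If you’re curious which of these you’re actually getting, here’s a quick sketch of how one might check at runtime whether the interpreter is a free-threaded build and whether the GIL is actually enabled. The Py_GIL_DISABLED config var and sys._is_gil_enabled() are 3.13 additions; the hasattr guard just keeps the snippet harmless on older interpreters.

```python
import sys
import sysconfig

# Was this interpreter compiled as the free-threaded ("no-GIL") build?
# Py_GIL_DISABLED is set for the free-threaded builds; None/0 otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Even on a free-threaded build the GIL can be re-enabled at runtime
# (e.g. by an incompatible extension module), so check the runtime state too.
gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True

print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"free-threaded build={free_threaded_build}, GIL enabled={gil_enabled}")
```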


It’s hard to write code for computers, but it’s even harder to write code for humans - Erik Bernhardsson

I was asked in a job interview a zillion years ago what I thought the most important advance in programming was over the past decade or so. I answered, without hesitation, the fact that the software engineering community was taking the idea of cognitive load seriously - that writing code for humans was at least as important as writing it for compilers.

“Writing code for humans” is one of those ideas that is even more important in scientific software development than in (say) tech. Our codes encode a lot of subtlety, even more than complexity, and they can live and be used for decades.

Bernhardsson gives some advice about writing code that’s going to be used and modified by humans, and become part of a workflow used by teams of humans:

  • Examples, not “core concepts”
  • Good error messages (see the small sketch after this list)
  • Avoid conceptual overload
  • Use familiar names for things
  • Make it programmable (“People will do crazy things with your codebase” - truer for scientific code than anywhere else, I think. Programmability is, I think, even more important in this world where our users are using GitHub Copilot or other LLM tools for coding - #173)
  • Be careful about “magic”
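
To make the “good error messages” point concrete, here’s a minimal, hypothetical sketch (the load_config function and the “mytool” CLI are invented for illustration, not taken from Bernhardsson’s post): an error that names the bad input and tells the human what to do next is worth far more than a bare stack trace.

```python
def load_config(path):
    """Read a config file, failing with a message that says what to do next."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # Name the offending value and suggest the next action,
        # rather than letting the raw FileNotFoundError propagate.
        raise FileNotFoundError(
            f"Config file not found: {path!r}. "
            "Pass --config pointing at an existing file, or run "
            "'mytool init' to generate a starter config."
        ) from None
```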

Research Computing Systems

INESC: European supercomputer users can now get specialized “next level” support and training - Science Business; and European supercomputer users can now get specialized “next level” support and training - CSC Blog

Between this effort and efforts by XSEDE and (to a lesser extent) ACCESS, I’m pleased that high-level, more professional-services-like support offerings are starting to become a model more familiar to the research community. I do worry about there being a “missing middle” between helpdesk-type support and significant support engagements, but still, having the bookends in place is terrific.


Rackspace Goes All In - Again - On OpenStack - Jeffrey Burt, Next Platform

This is good news. The community needs something like OpenStack (and yes, in that case it also needs an OpenStack Enterprise as a revenue model), so it’s great to hear that Rackspace is recommitting to its development.


Random

DRAMHiT is a very cool high-performance hash table from the Mars research group at the University of Utah, which uses distributed-systems approaches within a single modern multi-core, multi-NUMA-domain node, because inter-core communication is (according to the authors) the dominant cost for hash-table workloads. Both the technical details (the paper is linked) and the approach taken are very interesting.

Google internally recommends memory-safe languages for new system software, but does not recommend wholesale rewriting of old code into such languages because new code, in any language, has vastly more bugs.

A great history of block storage at AWS.

Has anyone used automated tools like creduce to produce compiler bug reproducers?

I would love for Python to adopt a reactive notebook like marimo the way Julia has with Pluto. But then again, I’d kind of prefer that something like Posit studio, which has great off-ramps for getting code into files and version control, get super popular. Well, I guess we’ll see what happens.

In praise of tidying up code as you go along, in small bits, as long as it works out.

Very cool-looking course on quantum machine learning, out of École de Technologie Supérieure.

A nice walkthrough of the tar history and format, by way of an argument about what’s wrong with OCI container formats currently.

As long-time readers will know, this newsletter is principally an embedded-database fanzine; thus, it pleases me to no end that you can now run entire Rails 8 apps in production with SQLite, powered somewhat by the community’s realization that SQLite is fine for such workloads if you choose its defaults accordingly.
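
For a sense of what “choosing defaults accordingly” tends to mean in practice, here’s a minimal sketch in Python (the specific pragma values are illustrative choices, not the Rails community’s exact settings): turn on write-ahead logging so readers don’t block a writer, and set a busy timeout so concurrent writers wait for the lock instead of failing immediately.

```python
import sqlite3

conn = sqlite3.connect("app.db")

# Write-ahead logging lets readers proceed concurrently with a writer,
# which is what web-style workloads mostly need.
conn.execute("PRAGMA journal_mode=WAL")

# Wait up to 5 seconds for the write lock instead of erroring out
# when another connection is mid-write.
conn.execute("PRAGMA busy_timeout=5000")

# A common durability/throughput tradeoff when WAL is enabled.
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT, at TEXT)")
conn.execute("INSERT INTO hits VALUES (?, datetime('now'))", ("/",))
conn.commit()
```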

Are weblinks finally coming back? Can’t wait to look up how the newsletter website is doing on technorati!

Something I’d like to write about once I’ve had more time to think it through is summed up in this very brief article (and really, just the headline): AWS executive says regulated industries moving fastest on AI. One of the weird things about ML-type adoption (like cloud, really) is that it’s hitting fastest in areas we think of as being pretty staid, technology-wise - finance, health, qualitative research, even pure mathematics. I think that’s caught our teams kind of off guard.

You get quadruple precision! You get quadruple precision! Everyone gets quadruple precision!!


That’s it…

And that’s it for another issue. If any of the above was interesting or helpful, feel free to share it wherever you think it’d be useful! And let me know what you thought, or if you have anything you’d like to share about the newsletter or stewarding and leading our teams. Just email me, or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming weeks with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers. All original material shared in this newsletter is licensed under CC BY-NC 4.0. Others’ material referred to in the newsletter are copyright the respective owners.


Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 336 jobs is, as ever, available on the job board.