#148 - 3 Dec 2022

Process vs Expectations; Coaching questions practice; UCL ARC always hiring; Invest in uncertainty reduction; Manager career leveling; Product lessons from Amazon Omics; Coping with too many code projects

I was part of a discussion recently about a disengaged team member, and the topic of process vs expectations came up.

I write and talk a lot about expectations and processes. I’m a big believer in both! I think they should both be documented, discussed, and violation of either should result in some discussion.

But they’re not the same, and they’re not substitutes.

I think we’ve all worked in (or heard about) places where there was some restrictive process in place, because “one time, this guy Frank - you weren’t here, you wouldn’t know him - well, he…” did something. This kind of process reaction to an expectation violation is stifling, and it gives process a bad name, because everyone knows it’s a cop-out. The unsuitability of this approach came up in the discussion, and everyone knew the failure mode being described, because it’s common enough to be a trope.

If someone violates some kind of expectation, the solution is almost always to discuss it with that person, and to work with them to ensure that one way or another the expectation is met in the future. It isn’t to impose a new process on everyone. The process approach is almost always an avoidance maneuver, an attempt at a technical solution to a people problem. Talking with people about missed expectations is hard, takes time, and doesn’t scale. And it’s the right solution.

There are times when some bad outcome genuinely does highlight some systemic issue, where some kind of change is needed, and not just a discussion. Changes may need to be made to how work is done, or in the supports and resources available. Certainly if a particular expectation keeps being violated by different people, there’s some deeper issue that needs investigation. Some new or changed process might end up being needed here.

But processes describe how something’s done; expectations describe the outcomes activities should have. Process describes the path, expectations are the boundary conditions. Nailing down the path that must be taken, when what really matters is that the boundary conditions are met, is overly prescriptive, takes autonomy away from team members, removes room for creative solutions, and saps motivation.

It’s good and kind and respectful to have and describe high expectations for how your team members operate, and for teams to have high and explicit expectations of how they work together. We all have expectations, and it’s not kind or respectful to let people constantly fail to meet those expectations without letting them know. If you were failing to meet your boss’s expectations, or your team’s work was failing to meet a researcher’s expectations, you’d be mortified to hear that they felt that way and couldn’t be bothered to tell you.

Shared, stated, explicit expectations are how you and team members can be confident they’re doing a good job, and how a culture of trust and collaboration can form within the team. They are how team members know that they can rely on each other. They are the measuring sticks by which we all can measure performance, and growth. We’re all motivated people in this line of work; we want to set high bars for ourselves both technically and as team members, and to clear the bar, and then raise it another notch higher. And on occasions we miss, we want to know it so we can do better next time.

Processes, at their best, are time- and cognitive energy-saving devices. They are a shared way of doing things so that people don’t have to invent their own way to perform common work. They are a known-good set of steps so that routine stuff can get done, and others know what to expect, so that everyone’s mental efforts can be spent focussed on where creativity is valuable.

They can and should work together. There may be a process in place for (say) rotating the running of standup meetings, so that when it’s someone’s turn they know what to do and everyone knows what to expect at stand-up. On top of that there should be expectations around treating each other respectfully during the meeting. There may be processes for submitting code for review before pushing to main, and expectations about acceptable turn-around times and code (and review) quality. There may be SOPs for bringing a new database server online, and expectations around how the appropriate users are notified.

One way or another, processes shape how we interact with tasks, while expectations shape how we interact with people.

In the case of a disengaged team member who wasn’t getting work done in a timely manner, the solution almost certainly isn’t to create process guardrails. There’s clearly an expectation, and the team member isn’t meeting it. If nothing’s being done about the violation of the expectation, would anything really be done about violations of the process? Do they know they’re not meeting the expectation? Do other team members share the expectation? Is the violation of the expectation an issue of willingness, confidence, knowledge, or skill? Is it a temporary thing, or has it been ongoing?

The solution to the disengaged team member isn’t an easy or simple one, and it doesn’t involve sketching out a flowchart. It requires developing a strong professional line of communication with the team member, which means one-on-ones if they’re not in place; after that’s established (which will take weeks), it will require feedback and coaching.

The most likely outcome is that over some time the team member improves their performance, because people genuinely do want to meet their team’s expectations.

But the feedback may escalate without improvement, and the team member may end up leaving. That uncertainty, and (justified!) fear of the bad outcome, leads us too often to shy away from this approach.

But it’s not kind to let a team member struggle with work they don’t want to do, and it’s not kind to watch other team members have to do extra work and grow resentful as nothing is done. It’s certainly not kind to impose some kind of process on everyone because someone isn’t meeting expectations.

We don’t have an easy job, but it’s important. Like us, our team members deserve to know when they’re doing well, and they deserve to know when they’re not. They deserve team members they can rely on, and they deserve to hold each other accountable. That’s how we grow professionally, and it’s how our teams grow in impact.

And with that, on to the roundup!

Managing Teams

Six Creative Ways To Use Coaching Questions - Lara Hogan

We’re all experts of one form or another, and it’s sometimes hard not to just blurt out an answer, a recommendation, or advice when our team members are having a problem. There’s nothing wrong with doing that when it’s warranted, but in the long term it’s better for both you and the team member if most of the time they’re coming up with their own answers.

Hogan points us to her (PDF) list of open questions, and some suggestions for getting into the habit of using them:

  • Delivering one at the tail end of giving effective feedback (question, [situation], behaviour, impact, question)
  • Redirect the instinct to give advice into a brainstorming session focussed on one of the questions
  • Kick off your part of a one-on-one with one

The list of open questions isn’t crucial here - the key is to practice staying in coaching mode by asking open questions. If you do, the team member may come to a good answer themselves. Or, you may find areas where the team member does need some support to be able to tackle the problems themselves; areas where there are gaps in willingness, confidence, knowledge, or skills. Addressing those gaps will be more productive and a longer-term solution than providing answers as required.


University College London’s Advanced Research Computing team is moving to a model where positions (Data Scientists, Data Stewards, Research Infrastructure Developers, and Research Software Engineers) are permanently open. If at some point they need something specific, they’ll note that in the advert for the appropriate role:

In order to maintain balance within the team, our recruitments may at times require a particular specialism, and this will be noted within the advert.

Do we have anyone from UCL ARC reading, or from any other teams who have tried something similar? I’d love to hear about the mechanics of how this worked with HR and leadership. Hit reply or email me: jonathan@researchcomputingteams.org


Technical Leadership

Removing uncertainty: the tip of the iceberg - James Stanier

On and off I’ve discussed the importance of managers in reducing uncertainty about processes, people, and products (e.g., #34), but the same is true in our role as technical leaders.

The work we’re doing is almost always uncertain - we’re working in research! By definition, we’re pushing forward human knowledge somewhere it hasn’t gone before, or supporting work doing the same. There may or may not be dragons there, but we know we’re unsure what we’ll find.

Stanier writes in the context of product development, but the principle is very much applicable to our work: make sure that you’re investing up front in reducing uncertainty as well as making forward progress:

You reduce uncertainty by doing: prototyping, designing, writing code, and shipping. Each of these actions serve to reduce the uncertainty about what is left to build.

We’re people of science, and we know all about testing hypotheses. That’s the approach here - test some hypotheses, reduce the amount of uncertainty about the path forward, make forward progress, then test hypotheses again.

Stanier gives some advice about how to do this when there are multiple streams of work: define contracts up front, and mock as much as possible. Some of the specific examples won’t be relevant if you’re not building web applications, but the basic guidance is good - invest effort in reducing uncertainty.
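
To make that concrete, here’s a minimal sketch of my own (not from Stanier’s post, and with purely illustrative names) of the “define the contract, mock the rest” idea: two workstreams agree on an interface, and one side codes against a throwaway in-memory implementation while the real one is still uncertain.

```python
# Illustrative only: ResultStore, InMemoryResultStore, and run_analysis are
# hypothetical names, not from Stanier's post.
from typing import Protocol


class ResultStore(Protocol):
    """The contract agreed up front between the pipeline and storage workstreams."""

    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryResultStore:
    """A throwaway mock so downstream work can proceed before the real store exists."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


def run_analysis(store: ResultStore) -> None:
    # The pipeline depends only on the contract, so swapping in the real
    # implementation later shouldn't require changes here.
    store.put("sample-001/qc", b"passed")
    print(store.get("sample-001/qc").decode())


if __name__ == "__main__":
    run_analysis(InMemoryResultStore())
```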


Managing Your Own Career

As the year winds down, many of us naturally start to do some retrospection — how did things go, how did we do, what should we personally work on next for our development.

Sadly, almost none of us get useful feedback or guidance about this from our own managers/directors. (One of my goals for this newsletter is to support and encourage RCD managers and leads who want to treat their work with professionalism. The hope is that some of you will then choose to go on to more senior leadership positions, and do better for the next generation of managers and leads. Our teams’ work is too important for the status quo to continue).

But, as ever, we can learn from managers and leads elsewhere. This engineering management career ladder from Dropbox, and this one from Stripe (PDF - managerial roles start at level 5), are useful starting places. Some of the evaluation criteria might not be meaningful for your roles. But even deciding in a principled way which ones are and which ones aren’t priorities for you is a useful exercise.

Of the ones that do seem meaningful, how do you feel like you’re doing? What are areas where some modest improvement might have the biggest impact - for the team, or for you personally? (Yes, it’s ok for us to think about ourselves as well as our teams).

Are there areas you’d particularly like to grow in, or hear more about, in the coming year? As always, don’t hesitate to hit reply to send an email just to me, or to email me directly at jonathan@researchcomputingteams.org.


Product Management and Working with Research Communities

Introducing Amazon Omics – A Purpose-Built Service to Store, Query, and Analyze Genomic and Biological Data at Scale - Channy Yun, AWS Blog

AWS re:Invent was this past week, and there were some interesting items for our research computing and data community.

Further down I’ll talk about some technical re:Invent news of interest about compute resources. Here I’d like to highlight some of the product management aspects of a new(ish) set of services, “Amazon Omics”. The components here aren’t entirely new for AWS, but they’ve been stitched together and positioned in a really interesting way that we can learn from.

  • The product is for a very specific community: It is absolutely clear who this product is aimed at, and who it is not aimed at. It’s not for life sciences researchers broadly, and certainly not for computational science work in general. It’s for people with lots of sequencing data to process and analyze - people who match that use case will want to find out more, everyone else will continue looking at other offerings. Could others make it work for their use cases? Yes, maybe, but it’s not for them, and AWS will have no hesitation about making it worse for those users if doing so makes it better for the genomics customers.
  • It’s a mix of compute and data services: What’s more, there are two (arguably three) different data services. Compute is useless without data, data is useless without compute, and different kinds of data call for different kinds of handling. Amazon Omics comprises data storage, access, management, and analysis components: databases, object stores, pipelines, and compute. We’re well past the point where a product team targeting researcher use cases can focus on individual pieces of that puzzle.
  • There are clear interfaces defined, but the implementation details are intentionally fuzzy: “A sequence store is similar to an S3 bucket….” AWS is exposing a set of interfaces, and as underlying technology changes, or as they discover more how the products are being used, they can make implementation changes to improve performance and reduce their costs without users necessarily knowing.
  • It builds on existing expertise, both in the technology and in researcher needs: S3 buckets, databases, and AWS Batch/Step Functions are already things AWS has, and has deep expertise in. Similarly, AWS is extremely familiar with how people use their existing tools for genomics work - they’ve collaborated with researchers very closely in the past. That existing knowledge, and existing infrastructure, was bundled up into something that will be very compelling for a specific subset of users.
  • It addresses a complete workflow: There aren’t gaps here. References and sequences go in, pipelines get run, output goes in something queryable in almost arbitrary ways via SQL. The work doesn’t get stranded at any stage. It’s a complete minimum viable workflow. Further, it’s part of a larger ecosystem of products that can be used to get the data in or post process results.

By bundling up existing technology pieces, in ways informed by previous collaborations, into something that addresses a complete workflow, AWS has made its existing technology offerings significantly more attractive (and discoverable) for a particular group of potential users - users who aren’t interested in choosing and tuning the pieces themselves.

By keeping the services comparatively high level, AWS can make things better (performance and cost) as they attract more usage for this particular use case and see how it works.

Bundling AWS’s technology and expertise up into a specific, complete offering like this means they can get better and better at both attracting and supporting genomics users at the same time. Such is the power of good product management.


Research Software Development

Coping strategies for the serial project hoarder - Simon Willison

This is a talk Willison gave at DjangoCon. It describes his strategy for handling a large number (185!) of projects by himself, but the lessons may be very relevant to many research software development teams, where a small team might be intermittently called on to contribute to a large number of projects.

Summarizing it, I see two overarching strategies in the talk.

The first is to keep the work as “stateless” as possible, by making sure everything is self-contained - the projects themselves, and contributions to the project. For him, that means he doesn’t have to keep 185 projects in his head. For us it would make it easier to onboard new people, and to invite others to contribute. He does this by:

  • Having commits/PRs change one thing, but include tests and documentation
  • Having discussion (even if it’s just with himself) about every planned piece of work take place in an issue (a “lab notebook” for that work), where decisions can be recorded, diagrams can be added, description of work in progress can be added, and - bonus! - everything is date stamped
  • Planning for enhancements can live in an issue for a very long time before being acted on - adding discussion to the issue doesn’t necessarily mean that code is imminent
  • Everything links back to the issue (and, eventually, the PR or commit)
  • Documentation lives with the code

The second is to build tooling and processes to support scaling:

  • Include writing about what was done (in a blog post, a twitter thread, release notes or somewhere) as part of the “definition of done”
  • Cookie cutter recipes for new projects, providing a standard starting point (including a test suite with documentation testing already set up), so that everything is done in a good and recognizably similar way, minimizing cognitive load (a rough sketch follows this list)
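
For that last point, here’s a hedged sketch of what kicking off a project from such a recipe might look like, using the cookiecutter Python API (https://cookiecutter.readthedocs.io); the template repository and context fields are hypothetical, not Willison’s actual templates.

```python
# Hypothetical example: generating a new project from a standard team template.
# The template URL and context fields are made up for illustration.
from cookiecutter.main import cookiecutter

cookiecutter(
    "https://github.com/your-org/research-project-template",  # hypothetical repo
    no_input=True,
    extra_context={
        "project_name": "my-new-analysis-tool",
        "author_name": "Your Name",
    },
)
# The generated project arrives with a test suite, documentation testing, and CI
# already wired up, so every new project starts in a recognizably similar way.
```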

Research Computing Systems

An Initial Look at Deep Learning IO Performance - Mark Nelson
Moving HPC applications to object storage: a first exercise - Hugo Meiland, Azure HPC Blog

A lot of systems that were initially architected for typical PDE simulation use cases are now struggling with the very different I/O demands of data-intensive (and in particular deep learning) workloads. Big PDE simulations, after reading in some small input files or maybe a single large checkpoint, are then completely write-dominated, periodically writing a small number of very large files, synchronized across nodes.

That’s a hard problem, but it’s a very different hard problem than the I/O patterns of deep learning training workloads, which are constantly re-reading in small distributed batches from large datasets.
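
As a rough illustration of that contrast (my own sketch, not from Nelson’s article), here’s the kind of access pattern a training loop generates: many small, randomly-ordered reads from shared storage, repeated every epoch. The directory and dataset layout are hypothetical.

```python
# Hypothetical sketch of a deep learning read pattern: many small random reads.
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class SmallFileDataset(Dataset):
    """Each __getitem__ is a small, effectively random read from shared storage."""

    def __init__(self, data_dir: str) -> None:
        self.files = sorted(Path(data_dir).glob("*.npy"))  # possibly millions of small files

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.from_numpy(np.load(self.files[idx]))


# shuffle=True plus several worker processes means the filesystem sees a stream
# of concurrent small reads at every step, for every epoch - very unlike a
# periodic, synchronized burst of large checkpoint writes.
loader = DataLoader(SmallFileDataset("/data/training"), batch_size=64,
                    shuffle=True, num_workers=4)
```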

If you’re already very familiar with I/O patterns of such workloads, Nelson’s article may not teach you much. But if you’re just starting to wrestle with them, Nelson’s explorations here as he tries to tune and reproduce some benchmarks with both TensorFlow and PyTorch will likely be of use.

Maybe relatedly, Meiland’s blog post discusses moving another high-compute data-intensive workload, Reverse Time Migration, from high-performance cloud POSIX file systems to a Blob (object) store, with drastically reduced storage costs for the runs (although in this case, the costs of the compute vastly outweigh even the filesystem costs).


AWS Tunes Up Compute and Network for HPC - Timothy Prickett Morgan, The Next Platform

This is a good roundup of the HPC hardware side of re:Invent. Here the product management is also relevant, but the main interest is in how AWS picked a particular and at the time quite idiosyncratic approach for large scale technical computing (HPC, AI, and more) and continues to push it forward, with genuinely intriguing results.

My main takeaways are:

  • With the tweaked “Graviton3E”, Arm is increasingly attractive for research computing workloads (disclaimer: I have an interest here, as I work at NVIDIA, but between Fugaku and Graviton 3/3E, or Oracle Cloud’s Ampere instances, I don’t think this is a particularly controversial take, especially as power use becomes increasingly important).
  • AWS greatly advanced the state of the art in hardware offload of networking and the control plane with Nitro, and continues to push that state of the art forward. After AWS demonstrated that even untrusted multi-tenant systems can securely provide bare-metal performance, we can expect to see more and more of this even in on-prem data centres, which (again, I have an interest here) to my mind will make a lot of interesting things possible. Also, have we ever before seen even a cartoon diagram of AWS cluster network topologies?
  • AWS continues to double down on its take on HPC networking, which differs from Azure’s, focussing on driving down tail latencies to a variety of endpoints (storage as well as compute) rather than minimum latencies in (say) ping-pong tests. Whatever one might think about that, it’s fantastic to see a completely different tack being taken from the usual HPC approach, and it’s one which manifestly already works extremely well for, say, single-rack jobs.

Random

There was a lot of AI model news in the past week or two - Stable Diffusion v2 is out, and Midjourney v4 and the latest version of GPT-3 are available. If you haven’t tried them I’d urge you to give them a go: ChatGPT or the Midjourney Discord Beta make it extremely easy. This article has an argument I find compelling for why these tools are going to be useful in creative work (including science) in a way they’ve struggled to be in, say, robotics or self-driving cars. Failure-is-ok attempts in workflows with a human in the loop are going to go much better for creative work than for heavy machinery.

Maybe related - “Hey, GitHub” makes GitHub Copilot available via voice for improved accessibility.

The “Sovereign Tech Fund”, funded by the German Ministry for Economic Affairs and Climate Action, is funding development of the Fortran package manager and LFortran. It’s great that these non-traditional bodies (STF and the Chan Zuckerberg Initiative) are funding foundational research computing software tools. But why haven’t research funding agencies caught up?

There’s an Emacs Conference, I guess?


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.


Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 188 jobs is, as ever, available on the job board.