#184 - 9 Sept 2024

The Job Market for Our Community. Plus: Actual management; ChemOS; Google's View of Software Quality; SciCode coding benchmarks; Leonhard Med TRE; Blender for scientific viz; the Anaconda mess

Hi, everyone:

Happy September! For those of you, like me, in the Northern Hemisphere, I hope you enjoyed your summer and that the academic year is off to a great start. For Southern colleagues, I hope spring is off to a great start. And I hope the issues from the archives were of interest (I got some great comments!).

This will be a short one to get us back into the swing of things.

One thing I couldn’t help but notice as I kept the job board up to date and sent out the highlights is that we’re in a very different jobs environment than I ever remember seeing here at RCT international headquarters:

  • There are just more jobs out there than I ever remember there being - we have 369 up right now, and I’m being much fussier about what gets included than I have been at any time in the past four years.
  • More jobs is better than fewer jobs, but the high number is also partly because jobs are opening up faster than they’re being filled. Some of these jobs are taking forever to close - there are some good-looking jobs that have been on the board since February.
  • There’s been a startling number of institution-level academic jobs titled director or higher, for running combined software, data, and HPC efforts across campus. This is terrific news! When I started the board, seeing these teams fragmented by department and by practice was still the default.
  • Even better, some of these jobs explicitly have peer directors, reporting up to the same manager, who are in charge of shared research facilities (things like bioinfo cores, cryo-EM facilities…). This is hugely important for a couple of reasons. For one, all these teams are peers, and I find it incredibly disheartening how few teams (even teams I talk to in the same institution!) know about each other on campus. It’s important that they work together; they are not the competition (#142).
  • Further, someone should be looking at all these research technical professional teams as a portfolio. As I say here so often that you’re sick of hearing it — we are a research function, not an IT function. Our job is to enable research, and someone who finds themselves with an extra tranche of money some given budget year should be deciding between giving it to a spectroscopy core vs a research software team, rather than between a research HPC system vs more desktop support for the finance department.
  • Speaking of shared research facilities, I’m seeing an “echo” of the brief cryo-EM facility manager boom that I saw last year - presumably this is tied to funding cycles for such infrastructure.

It’s worth discussing that in the context of the corresponding jobs in industry that I also include in the job board.

  • This is an absolutely brutal time for job seekers in tech, for almost everything except HPC infrastructure (including datacentres) and AI jobs.
  • Even in tech, AI jobs are staying open for a long time - there are just more jobs than candidates right now.
  • There are definitely jobs for people developing novel deep learning architectures, etc., but one big part of AI hiring is applying deep learning to particular domains and problems. I think about this like numerical linear algebra - yes, it’s important, but there are only so many people we need writing preconditioners and sparse solvers. Most of the work is around using those tools to get work done.
  • Another part is for AI infrastructure, especially around storage and orchestration.
  • Outside of AI and AI infrastructure, tech jobs in most other areas are seeing hundreds or thousands of applicants per open position.
  • So this doesn’t help us with the roles that were already hard to find candidates for, but if you’re looking for, say, web developers, data scientists, systems administrators, or even small-scale Kubernetes infrastructure expertise, there are a lot of people looking for alternatives to industry right now.
  • On the other hand, biotech has picked right back up after their own round of layoffs, which is making things tough again for some core facilities trying to hire.
  • Quantum is starting to pick up again in earnest, too.
  • When the job board began, I was seeing “manager, data management” type jobs in regulated industries which were starting to hire data science teams. Now, with the AI boom, these have become something like Data Governance or Responsible AI roles, and they are everywhere, having really picked up over the summer. I find those roles fascinating not just because they reflect the importance of what we do (and the responsibility we should feel!) but because they are interesting interdisciplinary jobs, combining data, software, policy, and sometimes computing or storage expertise.

More and more research institutions recognizing the importance of what we do is fantastic; the fact that other sectors have also recognized this has been, um, something of a mixed blessing. Still, this is the most exciting time for research technical professional jobs I remember seeing.

By the way - I maintain the job board with information from emails I get, augmented by periodic Google jobs searches. Every email (or direct job submission!) I get is that much less wading through search results that I have to do. Emails are even more important for jobs in Australia, New Zealand, or Europe - for reasons known only to Google, I no longer have any visibility into AUNZ jobs through their search, and visibility into European jobs has always been challenging. So please send me or submit any jobs that involve leading research technical professional teams, whether as a manager or as an IC leader (typically with titles like “Lead” or “Principal” or words to that effect).

And with that, on to the roundup!

Managing Teams

Over the summer, Manager, Ph.D. also ran issues from its archives.


Technical Leadership

The Antidote to “Manager Mode” is Actual Management - Ed Bautista

I’ve moved management stuff into Manager, Ph.D., but this one I think is specifically relevant to teams like ours.

What do small nonprofits, startups, and digital research professional teams like ours all have in common? They are often led by “founders”.

It’s still often the case that teams like ours are organized around their first employee, or the person who was brought in to start a team.

The thing about a founder-led small team is that one person has the entire context of the operation and the relevant stakeholders in their head. It works amazingly well, right up to the point where it doesn’t, because such an operation can’t scale and often won’t survive if the founder leaves.

Now, many such founders have or develop the skills to start delegating and growing their team members, and to move the organization onto a more sustainable footing, one that isn’t built entirely around them. But some don’t, and those organizations really struggle when the amount of context the team needs to operate exceeds their founder’s hat size.

(You can ask anyone who’s been around the nonprofit field for any length of time if they know of a nonprofit that fits that description, and you will get a yes.)

The startup ecosystem tends to lionize such founder-centric efforts - and why not, there’s an easy-to-understand hero in the story! A recent article on “founder mode” vs “manager mode” generated a huge amount of discussion this past week.

As Bautista points out, this is goofy and misunderstands management. My own opinion is that this is at least partly because founder-as-hero is an easy story to tell, while good, responsible, professional management - just doing the quotidian tasks of supporting a team, growing its individuals, understanding the landscape, and making sure decisions get made - is hard to talk about.


Cool Research Computing Projects

ChemOS 2.0: An orchestration architecture for chemical self-driving laboratories - Sim et al., Cell

I don’t have much to contribute to the discussions of laboratory automation. But it is absolutely wild to me that research computing has, over the course of my career, moved from being something that happens after or alongside experimentation to informing and now actually orchestrating the experiments.

One of the themes of this newsletter has always been that siloing by type of work is a mistake - software “versus” systems “versus” data management or data science “versus” domain expertise. The most interesting applications are always combinations of at least two. Here we have databases, simulations, systems orchestration, and robotics all part of one workflow.
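
To make the shape of that kind of workflow concrete, here’s a minimal and entirely hypothetical sketch of a closed experiment loop in Python - the names (propose_candidate, run_instrument) and the SQLite schema are mine for illustration, not ChemOS’s actual API:

    # A hypothetical, minimal closed-loop "self-driving lab" sketch, for
    # illustration only; none of these names come from ChemOS itself.
    import random
    import sqlite3

    db = sqlite3.connect("experiments.db")
    db.execute("CREATE TABLE IF NOT EXISTS results (temperature REAL, measured_yield REAL)")

    def propose_candidate(history):
        """Stand-in for the planner/optimizer: choose the next experiment to run."""
        if not history:
            return {"temperature": 100.0}
        best_temp, _ = max(history, key=lambda row: row[1])
        return {"temperature": best_temp + random.uniform(-5.0, 5.0)}

    def run_instrument(params):
        """Stand-in for dispatching a job to lab hardware (or a simulation)."""
        t = params["temperature"]
        return max(0.0, 1.0 - abs(t - 130.0) / 100.0) + random.gauss(0.0, 0.02)

    for step in range(20):
        history = db.execute("SELECT temperature, measured_yield FROM results").fetchall()
        params = propose_candidate(history)              # planning / optimization
        outcome = run_instrument(params)                 # orchestration / robotics
        db.execute("INSERT INTO results VALUES (?, ?)",
                   (params["temperature"], outcome))     # data management
        db.commit()
        print(f"step {step}: T={params['temperature']:.1f} -> yield={outcome:.3f}")

Even in a toy like this, the database, the optimizer, and the instrument dispatch are one loop rather than three separate activities - which is exactly the combination the item above is pointing at.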


Research Software Development

RDEL #53: How does Google describe software quality? - Lizzie Matusov, Research-Driven Engineering Leadership

Matusov looks at a study by Google researchers about how software quality is thought of within Google as a whole (obviously, individual teams have their own particular focus). The results are a nice overview of software quality, from code to use:

  1. Code Quality: This aspect of quality is focused on the internal attributes of the code itself. Key factors include:
    • Maintainability: How easily code can be understood, corrected, adapted, and enhanced.
    • Testability: How code supports testing efforts, allowing for automated tests and ensuring that defects can be efficiently identified.
    • Comprehensibility: How easily a developer can understand the code.
    • Complexity: The level of complication within the code.
    • Readability: How easily the code can be read and understood.
  2. System Quality: This pertains to the performance and reliability of the overall system. Key components include:
    • Reliability: The ability of the system to operate without failure over a specific period under predetermined conditions.
    • Performance: The efficiency of the system in terms of response times and resource usage, ensuring that the system performs well under various conditions and loads.
    • Low Defect Rates: Ensuring that the system has minimal bugs or issues.
  3. Process Quality: This focuses on the quality of the development processes themselves. Important aspects include:
    • Comprehensive Testing: Implementing thorough testing practices to ensure that defects are identified and resolved early in the development process.
    • Thorough Code Reviews: Conducting detailed reviews of code by peers to catch potential issues, improve code quality, and share knowledge within the team.
    • Effective Planning: Ensuring that project planning is robust and realistic.
  4. Product Quality: This is centered on the end user’s experience with the final product. Key factors include:
    • Utility: The usefulness of the product in addressing the needs and requirements of its users.
    • Usability: The ease with which users can interact with the product.
    • Reliability: The consistency of the product’s performance, ensuring that it works as expected without failures or issues.

When I started, we focused mostly on (2), things like performance or bug counts; (1) and (3), code quality and process quality, have become hugely more talked about in the past decade or two, which is tremendous. I’m just starting to see more of a focus on (4), things like usability, in our communities, but it is happening.


This is interesting - SciCode is a set of research coding benchmarks; the intent is for training or testing LLM-based software development tools, particularly for research coding, but it could be used in a number of different contexts.


Research Data Management and Analysis

Leonhard Med, a trusted research environment for processing sensitive research data - Okoniewski et al., J. of Integrative Bioinformatics

Sensitive research data is becoming increasingly important, and we’re starting to see more papers on Trusted Research Environments (TREs) for managing and analyzing such data sets. This is a recent one I like because it talks a little bit about the evolution of the system as needs changed.


How to make an impactful scientific visualisation - Sebastien Lemaire, EPCC

This is pretty cool - EPCC has some tools for using Blender for scientific visualization, and Lemaire here shows a bit about how a super-cool visualization of a jet of CO2 was made using Blender and those tools.


Research Computing Systems

Anaconda puts the squeeze on data scientists now deemed to be terms-of-service violators - Thomas Claburn, The Register

There’s been a lot of discussion about Anaconda’s terms of service (updated in March), especially as they’ve started directly approaching Universities. Here’s a publicly viewable discussion on ask.cyberinfrastructure, but these conversations have been happening on mailing lists all over academia.

Hopefully this case will get sorted out in some favourable way, but it ties into the same issue that came up a year ago with Red Hat and CentOS (#170). It’s different in an important way - the Anaconda mess is especially galling because the TOS changed so recently and so suddenly - but it’s another example of the same phenomenon: free tiers vanishing or getting reduced in scope while institutions are being asked to pay money for something they’ve built into trainings and service delivery.

(Disclaimer - my current day job is for a company that sells support for software amongst other things; as always, this newsletter is my opinion and no one else’s, and the following has been my position for as long as I’ve been writing.)

Stuff costs money. People’s time especially costs money (which is a good thing). Now that the era of zero interest rates that’s been with us since ~2008 is well past its end, and with new tax laws affecting software development in the US to boot, money is harder to come by in the software industry. That means there’s been a huge increase in free tiers drying up, and in companies becoming much more serious about collecting money from organizations that are using their products as core parts of their own offerings.

In research we’ve gotten very used to getting things for free, and tend to believe we’re entitled to it — after all, we’re the ones advancing human knowledge! But that doesn’t change the fact that people still need to be paid if these products are to exist and be maintained.

I am not telling anyone to avoid using free software. We all run Linux and gcc and Python, after all, and well we should! It’s just that we’re now entering a period where running our products and services professionally means recognizing that building free tiers/versions of software or services into core parts of our own offerings or operations is a risk that needs to be registered and managed, particularly in cases where the paid versions are the business model the whole offering rests on.

Obviously, everything has risk; a commercial offering (or paid tier) could change its price, or purely community-supported FOSS code could just stop being maintained. The FOSS code withers away slowly, though, and a price increase in the paid-for option is probably tracked at least at some level as a risk, since someone is writing cheques. It’s this intermediate case that’s been catching us off guard.


Random

A DSL for implementing math functions

A 1984-era Mac 128K on a Raspberry Pi RP2040 microcontroller

Speaking of, here’s a blog which shares my deeply unpopular opinion amongst tech folks - I just don’t see the point of a Raspberry Pi when you can buy new or used micro PCs so readily.

In praise of profiling by just repeatedly running a program and hitting ^C.
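
If you haven’t seen the trick before: the Python-flavoured version is to run the program, hit ^C a few times, and notice which function the KeyboardInterrupt traceback keeps landing in; the classic version does the same to compiled code with a debugger attached. A toy example of my own (not from the linked post):

    # Toy demonstration of the idea in Python: run this, hit Ctrl-C a few times
    # (restarting each time), and note where the KeyboardInterrupt traceback
    # points - it will almost always be inside slow_inner_loop(), which is
    # where the time goes.
    import math

    def slow_inner_loop(n):
        total = 0.0
        for i in range(1, n):
            total += math.sqrt(i) / (i + 1)   # the hot spot
        return total

    def quick_setup():
        return [i * i for i in range(1000)]   # negligible time

    if __name__ == "__main__":
        data = quick_setup()
        while True:                           # runs until interrupted
            slow_inner_loop(5_000_000)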

Really interesting to see what comes of Positron, Posit (née RStudio)’s “next generation data science IDE”.

When I was starting my career, calculating digits of pi was a compute problem, now it’s a storage problem.

The Game of Life in HTML checkboxes

Yeah, but would that scale? The story of a single dashboard of 650 million checkboxes with shared mutable state for all viewers.

jq implemented in jq

Use GenAI to roast your GitHub profile

Apparently you can deep link to a specific page in a PDF
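
(If you haven’t tried it: appending a fragment like #page=12 to a PDF’s URL - e.g. the hypothetical https://example.org/report.pdf#page=12 - opens it at that page in most browser PDF viewers.)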


That’s it…

And that’s it for another issue. If any of the above was interesting or helpful, feel free to share it wherever you think it’d be useful! And let me know what you thought, or if you have anything you’d like to share about the newsletter or stewarding and leading our teams. Just email me, or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers. All original material shared in this newsletter is licensed under CC BY-NC 4.0. Others’ material referred to in the newsletter is copyright of the respective owners.


Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 368 jobs is, as ever, available on the job board.