#160 - 18 Mar 2023

Become the go-to team; Remove uncertainty first; Drop the plastic balls; User research; Research software categories; RSE Onramps; When do hard drives fail

Research Computing and Data managers typically have a lot more responsibility for given level of seniority than (say) equivalent levels in a large company. We’re charged with not only supervising work and overall direction and professional development of the team (which in tech companies is already split into two roles, a tech lead and a people manager) but we’re also typically given a lot of autonomy about setting direction and strategy.

On top of that, we are in this line of work because we want to make things better for research. We see how we could have so much more impact, if only we had more money, or if only Department X was on board, or…. We genuinely want things to be better, and have already taken on so much responsibility, so we feel the duty to fix that, too.

So when I talk to managers, especially new managers, I hear this sort of question a lot: “How do I get my senior leadership to give us more money?” or “How do I convince Dept X to get on board with what we’re trying to do?” It’s completely understandable, given the situation they find themselves placed in. But I find it discouraging, because it’s the wrong question.

We can’t “get” people to do anything.

We can’t even convince people of anything.

We can control what we do personally, and through that we can strongly influence our team, what they work on, and how they work together. Our influence attenuates very rapidly further afield.

Our only tools for having an impact on other people’s decisions are: finding out they need; demonstrating the usefulness of our teams work to the things they want to accomplish; and working together to accomplish them.

There’s talking to people, making sure our team excels at the right things, and collaborating, and that’s it. There’s nothing else. Those are the levers available to us.

If the goal is more funding specifically from senior administrators, the question isn’t “how do I get my Director/VPR/CIO to give us funding”, it’s “how does our team become the sort of team that our Director/VPR/CIO desperately wants to fund to further their goals for the institution”. If it’s getting more usage in a specific discipline, it’s not “how do we get the Department of ABC to use our services instead of some external team/doing it DIY and throwing grad students at the problem”, it’s “how does our team become the sort of team that is a no-brainer for ABC researchers to work with”.

The steps are simple to write down, even if hard to do:

  • Deeply understand what people need, and what’s standing in their way;
  • Develop demonstrable excellence in providing services that meet that need and sidestep their blockers;
  • Work with them to address that need.

With the VPR example, there never will be enough money for everything that could usefully be done. So even if our team is routinely knocking it out of the park, getting more money is an uphill battle. Money has to come from somewhere. The choice isn’t giving our team money versus not giving us money, the choice is giving our team money versus giving it to any of a number of other very worthy efforts — or worse, taking money away from one of those perfectly worthy effort to give it to ours.

If we want to make a case for our team being the best way to spend more funds, we need to deeply understand what the senior decision maker is wrestling with, the tradeoffs they’re facing, and the other options available to them, in their words. (And that means talking to them, as in #158; it’s not something we can attempting to discern alone in a room by pure reason).

Let’s say five FTEs worth of new ongoing funding landed in our senior decision maker’s laps tomorrow. If we can’t make convincing arguments, in words they would use and describing tradeoffs they’re currently wrestling with, for three other ways that would be better for them than spending the money on our own team, we’re just not ready to advocate to them for more funding yet. Because that from-the-sky lap money isn’t coming - they themselves will have to actively advocate for those additional funds, putting their own credibility on the line. They will need to show results.

Then our teams need to absolutely excel at the things we do that solve problems for the institution. That means making sure we have the highest possible research and scholarship impact per unit investment, using leverage and reproducibility (#157). We must be able to demonstrate, with testimonials and examples, that we’re already having outsized impact on research and scholarship, and that marginal unit of investment in our teams is being applied as effectively as possible. We need to keep talking with decision makers, clients, and other stakeholders to make sure we’re doing the most important, impactful work, and not just doing it well but getting better and better at it. We need to be collaborating with other groups to improve out comes and avoid duplication.

With that well underway, then we can start thinking about being able to demonstrate exactly what larger or new thing we’d be able to accomplish with additional spending (ideally at several levels of investment), and how that lines up exactly with what our decision makers are trying to accomplish. They need to be armed with convincing arguments that putting extra FTEs worth of funding towards our team, out of the many other options available to them, is the best possible thing they could do for the part of the institution that they are charged with stewarding. We need them to be able to explain why, upon second look, it turns out we’d be even better than those three other excellent options.

This is an enormous amount of work, and it takes sustained effort on timescales measured in semesters and years, not weeks or months. Frankly, I’ve failed to pull this off as often as I’ve succeeded. But this is what it takes to drive the largest and most important impact on research and scholarship in our institutions. In doing so, almost as a side effect, we “get” other departments on board with what we’re doing (by doing the things they need done), and “get” support for growing our team (by making growing our team a clear path towards better outcomes).

More seasoned managers and leaders - what other advice for new managers and leads would you offer to help them grow their scale, impact, and influence with other groups? What have you seen that has and hasn’t worked? Email me and I’ll share it - hit reply or email me at jonathan@researchcomputingteams.org.

And now, on to the roundup!

Managing Teams and Individuals

Earlier this week in Manager, Ph.D., I wrote about:

  • How women get “nicer” feedback, which holds their career development back
  • The dangers of team metrics (and tied that into the issues with surveys I talked about here last week)
  • The advantages of adding some discomfort into 1:1s
  • Being intentional about using in-person time to build team relationships

Technical and Project Leadership

Removing uncertainty: the tip of the iceberg - James Stanier
Start coding at the point of least certainty - Swizec Teller

The context of these blog posts are software development projects, but the fundamental principle applies to any large undertaking.

As a general rule, I’m a big fan of quick wins! Internally they build momentum and morale, and can be used to demonstrate progress and credibility externally.

But.

Some efforts have big question marks in the middle. Here be dragons. There is some step which is much less certain than the rest, and no known solution presents itself; or the boundary conditions for success are very vague. (If there if there is more than one area which poses such a question mark, the effort as a whole needs to be rethought and refactored).

Tackle the least certain part right away. That might be something to do with funding or administrative concerns, or a technical question, or a piece of domain science or workflow that no one on the team knows about, or… But putting that off to the end will lead to unnecessary stress on you, for the entire duration of the effort, and quite possibly the failure of the effort.

Teller talks about it from the point of view of a solo project; Stanier talks about it in the context of web services, building from the hardest (user-facing) point inwards, and how to organize a group’s efforts so that they meet up nicely at the seams.

Either way, part of our jobs as leaders and managers is to reduce chaos and uncertainty and replace it with clarity. This is one way we can do it - focus on the uncertainty first, and reduce it.

Do you have any stories to share that your team didn’t do this, and it went horribly wrong? Or times where you did tackle the big question right at the start, and quickly found out something that completely changed the effort? Let me know, I’d love to hear it and share with the readers - just email me at jonathan@researchcomputingteams.org.


Earlier this week in Manager, Ph.D., I wrote about how the most important part project management is communications, and some key communications skills one can practice to get ready for project management:

  • Drawing information out of people,
  • Verifying shared understanding,
  • Communicating your mental models clearly,
  • Assigning deliverables and monitoring them, and
  • Providing feedback.

Managing Your Own Career

How to Juggle Priorities: Decide Which Balls Are Glass and Which Are Plastic - Ashley Janssen

For all of us, there’s too much that we could usefully do to ever finish it all.

For many of us, there’s too many things we are currently doing to do them well or sustainably. The only solution for this case is to do fewer things, either by handing some of them off (in which case they won’t be done as well right away) or by simply declaring they’ll no longer be done. Either way, we must be unflinching about what the consequences will be.

This juggling analogy may already be familiar to most of you, but if not, it comes (as far as I know) from Nora Roberts:

“The key to juggling is to know that some of the balls you have in the air are made of plastic and some are made of glass. And if you drop a plastic ball, it bounces, no harm done. If you drop a glass ball, it shatters, so you have to know which balls are glass and which are plastic and prioritize catching the glass ones.”

Janssen counsels us to understand what of our current responsibilities, left undone, will break something that matters to us; what won’t, if left undone for a short period of time; and what won’t even if we never do it again.


Product Management and Working with Research Communities

User Research for Your Data Team, Part 2 - Danielle Kong and Caitlin Moorman, Locally Optimistic

We’re still not quite used to the idea, in our line of work, that we are internal B2B product and services organizations, vendors that serve research groups. Once we get comfortable with that notion, a lot of things become clearer.

Kong and Moorman’s article is aimed at data product teams, but can be applied quite widely. It introduces qualitative user research to quantitative data teams, encouraging us to:

  • Outline our objectives - what do we want to find out? (e.g. why are people not using our platform?)
  • Define some hypotheses
  • Choose a research method
  • Conduct our study
  • Synthesize the data, and make decisions of how to proceed.

For research methods, they outline the when and how of:

  • User interviews
  • Surveys (“Surveys are useful when you have a fairly large group of people already using a product and have clear areas of focus where you want to understand their sentiments”)
  • Usability testing, and
  • Focus groups

Research Software Development

Defining the roles of research software - Rob van Nieuwpoort and Daniel S. Katz

Here the authors summarize some discussion out of a workshop organized by ReSA and the Netherlands eScience centre, which van Nieuwpoort kicked off by giving a talk about the roles of research software. In the end, he and Katz collected an initial classification (with illustrations)! of the roles of research software.

In their estimation, research software might be

  • A component of our instruments (e.g. data analysis tools close to data acquisition)
  • The instrument itself (e.g. simulation codes)
  • A tool for analyzing research data (any of a number of data analysis or exploration tools)
  • A tool for presenting research results (visualization, as an example)
  • Something that integrates existing components into a working whole (pipeline tools, data assimilation)
  • A piece of infrastructure or underlying tool (libraries, foundational components)
  • A facilitator of distinctively research-oriented collaboration (platforms for code, data, or computation)

Research Software Engineers: Career Entry Points and Training Gaps - Ian A. Cosden, Kenton McHenry, Daniel S. Katz, arXiv:2210.04275

I’m seeing more writing about RSE career ladders (including this discussion by the UCL ARC team). The authors here (including Cosden, who we’ve spoken to about how Princeton RSE works) describe on-ramps to the RSE career path, and how each of those on ramps might leave some gaps in training that can be usefully addressed.

The authors identify two broad sets of skills needed by RSEs: knowledge of the research and possibly domain context, and software development skills. They also identify three key paths by which people enter the RSE career ladder:

  • Domain science (currently the majority path)
  • Computer science graduates
  • Industry non-research software developers

They point out that these aren’t exhaustive - recent masters graduates, professional research staff, data scientist/engineer, research computing facilitator, or research systems administrators also become RSEs.

Four gaps are identified:

  • Awareness of RSE as a career path - a big issue for everyone except those already in the digital research professional world
  • Software development skills - particularly for domain scientists, who often got little training in that area
  • Career prospects
  • “Cultural” issues with crossing disciplines - the larger scope of the role, greater ownership, and more importance of the research workflow and building on what has come before than in industry.

The authors propose


Research Computing Systems

How Long Do Hard Drives Last - Timothy Burlee, Secure Data Recovery
Most failed disk drives fail just before they hit 3 years’ use, says data recovery biz - Chris Mellor, Blocks & Files

Here’s an interesting counterpoint to the Backblaze stats we had last week in #159. Burlee works at Secure Data Recovery, a company who recovers data from dead disks or storage arrays. So the disks that Burlee’s firm sees are those disks that have failed, and that have data on them valuable enough to pay someone to try to recover.

SDR categorizes about 20% of failures as “predictable”, e.g. the disk was at risk of failure given the number of power-on hours and quantity of pending sectors. 80% were “unpredictable”, and might come from defects, power spikes, mishandling, etc. They don’t expect to see patterns in those drives, so their analysis focusses on disks they saw in 2022 which they’d classify as predictable.

Below is the plot of service hours and pending sector counts for the failed drives (the manufacturers are in order of number of disks they saw, with many more WD or Seagate drives than Maxtor).

Plot of power-on hours and pending sector counts of failed drives.

Clearly, things start getting dodgy around 20k+ power-on hours; the number of pending sectors at time of failure seem to vary significantly by manufacturer.

One interesting tidbit is that older drives seem more resilient, basically because we’re demanding more out of our drives now:

We found that the five most durable and resilient hard drives from each manufacturer were made before 2015. On the other hand, most of the least durable and resilient hard drives from each manufacturer were made after 2015. […] The relentless pursuit of performance results in difficult design decisions and tradeoffs. […] manufacturers must fit more components in the same physical space to achieve more storage space. The ultra-compact arrangement of read-and-write heads and platters within the case reduces allowance between moving parts, appearing to affect mechanical damage and wear resistance.

They find it difficult to say much about SSDs:

In the end, ignoring non-predictable failures, the most significant variable for SSD lifespan is the amount of data written to the device. Some users could exhaust their SSDs within a few years with intensive use. Other SSDs could last for more than five years.

Backblaze, which has a consistent usage pattern, can speak meaningfully about relative performance of SSDs between brands and compared to HDDs. The random disks brought into SDR don’t have the same consistency and so there’s less information that can be gleaned from their population of devices.


Emerging Technologies and Practices

DOE Wants a Hub and Spoke System of HPC Systems - Timothy Prickett Morgan, The Next Platform

The DOE has always tended towards the “few large supercomputers” model of computing, with the office of science’s ESNet allowing connection to these few systems.

But power consumption of the largest systems are tapping out what is feasible for publicly funded facilities; and the cost of moving enormous data volumes is making centralization more challenging.

So it will be interesting to see how this plays out with the DOE office of science’s latest call: the DOE

has started the bidding process for what it calls the High Performance Data Facility, which will be a centralized hub of shared datasets and other resources located in one of its HPC labs that connects out through spokes of network to storage and computation in the other centers. You can read the initial proposal here, which was announced on March 10.

Morgan provides more of the details here - they are seeking proposals for the hub for this hub-and-spokes (likely $300MM, possibly $500MM total) model.


There’s been several projects to “put compute next to the data” on storage nodes or on hybrid SSD or Flash devices. Tobias Mann also has an article in The Next Platform about how LANL is funding Seagate to experiment put some compute in hard drives so that, e.g. simple aggregations can be done before the data even leaves the enclosure.


Random

Matlab finally has a dark mode. (The Matlab website does not).

Disambiguating Arm, Arm ARM, Armv9, ARM9, ARM64, AArch64, A64, A78…

Very nice binder-based set of tutorials covering an introduction to finite elements.

Fit a physical-model based curve using symbolic optimization - PhySO.

Back in the history of ASCII, the vertical bar was broken **¦ , but that changed for… reasons?

People are starting to successfully run the sort-of-available LLaMA LLM models on surprisingly small systems, and with surprisingly sophisticated improvements (e.g. the Alpaca project) happening surprisingly quickly.

Debuggers are more sophisticcated than they were back in my day - what a good debugger can do.

Fast GPT-2 inference in 300 lines of Fortran.

A font IDE, and fast system font stacks across all major platforms.

Github markdown helpers in Issues, PRs, or Discussions.

The 1960s PLATO educational system, which arguably invented forums and multiplayer video games.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.


Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 183 jobs is, as ever, available on the job board.