Hi!
I’ve long tried to describe the sudden disorientation of losing immediate feedback that comes with being a first-time manager. Recently I heard the term “managerial sensory deprivation”, and I think that’s a useful phrase. When you’re hands-on doing the work of reearch computing and data, you get immediate, almost tactile feedback with what you’re doing, whether it’s from the systems or from the researchers. As a manager, any action you take has only indirect results that could take days, weeks, or months to really play out. And by that time, it’s hard to connect it unambigiously to something you’ve done.
Your work senses are getting no signal back, or weak signals seemingly unconnected to what you’re currently doing. You’re feeling a kind of sensory deprivation. And wikipedia helpfully reminds us that “extended or forced sensory deprivation can result in extreme anxiety, hallucinations, bizarre thoughts, temporary senselessness, and depression”.
This has come to mind lately being a new part of a big organization and, frankly, having no idea what I’m doing yet. I’m accomplishing stuff, but is it the right stuff? Is it having any kind of impact? If yes, is that impact positive? How and when will I know? Having moved organizations (and sectors!) I’ve again lost my bearings and my feedback mechanism. It’s apparently not an unusual situation; I heard the same thing from a reader who I talked with earlier this week, who’s also in a new job. (By the way, I love hearing from RCT readers - never hesitate to email, or even book a quick chat.)
Gentle reader, I would love to tell you that I’ve cracked this code, and there’s this one weird trick you can do to find out if your management interventions are working or if you’re doing the right things at a new job. I haven’t.
But there are two things you can do, which take time and effort and are very much imperfect.
The first is to try to attune your senses to the new environment. This is slow and messy and tiring. It means paying very close attention to what’s going on, taking notes of what’s currently happening, what’s changing (for better or worse), and taking notes so that you can track those changes over the longer timescales you have to adjust to. You can’t possibly track everything, so you have to identify what you find to be notable, and then choose some particular signals either you feel are important or that those around you seem to act as if are important and track those. Over time we can train our senses to respond to the new environment.
The second is active sensing - absent being able to passively accurately sense the new surroundings, to actively send out probes. Since flapping around and whooping ultrasonically like a bat probably won’t work for us, we’re reduced to asking questions of our colleagues and stakeholders.
Asking questions is less slow, but no less messy or tiring. People’s answers will be inconsistent, and in our line of work where things are very open-ended, they are unlikely to be willing or able to tell you “you should do X, Y, then Z” or “This thing you’re doing is working, this other thing isn’t”. There’s three approaches I like to take to handling asking these questions (and handling the answers) in this environment.
The first is that being new is basically a license to ask unlimited questions, of pretty much anyone you want. “Hi, sorry but I’m new here/a new manager here, can I ask you a queestion?” is basically impossible for someone of good will to say no to. And you can ask the most ridiculously basic-seeming questions - it’s amazing, and you should absolutely take advantage of this. After three or four months it may well feel weird for you to ask someone you wouldn’t normally interact with an off-the-wall question, but it is completely fair game, almost expected of you, for the first few months.
The second is that people love giving advice. They may hate giving feedback, but they love giving advice. An questions you can pose as a request for advice will give you long answers to chew over. “How would you recommend a new person do this ” or “How would you handle this situation…”
The third is that people are pretty good at identifying that a problem exists in some area, but we’re actually pretty rubbish at diagnosing it precisely, much less coming up with a solution on the spur of the moment. So if you start getting feedback that something isn’t working, or even directly told you should do X instead, take the input seriously but not necessarily literally. If you hear from more than one person that that there’s something wrong in the same broad area, then there’s almost certainly a problem there! But it might easily not be what they describe, and if they’ve suggested a solution it might be the right one but it’s at least worth thinking of alternatives. Take the input as data to be examined more closely and to direct further investigations.
Of course, the two approaches, passively taking notes and actively asking questions, work together. You can use the answers you get to train your passive sense for what’s going on, and the results of your noting and paying attention to inform your questions.
Are there other approaches you’ve found useful, when starting as a manager or just starting a new job? Or any suggestions for mentoring a new manager? Let us know — just hit reply or email jonathan@researchcomputingteams.org.
And now, on to the roundup!
The Cone Model for Teams’ Support Network - Shy Alter
As managers and leaders we develop, and are responsible for, a team, not a list of individuals. A team is a group of people that support each other and hold each other accountable, not just a set of people with similar email addresses who report up to the same manager.
And yet an awful lot of management training and writing focuses solely on interactions between the manager and each individual team member. That’s vitally important, and an excellent place to start. You have to get the basics of being a manager right before focussing on strengthening teams. When I’m talking to new managers I often bring up Google’s Project Oxygen, which focussed on basic manager skills, first - but they followed that up with Project Aristotle, which was about higher performing teams as a whole. They found five key determiners of high performing teams:
For almost all of these, the within-team dynamics play as big a role as the team member-manager dynamics.
Here the alter makes the point that even from a technical point of view, you can’t meet all your team needs, If you uncover needs for your team members in one-on-ones and then look for coaching and mentoring opportunities between team members, you’re now developing your team in two ways: building their coaching/mentoring skills and their technical skills.
The nice thing about this of course is as the team grows, the amount of time you can spend on each team member drops as 1/N, but the amount of time available for other team members to spend on each other stays almost constant as (N-1)/N…
And there’s no reason why peer-to-peer coaching, mentoring, and teaching has to be constrained to within one team. Have you seen (or do you have) good peer knowledge-sharing and coaching setups? Anything you want to share with other readers? Let us know!
Don’t Hire for Culture Fit - Ruchika Tulshyan, SHRM Executive Network
I agree with everything in this article; “Culture Fit” too often means “like us”. So when people don’t give any better reason than “culture fit” for not wanting to hire a candidate, it ends up being wildly exclusionary. It’s also counterproductive! When you’re hiring, you want to grow the team not just in numbers but in capability, and hiring more of the same doesn’t get you there. As Tulshyan says, culture add is a good and useful thing.
There are also a number of behaviours and skills particular to a team which are absolutely worth making explicit and turning into hiring criteria, and those often get put under “cultural fit”. Those are important, and worth keeping but maybe we need a more specific term. For instance, in a research computing team, it’s common (not universal, but common) that to succeed, a new hire will need to be much more comfortable with uncertainty and under-specified projects than would be common at the same level of seniority elsewhere. Any other behavioural or communication skills and behaviours that can be called out with enough specificity to be unambiguously evaluated are worth considering.
As with all potential requirements, you should filter by“would a person really not succeed here if they didn’t have this?” If it’s not something that would stand in the way of their or the team’s success, it oughtn’t be a criteria.
For those who need ammunition for their case to continue at least hybrid remote work policies in an academic institution, maybe a randomized case-control study will help - from Nick Bloom and Roubing Han at Stanford and James Liang at Fudan:
A large multinational randomized 3-2 hybrid WFH vs 5 days per week in the office for 1600 professional graduate employees for six months. They found three results: (1) 35% reductions in quit rates and 12% reduction in sick leave; (2) No impact on performance or promotions; (3) Employees shifted work from WFH days to evenings and weekend (“flexitime”). The results were so positive the firm immediately rolled-out hybrid WFH to all divisions.
Expect to see a lot of these studies as companies move back to work. I’m particularly interested in studies going the other way - comparing pure remote work to hybrid in-office. Let me know if you see one!
Interesting twitter thread by Owain Kenway at UC London about how they have 10(!!) open positions for “Research Infrastructure Developers” in the ARC group.
I’m starting to see lots of jobs along these lines, but I like how this group in particular framed these ones:
Research Infrastructure Developers are like Research Software Engineers but build, support and maintain infrastructure, preferably built with best practice (software defined, CI/CD etc) used by researchers such as HPC, storage and smaller, bespoke compute services. […] You might be an expert in Linux system administration, your passion might be in optimising deployed software packages/user experience on HPC, or data storage or Cloud or in working more like a researcher doing work on cutting edge technologies like smart NICs or Graphcore.
The focus on developer is useful. I’ve seen a lot of teams having trouble for hiring into jobs that sound too much like old-school sysadmin jobs or like they’re a human deployment tool (“deployment engineer”); and “devops engineer” just doesn’t make sense, a single person can’t be “devops”. And, aligned with my hobbyhorse, it suggests that having hard silo walls between software development and systems teams just doesn’t make sense.
(By the way, these jobs don’t show up on the job board this week. My rough-and-ready rule for whether something is a technical leadership or management position is whether in the job ad there are explicitly-called-out responsibilities for mentoring people, and/or leading or managing people, projects or products. In some cases this start happening at roles with “senior” in the title, at other places it’s not until “staff” or “principal”, or some other role name entirely.)
6 ways staff engineers help reach clarity - Alex Ewerlöf
Being at the Staff/Principal doesn’t mean knowing everything. Ewerlöf describes a number of other roles they can play in helping people find answers, with “knowing the answer” being probably the least valuable case:
Building an SRE Career Progression Framework - Ethan Motion
Whether it’s for research software, systems, data management, or data science, a lot of groups are trying to figure out formal or informal career progression pathways for individual contributors. As a manager, you can work with individuals in their one-on-ones to find out where they are interested in and ready to grow, and give them opportunities at that intersection. But how do you start thinking about career progression at the whole-team or multi-team level?
Motion describes a process of bringing together a piece of an organization to hash out a framework of levels on which to hang development pathways. Most importantly, he suggests defining the levels in broad behavioural terms:
I’ve seen threads.com come up in a couple of different conversations recently - is anyone using tools like that for discussions that you want to be less ephemeral than e.g. slack discussions? Text-based discussions leading up to decisions (as with architectural decision records) where the discussion is an important part of the documentation you want to maintain, and so things like Google Docs/Word comments aren’t right? Or do people use google docs or Git{Hub,Lab} with PRs for this sort of thing? Anything else for asynchronous meetings/discussions that you’ve found useful?
Twin Anxieties of the Engineer/Manager Pendulum - Charity Majors
As I’m decidedly back on the IC side of this particular pendulum, this has been on my mind a little bit.
Majors raises two anxieties she’s heard people have with this: “What if I never get another shot at people management”, and “am I too rusty to go back”. In both cases, her advice lines of strongly with my experience.
For the first one, you’ll have to have to actively fight off opportunities to go back into management. “Once a manager, marked for life as a manager” she says, and she’s 100% right.
For the second, she points out that you do get rusty, and if you went into management too early before you really had a change to build very strong technical skills, it could be hard:
Never, ever accept a managerial role until you are already solidly senior as an engineer. To me this means at least seven years or more writing and shipping code; definitely, absolutely no less than five.
But after that, it comes back, especially if you try to keep up a bit of fluency.
Most importantly, though, she talks about useful it is to have people with strong technical and people leadership skills:
If you’re a good manager it’s actually nearly impossible to hide that you have the skills, because of the way it infuses your work and everything that you do as an IC. You get better at prioritization, more attuned to the needs of the business, and restless about work that doesn’t materially move the business forward. You get better at asking questions about why things need to be done and at communicating with stakeholders. You get better at motivating the people you work with, understanding their motivations and your own, and mediating conflicts or putting a damper on drama between peers. People come to you for advice and may seem to just do what you say, or go where you point.
A couple genomic sequencing projects here this week:
A huge accomplishment spanning both wet-lab work and bioinformatics, the Telomere to Telomere consortium has a complete assembly of a human genome, the CHM13hTERT cell line (with chromosome Y from NIST HG002). There’s a special issue of Science covering the work. This work completes the hardest, left-undone 8% of the human genome that the Human Genome Project couldn’t do with the technology of the day.
Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight - Tanveer Ahmad, Chengxin Ma, Zaid Al-Ars, H. Peter Hofstee, bioRxiv
I think we’ll see more of this in the future - as simulation and analysis workflows get more complex, we’ll need to rethink how data is exchanged between pieces of the pipeline. The usual batch queuing system approach to this is to have each stage of the pipeline dump data to disk, to be reloaded by the next stage of the pipeline. This isn’t great! Here we see a framework written around fast streaming, bwa-mem, a bam-sort reimplementation, and Picard MarkDuplicate reimplementation. Data is transferred between stages using Apache Arrow, an in-memory columnar format, and Arrow Flight, a gRPC protocol for transferring the data. The glue code is python and the sorting and deduplicating is done in Pandas.
How I think about Code Management - Andreas Klinger
A lot of research software we start dealing with…., well, let’s say “has many opportunities to be made even better”. Klinger has a nice summary of maintaining and improving a code base over time. He sees it as having two components:
And that both of those can and should be addressed incrementally and continuously.
Klinger says that you handle the code complexity over time with refactoring (including my favourite refactoring, deleting code). You increase confidence by streamlining, automating as much as possible, documenting, and testing.
Both of these things are made a lot easier when there are clear expectations over new code, quantified (and automated, and enforced…!) wherever possible with linting and coding tools.
Interesting summary of a paper - most “single statement bugs” (in a collection of 318 found in 14 open-source Java projects) get fixed not because they broke a test, but typically after they’ve lurked in the code for a month or more, and sometimes right after some other bug broke the tests.
Another report argues that research software should be recognized as a research output, and I just don’t see it. Research software becomes successful exactly when it stops becoming principally an output and it starts to be an input to other research projects.
The argument is that it’s easier to get published and funding for working on recognized research outputs. Ok, cool, and I certainly agree that we need more funding for research software development. But among the biggest gotchas with academic research funding is (and has always been) that the incentives are to perform novel, not incremental, work. Since research software product maintenance and feature addition to existing tools is always necessarily incremental, I don’t see what problem this solves. And the conflict of interests that would come with having the software product team competing for funding with the groups of people that would use their software just seem really bad.
We’ve always had troubles providing reliable funding for shared, non-commercial, research inputs. I don’t know what the answer is, but I’m pretty sure treating them as outputs isn’t it. I’m not against it, I just don’t think it’s a solution to the problem.
Wachy (prounced “whacky”) is an open source tool and UI for low overhead performance tracing and debugging of arbitrary compiled binaries and functions using eBPF - this seems pretty cool.
A Simple Use Case for Generics in Go - Go Generics for Field-Level Database Encryption - Josh Wales, Kablamo Engineering Blog
One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet - Xinli Shang, Mohammad Islam, Pavi Subenderan, and Jianchun Xu, Uber Engineering Blog
Very fine-grained database encryption is something that’s in the air these days, and relevant for a number of sensitive research data use cases. Wales’ article describes using go structures and go generics to process record-specific encryption for records, and is more about how nice it is to be able to use Go generics for this to be able to encrypt various kinds of data.
The article by Shang et al, on the other hand, takes advantage of a particular advantage of parquet files which supports modular encryption which still supports columnar projection, predicate pushdown, and compression, and using different keys for different columns. It also allows authentication of the data via signatures.
They implement an architecture for taking advantage of this, and find only about a 3% overhead with Java 11.
This looks like great course material: DSCI 310, “Reproducible and trustworthy workflows for data science” by Tiffany Timbers at UBC. Course notes are available, as are packages in R and python.
Use the Index, Luke! is a resource I’m surprised I haven’t seen before - “a guide to database performance for developers”, covering relational database performance across vendors but with specific tops for Postgres, MySQL, and maybe less relevantly for us, SQL Server, Oracle, and DB2.
A quick overview of some low-code Jupyter notebook tools for data exploration and manipulation - bamboolib, lux, and mito.
SpringShell Brings Hell to Java Developers - Steven J. Vaughan-Nichols, The New Stack
This is just a mess, and may well affect some tools used at some research computing centres, since Spring is used in a huge fraction of new web-facing Java-based services. Everything needs to be upgraded to 5.3.18+ or 5.2.20+ of the spring framework.
Oh yeah and there’s an arbitrary file write and execution vulnerability in the gzip and xzutils tools(?!?). Is zgrep available on your systems? Update to gzip 1.12.
How to properly interpret a traceroute or mtr - Phil Lavin
Nice set of recommendations from Levin:
Memory Performance of AMD EPYC Rome and Intel Cascade Lake SP Server Processors - Markus Velten, Robert Schöne, Thomas Ilsche, Daniel Hackenberg
This is a really nice and detailed discussion of the memory architecture of Rome and Cascade Lake systems, and performance results. They’re both one generation behind the current new processors, but (a) a lot of these systems are out there now and will be for several years, and (b) the architectural overviews and comparisons will be useful for another generation or two yet. Very worth reading if you want to get the most of these architectures in the coming years.
The Looming ARM Server Battle between AWS and Microsoft - Timothy Prickett Morgan, The Next Platform
Microsoft Azure now has its own publicly-available ARM instances, Epsv5, based on the same processor family that Oracle Cloud has had success with, the Ampere Altra.
(Speaking of, if I haven’t mentioned this before, you can get 4 ARM cores to play with for free, indefinitely, at Oracle Cloud, which is cool and handy).
Morgan goes into the history and details, and digs into the claim by ARM and Azure that these nodes have 50% higher perf/performance on SPEC Int than Azure’s Milan or Ice lake nodes, which in the end he largely he broadly agrees with (I think?).
I’m not really sure this is a “server battle” between Azure & AWS - they both just now have ARM offerings, aimed at slightly different workloads. So far, to my eyes the Graviton3 has/will have advantages for research computing workloads over the Ampere Altra A1. It’s cool to see ARM getting increasingly mainstream, though - more options is good.
AMD: Pensando gives us better-than-AWS networking tech to rule the cloud - Dylan Martin, The Register
AMD makes a big DPU move with $1.9 Billion Bid for Pensando - Jeffrey Burt, The Next Platform
NVIDIA offers DPUs, and AMD is a competitor anyway, so I won’t offer too much commentary on this, but these two articles are good on the why of DPUs - why they’re so much in the air now and their usefulness. (You know people are interested in a technology when their starts being arguments around what does and doesn’t count). Martin makes explicit what I haven’t seen much of elsewhere - the extent to which AWS’ Nitro silicon to make bare metal systems available in a cloud was a big factor in shaping industry thinking. Burt covers the details of Pensando’s offerings.
With multi-tenancy pretty fundamental to research computing, and as we increase our support for projects working on a variety of sensitive data, this sort of infrastructure is going to be increasingly important. And that’s not even considering the performance wins of pushing tasks down into the network hardware. DPUs are going to provide some interesting possibilities, regardless of who makes them!
Fujitsu Cloud Service to put Fugaku Supercomputer In Reach - Nicole Hemsoth, The Next Platform
Fujitsu Launches ‘Fujitsu Computing as a Service,’ Leveraging Fugaku Supercomputer - HPC Wire
One of the big changes of the past few years is that commercial companies are increasingly offering a range of super-specialized offerings aimed squarely at the researcher market.
For researchers who need compute, they have choices like never before, and the companies are happy to help them choose. The sales teams have at their disposal whole libraries of materials explaining why their offering, the OptoRompter 2000, is a perfect for match for the researcher’s well-known work in the field of socio-romptish-optodynamics, complete with benchmarks and testimonials from people the researcher has heard of and quite possibly met at conferences.
Here Hemsoth gets into a new hyper-specialized offering, Fujitsu offering renting of A64X systems, the building blocks of the Fugaku supercomputer with sits at the #1 spot of the Top500. RIKEN Director Satoshi Matsuoka is involved, and apparently Fugaku already has supported 48 industrial use cases.
These systems are very much not for everyone, but Fugaku has developed a reputation for working unusually well on existing real-world HPC codes for a #1 system, rather than requiring enormous rewrites. So the already-growing cloud HPC marketplace has another credible vendor.
Researcher dumped some crummy code on you? It could always be worse - The (business-critical) project with a single 11,000-line code file. Oh yeah, and it was VBScript.
That feeling when you try to write a game, then start making a game engine, then accidentally write a language in which to implement the game engine.
That feeling when adding static to a variable makes a routine 10x faster.
See, this is why wordle 286 ended my winning streak. Wordle is NP-hard. Even knowing the right minimum number of guesses is NP-hard. It wasn’t that I got “OUT” too quickly and flailed too long to come up with SNOUT.
Set minus euo pipefail, at the top of every shell script, my friends: PIPEFAIL - how a missing shell option slowed Cloudflare down.
Use directory-specific environment variables with direnv.
Oh sure, Jupyter is cool and all, but wouldn’t it be cool to have notebooks that support a prolog-derived language? Meet Percival, the notebook for datalog.
Hacking the linux kernel in Ada.
How SCRAM, part of the Simple Authentication and Security Layer (SASL) protocol that e.g. postgres and mongo use, works to securely authenticate a connection without TLS.
Use gh, fzf, and jq to find PRs that modify the file you’re currently working on.
Lesser-known capabilities of python f-strings.
The ins and outs of the trailing slash in posix shells. I can’t tell you how much the inconsistencies here drive me nuts.
ACM has opened the articles published over its first 50 years (1951-2000).
Sometimes, “off-like-a-bandaid” is the way to go in big software changes. 10 years later, the “/usr merge” is still slowly happening in Debian while Fedora just took the plunge, accepted the heartaches with a big breaking change, and did it in Fedora 17 in 2012.
If you do have merged /usr, you can do things like run a development environment in a container with /usr mounted.
Edge computing getting extreme - Azure working with several companies on edge-based data and computing infrastructure in space.
OpenSSH 8.9 now supports quantum-resistant key exchange (and removes the legacy and kind of brittle approach to scp quoting, if you start seeing scripts break).
rand() sometimes calls malloc()?
In the market for a multi-process 16 core 83 MHz Z80 laptop running CP/NOS implemented in an FPGA? Welcome to the Zedripper.
Something a little more modern? Ok, a full-featured classic 68K Mac in your browser. MacOS8.app.
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.