Hi, everyone:
I hope that as the start of the academic year approaches in the northern hemisphere, those of you who are going back to campus are comfortable doing so, and supported by administration.
Thanks to community members volunteering, we have a small core of 4-5 people who will start helping with resources for the community and making suggestions to guide the newsletter. If you have suggestions, or want to take part, just hit reply or email me.
And now, on to the roundup:
The 3 Stay Conversations: The Best Way To Improve Employee Retention - Claire Lew, Know Your Team
The Final One-on-One: 7 Questions to Ask When Offboarding Employees - Anita Chauhan, Hypercontext
After 18 months of pandemic, with everyone tired and tech-industry demand and salaries skyrocketing, incautious research computing team managers risk losing their most ambitious, highest-performing team members.
In the first article, Lew lists three sets of conversations to routinely have with team members during one-on-ones:
These seem like great, detailed questions to ask periodically. They might usefully be augmented by more frequent, high-level, open-ended check-ins - for instance, the Manager Tools employee-retention question, “Hey, overall, how’s it going?”, asked routinely in one-on-ones as a way of detecting whether your team members have any issues or concerns that might lead to their shields being down to other opportunities.
If people do leave, we should celebrate that - growth and development and moving to new roles is good and healthy - even if it’s a pain for us. In addition, a team member leaving can be a good time to get some unvarnished feedback if you’ve spent the time together building a good working relationship. Chauhan recommends some questions, including:
Performance Review: Build Your Process and Master Feedback Delivery - Gábor Zöld and Lara Hogan, Coding Sans
In my institution it’s annual review time - it’s a pretty lightweight process here. It might actually be too lightweight - a manager who isn’t routinely giving their team members direct feedback could easily skate through this review process with some platitudes and go an entire year without ever giving their team members any useful information about how they are doing. On the other hand, massive annual processes cause their own pain, if only through non-compliance.
In a field as quickly-shifting as research computing, setting goals annually requires unfeasible powers of prognostication. I like setting and reviewing goals for work, skill development, and career development every 3-4 months - our template form is here. New managers: you are not as limited by official HR processes as you likely believe! Your HR business associate would be delighted to hear that you’re setting goals and reviewing progress with your team members more often than the bare minimum enforced by policy.
Regardless - however you and your institution organize performance review processes, they matter: they’re an important expectations-setting exercise for the team, an important lever and responsibility for you as a manager, and a stressful time for your team members. In this blog post, which grew out of a podcast episode, Hogan gives a detailed walkthrough of her approach to the process; it’s a terrific discussion. She covers:
With a bonus of how to prepare to receive feedback when it’s your turn.
Hogan makes an important point early on - none of your feedback given in a performance review should ever be a surprise. You’re their manager - you can, and have the responsibility to, give them timely feedback as needed; the purpose of the review process is to synthesize and memorialize that feedback and make plans together as to next steps. She also emphasizes connecting the feedback to things they care about, such as career development goals.
There’s a lot of good material here, and it’s worth going through if you’ll be giving a performance review any time soon - or even if you’re not but you’d like to up your performance expectations conversation skills.
Demystifying Public Speaking - Lara Hogan
A lot of us who came up through research tend to no longer focus on giving good presentations - after all, we’ve given a million presentations about our work. We’re old hands.
But the presentations we’re asked to give leading research computing teams are very different than the extremely specific sub-genre that is academic presentations, which are mostly teaching people who are already experts about some new work we’ve recently done. We’re usually making cases for particular actions, or giving status reports, or training, all the while speaking to a very broad audience. Also - look, the quality/interestingness bar set for academic presentations is pretty dire.
Hogan’s book is a simple overview of giving more general-purpose presentations. Several sections of this book - choosing a topic and a venue, getting over pre-talk jitters, what to pack and bring on talk day - will in fact be old hat to veterans of the academic colloquium circuit. But sections 4 and 5 on crafting the structure of the talk, arranging post-talk resources, practicing and getting feedback before, and taking notes afterwards of what worked and what didn’t, are useful, especially for people who haven’t thought critically about this since grad school.
Check Your Work, Ask for Help, and Slow Down - Michael Lopp
Making decisions as a manager is scary, and it gets harder as your responsibilities grow. In this very sympathetic article for decision-makers, Lopp has some recommendations for you once you think you have a decision:
Patterns in confusing explanations - Julia Evans
We do a lot of training and other explaining to broad communities - perhaps in presentations, as above! - and it’s important to have clear explanations.
Evans, who writes extremely clear explanations on a range of technical topics - often about things she’s recently learned herself - has collected a list of common anti-patterns in the technical explanations she’s read, and presents them here with some examples and better alternatives. I’ve paraphrased the list for stand-alone summary purposes:
But if you’re at all interested in improving the explanations you and your team share - in anything from documentation to presentations to downtime announcements - this is a good starting point.
GitHub Discussions is out of beta - Evi Liu, GitHub Blog
GitHub Discussions is now officially generally available, providing an entire forum for your users - Q&A and open discussions, pinned announcements, moderation, higher permission levels for frequent contributors, polls, and more. You can convert issues into discussion topics, too.
Have any readers started using this for their projects? It seems a lot easier than setting up a Slack or something for community feedback, and a lot easier for a potential new user to participate in than filing an issue.
20 Questions a Software Engineer Should Ask When Joining a New Team - Thomas Stringer
This is written as advice for candidates to ask their hiring managers in interviews, but it’s also a good set of questions that research software development teams should ask themselves - and have good answers ready (offered solicited or not) for incoming candidates - before they start hiring.
Stringer’s questions cover working with the code, working with the team, working with the users, and the product(s) as a whole:
A candidate who is given all of this information at once, before hiring or as part of onboarding material, would have a lot of confidence that they’re joining a team that has their stuff together.
The big-load anti-pattern - Daniel Lemire
The obvious and easy thing to do when processing a lot of data is to load it all into memory. Sometimes that’s the only feasible approach without a lot of extra work, but as Lemire points out, it scales poorly even if the machine you’re using has the necessary memory - with the simplest possible example, he shows a 3-4x slowdown going from 1MB to 1GB, and 1GB isn’t a tonne of data. All of the caching infrastructure of your system, from disk to RAM to CPU, works better and faster when processing small chunks of data at a time.
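To make the alternative concrete - this is my toy sketch in Python, not Lemire’s code - here’s summing newline-separated integers from a large file while only ever holding one chunk in memory:

```python
# Toy sketch (mine, not Lemire's): sum newline-separated integers
# from a large file, holding only ~1MB in memory at a time.
def chunked_sum(path, chunk_bytes=1 << 20):
    total = 0
    leftover = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            chunk = leftover + chunk
            # Hold back any partial trailing line for the next chunk
            complete, _, leftover = chunk.rpartition(b"\n")
            total += sum(int(line) for line in complete.split(b"\n") if line)
    if leftover:  # final line with no trailing newline
        total += int(leftover)
    return total
```

The working set stays at the chunk size no matter how big the file gets, which keeps the processing in cache-friendly territory.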
Lemire recommends:
Hot Chips: Here Come the DPUs and IPUs from Arm, Nvidia and Intel - John Russell, HPCWire
Since at least 2009, HPC hasn’t been alone in trying to manage single computing systems that are datacenter-sized.
“Smart NICs”, which offload some computation - like MPI collective aggregations - to the network card, have been available for some time. But the needs are becoming increasingly stringent. This article provides a nice quotable fact via Intel: Google and Facebook claim that anywhere from 22% to 80% of the CPU usage of many microservice workloads goes to infrastructure handling, like network security (IPsec, TLS) and networking control (traffic management and shaping).
Hyperscalers like AWS and Azure have been pushing these functions onto their own hand-rolled network cards built with ASICs or FPGAs, but demand is growing enough that NIC vendors are looking to turn these into products. It’s not straightforward - handling these tasks at line rate, as network speeds keep increasing, requires significantly more compute power and programmability. The result is what everyone else is calling DPUs, and Intel is calling IPUs because it already uses “DPU” for something else.
The IEEE Hot Chips conference had a session on these units from Arm/Marvell, NVIDIA, and Intel, and Russell here helpfully summarizes and compares the plans.
Interestingly, all the DPUs are Arm-based, even Intel’s. As is required for application offload, they’re all quite programmable. The aims are impressive - 400-800 Gbps Ethernet, with switching, security, and QoS at line rate, remote direct memory access with RoCEv2, and more.
This is a good overview of products coming down the line, and if you’re interested, the summary and links to other materials are a good starting point.
Eats Safety Team On-Call Overview - Eduardo David and Josh Kline, Uber Engineering Blog
An important requirement for reliable systems is a well-defined set of responsibilities for being on-call outside of regular working hours. It doesn’t necessarily have to be onerous, depending on what the reliability goals are, but if you want to avoid people stressing out over things that don’t matter, or being too lax over things that do, or not knowing what is whose job, there has to be something explicitly written down.
This post by David and Kline is a nice, short, and unusually specific writeup of how Uber Eats handles on-call:
What I found most interesting were the hand-off meetings and the on-call quality metrics. The goal is to make on-call as reasonable as possible, and that means not getting paged for things that can wait until morning, or that the paged person can’t do anything about. It also means having high-quality runbooks, so that if there are things that need fixing, it’s clear to the on-call team what needs to be done. So they collect data from the on-call team about exactly these items, and use it to improve things.
(Interestingly - they are using this blogpost in part as a recruiting tool. “Look how reasonable our on-call is, and how we work to improve it. Hey, we’re hiring”. There’s a lesson in there for our teams.)
Playing with a Quantum Computer - Rainer Müller, Franziska Greinert, arXiv 2108.06271
Even 15 years ago, this would have been a pretty fantastical sentence: the authors of this paper show how to use freely-available real quantum computers as teaching labs for undergraduate physics courses.
IBM Quantum and Quantum Inspire offer free tiers on real quantum hardware, and other cloud providers have free tiers for quantum simulators, which let you develop against the same APIs used for their quantum computing resources.
There are a number of classic quantum algorithms one might teach with - Shor’s factorization algorithm, which is complicated, or the artificial but simple Deutsch–Jozsa algorithm, which determines whether a function is a constant or returns 50/50 ones and zeros. Müller and Greinert, for their purposes, want something that is both relatively straightforward to program on a gate-based quantum computer and interesting and easy for undergrads to reason about.
The coin-flip game is a not-very-interesting game in which two players share a penny starting in (say) a heads-up state; each secretly chooses a move (flip or don’t flip), and player 2 wins if the penny ends up heads, player 1 otherwise. In David A. Meyer’s quantum coin-flip game, the coin is a qubit in the |0> state, classical player 1 is limited to the classical flip or no-flip moves, and quantum player 2 “moves” by applying an arbitrary unitary transformation to the qubit. In this game, the quantum player has a strategy that always wins. (“Unlike Bob, Alice knows the laws of quantum physics and is able to perform all conceivable quantum operations on the qubit.”)
Programming the winning strategy (here in IBM’s Quantum Composer) is relatively straightforward; it helps demonstrate both unitary transformations and measurement by projection onto the 0/1 states, and (helpfully, pedagogically) it demonstrates a state of affairs that’s completely impossible in the classical version of the game, where neither player has a guaranteed winning strategy.
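For a flavour of what students would write, here’s a minimal sketch of the always-win strategy - my sketch, not the paper’s circuits, using the circa-2021 Qiskit API and the Aer simulator:

```python
# Minimal sketch of Meyer's quantum coin-flip game (mine, not the
# paper's circuits), using the circa-2021 Qiskit API.
from qiskit import QuantumCircuit, Aer, execute

backend = Aer.get_backend("qasm_simulator")
for classical_flip in (False, True):
    qc = QuantumCircuit(1, 1)   # the "coin" starts as |0>, i.e. heads
    qc.h(0)                     # quantum player: Hadamard into superposition
    if classical_flip:
        qc.x(0)                 # classical player: flip (X), or leave alone
    qc.h(0)                     # quantum player: Hadamard back
    qc.measure(0, 0)
    counts = execute(qc, backend, shots=1000).result().get_counts()
    print(f"classical flip={classical_flip}: {counts}")
# Both branches print {'0': 1000}: H X H = Z and Z|0> = |0>, so the coin
# always ends up heads, whatever the classical player does.
```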
We often talk about research computing as a way of simulating experiments on physical systems - this is the first time, I believe, we’ve covered a case where the research computing environment actually is the physical system, serving as an undergraduate laboratory.
More Proof Points for Low Precision HPC - Nicole Hemsoth, Next Platform
High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations - Lim and Nagarakatte
Lower-precision calculations in HPC aren’t new - multi-precision methods come up often in the literature of the late 90s and early 2000s, when the use of smaller floats for parts of the calculation started looking attractive as memory bandwidth started to play a larger role in technical computation.
But those methods are more complicated and subtle than just using double for everything, and until AI started really driving hardware efficiency advances (and awareness) on FP32, FP16, and BF16 computations, most felt it wasn’t worth it.
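For a taste of why these methods are subtler than using double everywhere, here’s a generic toy sketch of mixed-precision iterative refinement - not the techniques from the articles below - where the expensive solve happens in float32 and cheap float64 residual corrections recover the accuracy:

```python
# Generic toy sketch of mixed-precision iterative refinement (not the
# methods in the articles below): do the O(n^3) solve in float32, then
# recover near-float64 accuracy with cheap float64 residual corrections.
import numpy as np

rng = np.random.default_rng(1)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # keep it well-conditioned
b = rng.standard_normal(n)

A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

for _ in range(3):
    r = b - A @ x                                  # residual in float64
    dx = np.linalg.solve(A32, r.astype(np.float32))
    x += dx.astype(np.float64)

# Relative residual drops from float32-ish (~1e-7) toward float64 levels
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

(A real implementation would factor A32 once and reuse the factors for each correction, rather than re-solving from scratch.)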
The article by Hemsoth highlights recent work by the European Centre for Medium-Range Weather Forecasts (ECMWF) and others on using lower-precision calculations in climate and weather models. Both full weather-forecast simulations (showing forecast tracks of a hurricane) and shallow-water calculations show significant performance increases with lower precision plus compensation techniques, at perfectly acceptable accuracy.
The paper by Lim and Nagarakatte describes the work of creating accurate and highly performant correctly-rounded math libraries for lower-precision floats.
For those interested in learning more, here’s a great survey paper on numerical methods using mixed precision from 2020.
IEEE Cluster 2021, 7-10 Sept, Virtual, $65-$140
Very broad conference on cluster computing, which includes workshops like the interesting-looking Workshop on Re-envisioning Extreme-Scale I/O for Emerging Hybrid HPC Workloads (REX-IO ’21)
RustConf ’21 - 14 Sept, Virtual, $106, $7.32 for students or unemployed
Rust is becoming popular for the sort of things research software teams would typically write in C++; sessions on fuzzing and optimization (with the case of a machine learning algorithm) may be of interest.
[New Zealand Research Software Engineering Conference 2021](https://www.rseconference.nz/programme-tabs1/#tabs|1) - 15-17 Sept, Online, $40 NZD
Covers a wide range of applications and methods for research software development in the New Zealand context, with meetups and workshops associated with the conference.
Training Virtualization - 23 Sept, 3pm-4:15pm EDT, Online, Free registration.
From the website:
Many organizations abruptly transitioned from a primarily on-site to a primarily remote work experience last spring. However, organizations still have training needs that were once largely accomplished through in-person events such as workshops, hackathons, and tutorials. This panel will share what they learned during the past year in their efforts to bring more virtualization to what historically has worked for in-person training events. What worked well? What did not work? This panel will share their insights about lessons learned over the past year and how those experiences will inform plans moving forward when organizations can safely offer in-person training again.
An opinionated guide to xargs, an essential shell tool for handling large numbers of files or items.
This week in git I learned: you know how git suggests the “right” version of misspellings? Apparently, if you like to live dangerously, you can have it automatically run the suggestion. You can also have conditional blocks in your gitconfig.
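For instance, something like this in your ~/.gitconfig (per my reading of the git docs - check `git help config` for your version):

```ini
[help]
    # run git's suggested correction automatically after 1.0s
    # (units are deciseconds; 0 means just show the suggestion)
    autocorrect = 10

# conditional include: apply extra settings (e.g. a work email)
# only to repositories under ~/work/
[includeIf "gitdir:~/work/"]
    path = ~/work/.gitconfig
```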
Part one of a story of how ispc, a compiler for Larrabee, came about.
Nice story from the Facebook team of how they built ZippyDB, a distributed sharded key-value store built atop RocksDB.
Datalog is having a moment: Datalog in Haskell, Datalog in bash (along with a datalog→bash translator), and Datalog in Python.
Sure, PlayStation 5s look cool, but how about an even cooler gaming platform - DOS games, with 19 games released so far in the 2020s(!)
Building a Commodore PET with components available today.
Ooh, GitHub is finally improving their issues/project boards - by a lot, it looks like.
Making the case that in 2021, unix tools should be outputting not plain text but JSON - and providing a tool to help. Which might make this introduction to jq handy.
PlsExplain, a simple programming language with first-class comments, and where every value is associated with a history of comments.
Huh - apparently a number of terminal emulators (iTerm2, Gnome terminal, …) have support for control codes which make URLs or link anchor text clickable. And if you want to know about accessibility with colours in the terminal, it turns out that the terminal game community (e.g. roguelikes like Nethack) have put a lot of thought into it.
An argument (correct, I think), that SQL vs NoSQL isn’t a helpful distinction, and instead strict vs loose schema validation is the split to make.
A free book on SAT-solvers (and SMT solvers) by example.
Function-as-a-service (AWS Lambda) for bursty Monte Carlo computations.
A deep dive into API authX tokens.