Happy Valentine’s Day!
We’re roughly at the middle of February - thanks for following along with the Research Computing Teams newsletter! Next week things start in earnest - I’ll begin sending additional emails through the week, along with the weekly link roundups.
The regular shorter emails will generally follow a theme for the week, with the link roundups remaining a grab bag. The likely themes are previewed below; they’ll tend to alternate between the people side and the technical side of our profession.
As when the link roundups were getting started, I’ll be leaning heavily on your feedback to guide me on what’s of interest and when I’m missing the mark; hit reply to any email and let me know what you think.
Working with Remote Laboratory Teams - Bernard B. Tulsi, Lab Manager
One area where I think research computing teams have a lot to teach other communities - especially with remote work becoming increasingly common - is how to run effective teams that are split between multiple sites (and even multiple institutions). Here, speaking with managers from the Multidisciplinary Drifting Observatory for the Study of Arctic Climate expedition (which has a truly remote site - an Arctic icebreaker!), the Joint Center for Energy Storage Research, and Charles River Laboratories, the author highlights the advantages of such work: sometimes part of the team simply needs to be somewhere else, such as where the data is being collected, and a distributed team can draw on talent and expertise from around the world. Some of the techniques those managers of dispersed teams highlighted as being important were:
It’s important to point out that the issues underlying all of these suggestions - strong communication, trust, and shared motivation - matter for all teams, and that distributed teams don’t add anything truly new; but routinely bumping into people in the halls or at the coffee machine can sand down the edges of a lot of issues that instead require conscious and deliberate attention in distributed groups.
Building trust — with people and software - Tal Joffe
Tal uses technical analogies to explain management-of-humans concepts. I’m not sure that’s necessarily a good idea! But the basic concepts are important enough that it’s worth trying a lot of different approaches to reach different audiences. Here, he addresses one of the key goals of one-on-ones: building trust with direct reports. The analogy is between testing your alignment and agreement on a number of topics in one-on-ones, and running tests on a code base: in both cases, trust grows when the inputs are frequent and consistent, which is easier if they are small and lightweight. It is much better - in terms of building trust and of finding out quickly when something is wrong - to have weekly 30-minute one-on-ones with people on your team than quarterly 4-hour meetings.
Maybe a more successful use of that sort of analogy is an earlier article of his, Microfeedbacks - breaking the monolith. Here the comparison is between monolithic versus modular or microservices code on the technical side, and yearly or quarterly performance evaluations versus lots of small, frequent feedback on the people side. Again, lots of small inputs are much more valuable - and easier! and less stressful! - than waiting three or twelve months and having a Big Deal conversation.
Related to the microfeedbacks post - this isn’t new, but it was recently pointed out to me - is a very short but effective set of Google Slides on the importance of small, frequent feedback: Some Ad-Hoc thoughts about PIPs by Roy Rapoport, who writes well on these topics. I wish there were a similar example for positive feedback, which is at least as important as negative feedback, and arguably more so. A more serious responsibility of a team leader than catching mistakes on any particular task is helping your team members excel and grow. One important way of doing that is, when someone does something well, giving immediate positive feedback with enough detail that they can do it well again. People get really good at things through guided practice; it’s your job to provide some of that guidance through positive feedback.
Why are we so bad at software engineering? - Jake Voytko
Why Managing Data Scientists is Different - Roger Stein, Sloan Management Review
This coding legend knows the secret to fixing Big Tech’s most pervasive problem, Jumana Abu-Ghazaleh, Fast Company
The spectacular failure of process (and, to some minor extent, of software development - issues converting data formats!) that occurred in the Iowa Democratic Caucus spurred a lot of anguished conversation about why software development is such a trash fire in many sectors. Jake Voytko’s article, I think, nails it: the dominant market for software developers, commercial web applications, sets the culture; and there, if we’re being quite honest, quality doesn’t really matter much. Higher quality is better, but things change so quickly, and the costs of errors are so modest, that the incentives drive towards “move fast, break things”.
It’s slowly becoming more apparent that not all sectors are like that. Certainly in highly regulated industries like health or finance that’s always been true; but in our own world of research, poor-quality code or services can actively hinder the progress of science in our institutions - or, much worse, lead to publications that have to be retracted.
Roger Stein writes in the context of data science - one particular sub-sub-discipline of research computing - that managing data scientists is different from managing web development (say), because in the “iron triangle” of tradeoffs one usually makes on projects - time, cost, and quality - quality is both crucial and hard to measure. In a data science team in a corporation, or a research computing team in a university or research institute, the quality of the outputs is the entire point of the team; software that produces wrong answers is simply not ok.
Jumana Abu-Ghazaleh talks about Margaret Hamilton, the developer who made Apollo 11 possible; Hamilton coined the term “software engineering” because she was trying to bring software development to the same level of rigour and accountability as civil or mechanical engineering. We may not need quite those levels of rigour - no one will die if our code crashes - but in research computing we need standards that are a lot closer to those fields’ than to Facebook’s.
Custom GitHub Issue Labels - Karolis Koncevičius
Karolis suggests several different kinds of labels: categories for the type of issue, priorities, tags, and status. Do you have favourite labels you assign to project repos?
GitHub CLI in Beta - Billy Griffin, GitHub
One reason I haven’t been using labels consistently on GitHub is that it’s a pain to add labels, milestones, and the like consistently across repos. I’ve tried a few tools and can’t say I love any of them. But with the GitHub CLI in beta, there may soon be a standard way to script such things so that I could add them on repo creation? One can only hope.
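In the meantime, here’s a minimal sketch of the kind of scripting I mean, going straight at GitHub’s REST API (which the CLI may eventually wrap); the owner, repo, token, and label set here are all placeholders:

```python
# Hypothetical example: push a standard label set (in the spirit of Karolis's
# categories - type, priority, status) to a repo via GitHub's REST API.
import requests

LABELS = [
    {"name": "type:bug", "color": "d73a4a", "description": "Something isn't working"},
    {"name": "priority:high", "color": "b60205", "description": "Needs attention soon"},
    {"name": "status:blocked", "color": "cccccc", "description": "Waiting on something else"},
]

def create_labels(owner: str, repo: str, token: str) -> None:
    url = f"https://api.github.com/repos/{owner}/{repo}/labels"
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json",
    }
    for label in LABELS:
        response = requests.post(url, json=label, headers=headers)
        response.raise_for_status()  # fail loudly if a label already exists, etc.

# create_labels("my-org", "new-repo", "<personal access token>")
```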
6 Tips on How to Say No To Customers - Sharon Moorhouse, Intercom
We work closely with researchers, and that can make it hard to say no to a feature request. This article walks through the process of saying no, which is normally pretty routine but runs the risk of leaving hurt feelings. The tips most relevant to us are:
Of course, saying no (or even worse, yes!) to a change request from a user can really get you in trouble if you don’t have a clear picture of where the product is going, what its purpose is, and what is in and out of scope. Mathias Meyer writes about drafting such a strategy when coming into a team for the first time; there are also approaches for deciding more rigorously what to do using data and experiments. Whether it comes from those approaches or from an original product vision, saying yes and no to suggestions is a lot easier when you know where you’re going.
There have been two particularly good AWS tutorials, from two different angles, this week.
AWS HPC Workshop - Pierre-Yves Aquilanti, Anh Tran, Linda Hedges, AWS
This is a nice introduction to playing with AWS from the point of view of an experienced HPC hand - it moves from the familiar (ssh’ing into a remote machine; setting up a cluster, but one that scales; setting up a performant parallel POSIX file system) to the less familiar (AWS Batch and containers), so it’s a nice transition.
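If the AWS Batch end is the unfamiliar part, it may help to see how small the core interaction is; here’s a hedged sketch (mine, not the workshop’s) of submitting a containerized job with boto3, where the queue and job definition names are placeholders you’d have created beforehand:

```python
# Illustrative only: submit a container job to an (assumed pre-existing)
# AWS Batch queue and job definition, then check its status.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

response = batch.submit_job(
    jobName="hello-hpc",
    jobQueue="my-job-queue",            # placeholder: created via console or IaC
    jobDefinition="my-job-definition",  # placeholder: points at a container image
    containerOverrides={"command": ["echo", "hello from Batch"]},
)

job = batch.describe_jobs(jobs=[response["jobId"]])["jobs"][0]
print(job["status"])  # SUBMITTED, RUNNABLE, RUNNING, SUCCEEDED, ...
```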
On the other hand, that can sort of reinforce the HPC “way of doing things”; a more eye-opening approach might be to start dabbling with the cloud using the sort of tools that migrated there first. J Cole Morrison has started an “Understanding Modern Cloud Architecture” series, of which the first three articles are published; it’s less hands-on, but walks the reader through building a modern web application in a more idiomatically “cloud-native” way.
HPC on OpenStack: the good, the bad and the ugly - Ümit Seren
The FOSDEM 2020 talks are online now, and there’s a lot of really nice work presented. In the HPC, Big Data, and Data Science track, this good-news-and-bad-news talk gives an overview of setting up multiple HPC infrastructures on an on-premises OpenStack deployment to take advantage of the reconfigurability between environments. It’s a great talk, and highlights both the downsides (the huge complexity of OpenStack) and the upsides (the configurability).
That talk brings up the issue of testing infrastructure-as-code, and the talk Infrastructure testing, it’s a real thing! by the aptly named Paul Stack goes into that issue in more detail. How do you do CI/CD when your “deploys” are infrastructure?
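Even without special tooling, one form this can take is ordinary tests run against whatever the deploy actually produced. A minimal sketch (my illustration, not from the talk), runnable with pytest against hypothetical hosts:

```python
# Smoke tests for freshly provisioned infrastructure: assert that the things
# the deploy was supposed to create actually respond. Hostnames are made up.
import socket
import urllib.request

def test_login_node_accepts_ssh_connections():
    # The provisioned login node should be listening on port 22.
    with socket.create_connection(("login.example.org", 22), timeout=5):
        pass

def test_service_reports_healthy():
    # The deployed web service should answer its health check.
    with urllib.request.urlopen("https://api.example.org/health", timeout=5) as resp:
        assert resp.status == 200
```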
Persistent L2ARC might be coming to ZFS on Linux - Jim Salter, Ars Technica
Persistent memory like Intel’s Optane is, after some false starts, starting to make inroads; right now, naturally enough, it’s being adopted where there’s low-hanging fruit (as in this case: persisting ZFS’s read cache layer so that there’s no long warm-up time between reboots). Which, you know, fine. But with vendor-agnostic programming models for these devices emerging, I’m waiting increasingly impatiently for interesting external-memory algorithm/out-of-core computation applications in research computing: data structures and methods where the persistence is less important than the fact that there’s a new large, byte-addressable layer in the storage hierarchy sitting between RAM and disk (and potentially addressable remotely via RDMA!). Out-of-core efforts for research computation (as opposed to databases) were more or less killed off in the 1990s/2000s by cheap RAM; is there a reason why the area isn’t picking up again? Is block access on SSDs just good enough? Do people just scale out now? Or is there work out there that I’m missing?
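For concreteness, here’s a toy sketch of the out-of-core style I mean - stream over an array backed by storage rather than RAM, one chunk at a time - using a plain memory-mapped file as a stand-in for fancier persistent-memory layers (sizes and file name invented):

```python
# Out-of-core reduction: only one chunk of the file-backed array needs to be
# resident in memory at a time. Scale N up past RAM to see the point.
import numpy as np

N = 50_000_000  # ~400 MB of float64 here; in earnest this would exceed RAM
data = np.memmap("big_array.dat", dtype=np.float64, mode="w+", shape=(N,))

chunk = 5_000_000
total = 0.0
for start in range(0, N, chunk):
    block = data[start:start + chunk]
    block[:] = 1.0                   # initialize this chunk in place
    total += float(block.sum())      # reduce chunk by chunk
print(total)                         # 50000000.0
```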
Creating coastlines using data science - Bernadette Sterry, UK Hydrographic Office
This is called “data science” because government comms, but it’s a really nice example of how real-world work needs not just computing skills and data science but deep domain expertise - i.e., real research computing. Coastline work is typically done by marine offices using quite labour-intensive cartography and surveying; but with climate change, increasing erosion, and the like, coastlines are moving at nontrivial rates.
Here, a deep analysis of optical satellite data using remote-sensing approaches - looking at things like the refractive index of water - yields a prototype of a completely computational coastline chart of the UK.
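For a flavour of the kind of computation involved - my illustration, not necessarily the UKHO team’s method - one classic first step in separating water from land in optical imagery is the Normalized Difference Water Index, NDWI = (green - NIR) / (green + NIR); the edge of the thresholded water mask then traces a candidate coastline:

```python
# Toy water/land separation from green and near-infrared reflectance bands.
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    g = green.astype(np.float64)
    n = nir.astype(np.float64)
    return (g - n) / (g + n + 1e-12)  # epsilon guards against divide-by-zero

# A made-up 2x2 scene: water is bright in green and dark in near-infrared.
green = np.array([[0.30, 0.05], [0.28, 0.04]])
nir   = np.array([[0.05, 0.30], [0.06, 0.28]])
water_mask = ndwi(green, nir) > 0.0
print(water_mask)  # [[ True False], [ True False]]
```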
Building cloud-based data services to enable earth-science workflows across HPC centres - John Hanley, ECMWF
Also from both the UK and FOSDEM, this is a really nice overview of a very sophisticated solution for making archival simulation data - the outputs of the European Centre for Medium-Range Weather Forecasts - available in a cloud environment for querying and reanalysis. Groups like ECMWF run with operational requirements that would keep most of us awake at night in panic-induced sweats - it turns out that governments, companies, navies, etc. really care about their weather forecasts being correct and timely - and while they have to push 100TB daily around the globe for their users, they also want to make the most of all that compute and data. So there is a cloud environment with those outputs (which you can play with! https://pypi.org/project/ecmwf-api-client/ https://apps.ecmwf.int/datasets/) for secondary analysis and pilot projects. A really ambitious and interesting project.
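If you want to poke at it, here’s a minimal sketch using the ecmwf-api-client package linked above - you need a (free) account and an API key in ~/.ecmwfapirc, and the request values below are illustrative, in the style of ECMWF’s documented ERA-Interim examples:

```python
# Retrieve one surface field (2-metre temperature) from the public
# ERA-Interim dataset into a local GRIB file.
from ecmwfapi import ECMWFDataServer

server = ECMWFDataServer()
server.retrieve({
    "class": "ei",
    "dataset": "interim",
    "date": "2019-01-01",
    "expver": "1",
    "grid": "0.75/0.75",
    "levtype": "sfc",          # surface-level fields
    "param": "167.128",        # 2-metre temperature
    "step": "0",
    "stream": "oper",
    "time": "12:00:00",
    "type": "an",              # analysis, not forecast
    "target": "era_interim_2t.grib",
})
```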
Want to see how the NSA teaches Python programming? It’s disappointingly mundane.
I remember being surprised by what one could do at the command line with cut and join; it turns out xsv will let you do simple SQL operations like join on CSV files, as John Cook reminds us.
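For comparison, here’s roughly what that one xsv join command replaces in plain Python - a hypothetical inner join of orders.csv onto people.csv on a shared key, with file and column names invented for illustration:

```python
# Inner-join two CSVs on a key column, pulling one column across.
import csv

with open("people.csv", newline="") as f:
    people = {row["id"]: row for row in csv.DictReader(f)}

with open("orders.csv", newline="") as f, open("joined.csv", "w", newline="") as out:
    reader = csv.DictReader(f)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["name"])
    writer.writeheader()
    for row in reader:
        person = people.get(row["person_id"])
        if person:  # inner join: drop orders with no matching person
            writer.writerow({**row, "name": person["name"]})
```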
Today I learned that Intel’s Clear Linux apparently performs really well even on underpowered AMD hardware, not that I’ve ever been much of a Linux on the laptop type.
I don’t see many multi-stage builds used in research computing containers, which is a shame given how complex many of the builds are - they leave lots of components kicking around that aren’t needed at runtime. Kashyap Kondamudi gives a nice overview of the process here (with potential build concurrency as a bonus!).