2022 Open Source Summit – Day 2

The word for Day 2 of the Open Source Summit is SBOM.

When I first heard the term my thought was that someone had spoken a particular profanity at an inappropriate time, but SBOM in this context means “Software Bill of Materials”. Open source is so prevalent these days that it is probably included in a lot of the software you use and you may not be aware of it, so when an issue is discovered such as Log4shell it can be hard to determine what software is affected. The idea of asking all vendors (both software-only and software running on devices) to provide an SBOM is a first step to being able to audit this software.

It isn’t as easy as you might think. The OpenNMS project I was involved with used over a hundred different open source libraries. I know because I once did a license audit to make sure everything being used had compatible licenses. I also have used Black Duck Software (now Synopsys) to generate a list of included software, and it looks like they now offer SBOM support as well, but I get ahead of myself.

Note that Synopsys is here in the Sponsor Showcase but when I stopped by the booth no one was there.

Getting back to the conference, the second morning keynotes were more sparsely attended than yesterday, but the room was far from empty. The opening remarks were given by Mike Dolan, SVP and GM of Projects at the Linux Foundation, and he was a last minute replacement for Jim Zemlin, who was not feeling well.

Picture of Mike Dolan on stage

Included in the usual housekeeping announcements was a short “in memoriam” for Shubhra Kar, the Linux Foundation CTO who passed away unexpectedly this year.

Dolan also mentioned that the Software Package Data eXchange (SPDX) open standard used for creating SBOMs had turned 10 years old (and it looks like it will hit 11 in August). This was relevant because with applications of any complexity including hundreds if not thousands of open source software projects, there had to be some formal way of listing them for analysis in an SBOM, and most default to SPDX.

The next speaker was Hilary Carter who is in charge of research for the Linux Foundation.

Picture of Mike Dolan and Hilary Carter on stage

She spoke on the work the Linux Foundation is doing to measure the worldwide impact of open source. As part of that she mentioned that there is a huge demand for open source talent in the market place, but there are also policy barriers for employees of many companies to contribute to open source. She also brought up SBOMs as a way to determine how widespread open source use is in modern applications.

Stylized Mercator Map Projection

Since diversity has been a theme at this conference I wanted to address a pet peeve of mine. This is a slide from Carter’s presentation and it uses a stylized Mercator projection to show the world. I just think it is about time we stop using this projection, as the continent highlighted, Africa, is actually much, much larger in proportion to the other continents than is shown on this map. As an alternative I would suggest the Gall-Peters projection.

Gall-Peters projection of the world yoinked from Wikipedia

To further digress, I asked my friend Ben to run “stylized Gall-Peters projection” through Midjourney but I didn’t feel comfortable posting any of the results (grin).

Anyway, enough of that. The next presenter was Kevin Jakel, who founded Unified Patents.

Picture of Kevin Jakel on stage

The goal of Unified Patents is to protect open source from patent trolls. Patent trolls are usually “non-practicing entities” who own a lot of patents but exist to extract revenue from companies they believe are infringing upon them versus building products. Quite frequently it is cheaper to settle than pursue legal action against these entities and this just encourages more actions on the part of the trolls.

The strategy to combat this is described as “Detect, Disrupt and Deter”. For a troll, the most desired patents are ones that are broad, as this means more companies can be pursued. However, overly broad patents are also subject to review, and if the Patent and Trademark Office is convinced a patent isn’t specific enough it can invalidate it, destroying the revenue stream for the patent troll.

I’m on the fence over software patents in general. I mean, let’s say a company could create a piece of software that exactly modeled the human body and how a particular drug would interact with it, I think that deserves some protection. But I don’t think that anyone owns the idea of, say, “swipe left to unlock”. Also it seems like software rights could be protected by copyright, but then again IANAL (one source for more information on this is Patent Absurdity)

Picture of Amir Montezary on stage

The next person on stage was Amir Montazery, of the Open Source Technology Improvement Fund. The mission of the OSTIF is to help secure open source software. They do this through both audits and fundraising to provide the resources to open source projects to make sure their software is secure as possible.

Jennings Aske, of New York-Presbyterian Hospital spoke next. I have worked a bit with technology in healthcare and as he pointed out there are a lot of network connected devices used in medicine today, from the devices that dispense drugs to the hospital beds themselves. Many of those do not have robust security (and note that these are proprietary devices). Since a hack or other breach could literally be a life and death situation, steps are being taken to mitigate this.

Picture of Jennings Aske on stage

I enjoyed this talk mainly because it was from the point of view of a consumer of software. As customers are what drive software revenues, they stand the best chance in getting vendors to provide SBOMs, along with government entities such as the National Telecommunications and Information Administration (NTIA). The NTIA has launched an effort called Software Component Transparency to help with this, and Jennings introduced a project his organization sponsors called DaggerBoard that is designed to scan SBOMs to look for vulnerabilities.

Picture of Arun Gupta on stage

The next keynote was from Arun Gupta of Intel. His talk focused on building stronger communities and how Intel was working to build healthy, open ecosystems. He pointed out that open source is based largely on trust, which is an idea I’ve promoted since I got involved in FOSS. Trust is something that can’t be bought and must be earned, and it is cool to see large companies like Intel working toward it.

Picture of Melissa Smolensky on stage

The final presenter was Melissa Smolensky from Gitlab who based her presentation around a “love letter to open source”. It was cute. I too have a strong emotional connection to my involvement in free and open source software that I don’t get anywhere else in my professional life, at least to the same degree.

I did get to spend some time near the AWS booth today, and after chatting at length with the FreeRTOS folks I happened to be nearby when Chris Short did a presentation on GitOps.

Chris Short presenting GitOps

In much the same way that Apple inspired a whole generation of Internet-focused products to put an “i” in front of their name, DevOps has spawned all kinds of “Ops” such as AIOps and MLOps and now GitOps. The idea of DevOps was built around creating processes to more closely tie software development to software operation and deployment, and key to this was configuration management software such as Puppet and Ansible. Instead of having to manage configuration files per instance, one could store them centrally and use agents to deploy them into the environment. This central repository allows for a high degree of control and versioning.

It is hard to think of a better tool for versioning than git, and thus GitOps was born. Software developed using GitOps is controlled by configuration files (usually in YAML) and using git to make changes.

While I am not an expert on GitOps by any means, suppose your application used a configuration file to determine the various clusters to create. To generate a new cluster you would just edit the file in your local copy of the repo, git commit and git push.

You application would then use something like Flux (not to be confused with the Flux query language from InfluxData) to note that a change has occurred and then do a git pull which would then cause the change to be applied.

Pretty cool, huh? A lot of people are familiar with git so it makes the DevOps learning curve a lot less steep. It also allows for the configuration of multiple repositories so you can control, say, access to secrets differently than the main application configuration.

Spot Callaway and Brian Proffitt

Also while I was in the booth I got this picture of two Titans of Open Source, Spot Callaway and Brian Proffitt. Oh yeah.

My final session of the day was given by Kelly O’Malley of Databricks on Delta Lake.

Kelly O'Malley presenting on Delta Lake

Now as someone who has given a lot of talks, I try to be respectful of the presenter and with the exception of the occasional picture and taking notes I try to stay off my phone. I apologized to her afterward as I was spending a lot of time looking up terms with which I was unfamiliar, such as “ACID” and “parquet“.

Delta Lake is an open source project to create a “Lakehouse”. The term is derived from a combination of “Data Warehouse” and “Data Lake“.

Data warehouses have been around for a very long time (in one of my first jobs I worked for a VAR that built hardware solutions for storing large data warehouses) and the idea was to bring together large amounts of operational data into one place so that “business intelligence” (BI) could be applied to help make decisions concerning the particular organization. Typically this data has been very structured, such as numeric or text data.

But people started figuring out that a lot of data, such as images, needed to be stored in more of a raw format. This form of raw data didn’t lend itself well to the usual BI analysis techniques.

Enter Delta Lake. Based on Apache Spark, it attempts to make data lakes more manageable and to make them as useful as data warehouses. I’m eager to find the time to learn more about this. When I was at OpenNMS we did a proof of concept about using Apache Spark to perform anomaly detection and it worked really well, so I think it is perfectly matched to make data lakes more useful.

My day ended at an internal event sponsored by Nithya Ruff, who in addition to being the chairperson of the Linux Foundation is also the head of the AWS OSPO. I made a number of new friends (and also got to meet Amir Montazery from the morning keynotes in person) but ended up calling it an early night because I was just beat. Eager to be fresh for the next day of the conference.