(Note: All times are in UTC)
Summary: Changes in the ScaleFT Server Agent (sftd) in version 1.0.2 exposed an existing bug in how the ScaleFT platform parsed “SSH Known Host Keys”. When a user attempted to access a Linux server that ran sftd version 1.0.2 and had an ed25519 host key, attempts to issue credentials (SSH Certificates) failed with an HTTP 500 status code. Many common Linux distributions, including Ubuntu, Debian, and Amazon Linux, support ed25519 keys in their latest versions.
Impact: From Saturday, May 21 at 03:41 until Monday, May 23 at 08:08, some Linux server builds received sftd version 1.0.2, and all attempts to access them using ScaleFT received an HTTP 500 error code. At 08:08, as mitigation for new server builds, version 1.0.2 was removed from the Linux package repositories. On Monday, May 23 at 09:10, a platform release was deployed to app.scaleft.com to fully resolve the issue.
Metrics:
* Time to Detection: 2 days, 3 hours, 47 minutes
* Time to Acknowledgement: 8 minutes
* Time to Partial Mitigation: 40 minutes
* Time to Full Mitigation: 1 hour, 42 minutes
Technical Details: On startup and as part of server enrollment, the ScaleFT Server Agent (sftd) on Linux systems reads the /etc/ssh/sshd_config file. It scans for the HostKey directive, and attempts to load all referenced files. The software uses a 3rd party library, golang.org/x/crypto/ssh, for parsing the referenced files. If sftd encounters an error reading any one of the files, it skips that file and continues processing the others. This behavior was documented as being desired because OpenSSH is known to add new host key types, and our agent may not know about all possible host keys.
Once a list of host keys is parsed, they are submitted to the Platform as part of the server enrollment in the “device info” data structure. The Platform only validates a subset of the device info on submission, storing some attributes in indexed database fields, while some are stored opaquely for later use.
When the Platform receives a valid and authorized request for SSH Credentials from the ScaleFT Client, it deserializes the target server’s device info in order to assemble an SSH known hosts list for the client. The client uses the SSH known hosts list to prevent man in the middle attacks.
The bug was that when iterating the list of SSH host keys provided by sftd, the Platform would error with an HTTP 500 if it was unable to parse any of the SSH host keys.
The failure was triggered by a change in sftd between versions 1.0.1 and 1.0.2: the 3rd party library, golang.org/x/crypto/ssh, was upgraded to a newer version that added support for ed25519 host keys. When sftd 1.0.1 encountered an ed25519 host key, it skipped the key and did not include it in the device info submission. When sftd 1.0.2 encountered an ed25519 host key, it parsed the key successfully and included it in device info. The Platform used the same golang.org/x/crypto/ssh library, but before Platform version 0.20.14 it used an older version without ed25519 support. When the Platform encountered a host key it could not parse, it failed the entire attempt to generate the SSH known hosts list and returned an HTTP 500 status code for the credential request.
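The interaction between the two library versions can be modeled with a small sketch. Here a "parser" is just the set of key types a library version understands; it is a stand-in for illustration, not the golang.org/x/crypto/ssh API, and agentSubmit and platformStrict are hypothetical names for the lenient agent and the strict pre-0.20.14 Platform.

```go
package main

import (
	"fmt"
	"strings"
)

// parser models one version of a key-parsing library as the set of key
// types it understands.
type parser map[string]bool

// parse accepts a line of the form "<type> <base64>" and fails if the
// key type is not in the supported set.
func (p parser) parse(line string) error {
	fields := strings.Fields(line)
	if len(fields) < 2 || !p[fields[0]] {
		return fmt.Errorf("cannot parse host key %q", line)
	}
	return nil
}

// agentSubmit is lenient: keys the agent cannot parse are dropped from
// the submission.
func agentSubmit(p parser, keys []string) []string {
	var out []string
	for _, k := range keys {
		if p.parse(k) == nil {
			out = append(out, k)
		}
	}
	return out
}

// platformStrict is strict: one unparseable key fails the whole request,
// which the caller would surface as an HTTP 500.
func platformStrict(p parser, keys []string) error {
	for _, k := range keys {
		if err := p.parse(k); err != nil {
			return err
		}
	}
	return nil
}
```

With an old parser on both sides, the agent silently drops the ed25519 key and the Platform never sees it. With a new parser on the agent and an old one on the Platform, the ed25519 key reaches platformStrict and the request fails.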
This difference in behavior, with sftd lenient on parsing errors and the Platform strict, masked the issue until sftd added support for ed25519 host keys in version 1.0.2. Additionally, both sftd and the Platform made assumptions about the parsability of SSH host keys based on a library that they both use.
The dependency upgrade in sftd between versions 1.0.1 and 1.0.2 was not noted in the release notes, because it was not viewed as a user-visible change in behavior or a bug fix.
The mitigation deployed in 0.20.14 changes the Platform’s logic for processing SSH host keys: instead of returning an error, it skips any key it cannot parse and logs the parsing failure.
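A minimal sketch of this skip-and-log approach is shown below. It is not the Platform's actual code: knownHosts and rsaOnly are hypothetical names, the parser is injected as a function, and rsaOnly merely stands in for a library version without ed25519 support.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// knownHosts builds known_hosts lines ("<host> <type> <base64>") for one
// server. Keys rejected by parse are logged and skipped, so a single
// unrecognized key does not fail the whole credential request.
func knownHosts(host string, keys []string, parse func(string) error) []string {
	var lines []string
	for _, k := range keys {
		if err := parse(k); err != nil {
			log.Printf("skipping unparseable host key for %s: %v", host, err)
			continue
		}
		lines = append(lines, fmt.Sprintf("%s %s", host, k))
	}
	return lines
}

// rsaOnly is a hypothetical parser that understands only ssh-rsa keys.
func rsaOnly(key string) error {
	if !strings.HasPrefix(key, "ssh-rsa ") {
		return fmt.Errorf("unsupported key type in %q", key)
	}
	return nil
}
```

Given one ssh-rsa key and one ssh-ed25519 key, knownHosts with rsaOnly returns a single line for the RSA key and writes one log entry for the skipped key. The client can still verify the server against the keys that were understood.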
Areas of Improvement: Several steps will be taken to prevent future issues, improve the time to detection and resolution, and communicate clearly about the issue.