Big Data: Friend or Foe?

As we move forward in a digital world, it is difficult to ignore the amount of data that large tech companies are compiling, frequently without consideration for the consumer or subsequent deliberation on the part of the consumer. While we have a tendency to distrust and malign big data and the organizations that harvest it, the obvious answer, to stop using these services or technologies, may have wider-ranging implications that make an abrupt halt less straightforward than it first appears. If used ethically, data has an important role to play in society. During the recent Covid pandemic, data and data sharing were critical to diagnosing the virus at a much earlier stage in the pandemic (1). Conversely, when companies do not adhere to legislation or employ practices that violate a common code of ethics, whether due to blatant disregard or because the development practices of the organization do not provide adequate observability and control (2), action should be required. The question is then how we as technologists can ensure that our systems are adequately and sustainably designed to allow for such control and elevated security, particularly as we embark into an ever more complex technological landscape.

As the industry looks to optimize infrastructure, one key way to do this is through reusing services, such as databases and logic, across resources. With the advancement of 5G and the increase in the use of Internet of Things devices, the implications of sharing services are becoming increasingly complicated. Security controls that were once sufficient, such as using an IP address to root identity, are no longer adequately secure, nor do they respond to the need for sustainable deployment and development.

Identity for Machines, too

As technology evolves and more of our day-to-day life becomes digital, there is an increasing concern for safeguarding digitally bound assets, such as bank accounts, that rely on human identity for access. For human authentication, we rely on strong authentication methods linked to what we know, what we have, or who we are, and in the case of multifactor authentication, a combination thereof (3). Frequently, this authentication will be federated across multiple systems, resulting in Single Sign-On (SSO).

The use of SSO is fairly well established within organizations and the benefits are understood. What is becoming increasingly apparent is that, with the adoption of shared services and Service Oriented Architecture, identity must also be translated to machines in order to ensure an adequate security posture and ultimately permit more control and sustainable development. It's easy to speculate why application identity federation hasn't become more prominent, whether it be the static nature of infrastructure until more recently, the pervasive use of monolithic architectures, or a combination of the two. Regardless, the current state of application identity leaves us with a fragmented, decentralized, highly manual approach.

While most cloud providers have native identities for their resources, unifying these into a single, auditable, controlled workflow without a centralized brokerage is complex, not human-readable, and difficult to scale because of the effort required to automate it. By leveraging a brokerage mechanism, we can establish a unified workflow over all the required application layers, from network to application, across multiple resources.

Why Does Machine Identity Matter?

Examining the history of identity controls and the associated security controls, the industry has heavily relied on IP addresses. In a world dominated by physical servers, monolithic applications, and predominantly physical controls, a trust model tied to a static IP address and inherited trust provided the required security assurance. As the adoption of cloud continues to increase, the controls that we previously relied on are no longer adequate. This is due to a shift from a primarily physical infrastructure to a largely logic-based infrastructure and the fault domains associated with its resources. As a result, implementing identity and security controls on the basis of IP addresses in a scalable, secure manner is no longer tenable.
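To make the contrast concrete, the following is a minimal sketch, not a production design. It assumes a hypothetical workload named "payments-api" that is rescheduled onto a new address, and it assumes the workload's identity has already been verified by some other mechanism (the names, addresses, and allow-lists are all illustrative).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    identity: str  # hypothetical logical name of the service
    ip: str        # address assigned by the platform; may change on reschedule


# Legacy model: trust is rooted in a static IP address.
TRUSTED_IPS = {"10.0.0.5"}

# Identity model: trust is rooted in a workload identity.
# Assumption: the identity was verified elsewhere (e.g. cryptographically).
TRUSTED_IDENTITIES = {"payments-api"}


def allowed_by_ip(workload: Workload) -> bool:
    """Grant access only if the caller's current address is on the allow-list."""
    return workload.ip in TRUSTED_IPS


def allowed_by_identity(workload: Workload) -> bool:
    """Grant access based on who the caller is, regardless of where it runs."""
    return workload.identity in TRUSTED_IDENTITIES


# The same service before and after being rescheduled onto a new address.
before = Workload(identity="payments-api", ip="10.0.0.5")
after = Workload(identity="payments-api", ip="10.0.7.42")

for label, workload in (("before", before), ("after", after)):
    print(label, "ip-based:", allowed_by_ip(workload),
          "identity-based:", allowed_by_identity(workload))
```

In this sketch the IP-based check stops recognizing the service once its address changes, while the identity-based check is unaffected by where the workload runs.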

Looking at the principles surrounding the Second Industrial Revolution, the concepts can easily be adapted to the IT landscape, particularly the adoption of a factory system and division of labor (4). In the multi-hybrid cloud world, the difficulty is that we are attempting to industrialize and homogenize highly customized platforms. If we want to transfer the benefits of industrialization to the digital world, we need to consider what that assembly line looks like at the level of each cloud service provider (CSP) or data center, and also how we industrialize the intersections between them. Just as standardization through assembly lines permitted easier maintenance of machines due to homogenization and the ability to more easily exchange parts, the use of machine identity, when adequately brokered, provides a similar flexibility by ensuring that services are adequately federated and authenticated by mapping identities to the corresponding resources.

How Can Identity Scale?

All of the major Cloud Service Providers provide their own flavor of identity; networks have another set of identities; and applications and users each have their own types of identities as well. Generally, these identities are bound together by highly customized solutions on an application-by-application basis. As seen in industrialization, creating a single workflow surrounding identity makes engineering the intersection of these resources highly automatable and repeatable. This in turn makes the process and the overall ecosystem more sustainable in the event of change, whether required or chosen, and more secure, as all resources are mutually authenticated based on their identities.
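As an illustration of what a single workflow around identity could look like, here is a minimal sketch of a broker that maps provider-native identities onto one unified identity and then applies a single policy. Everything in it is hypothetical: the source names ("cloud-a", "cloud-b", "datacenter"), the native identifiers, the resources, and the function names are assumptions for illustration and do not represent any particular product's API.

```python
from typing import Optional

# Hypothetical registry: (identity source, native identifier) -> unified identity.
NATIVE_TO_UNIFIED = {
    ("cloud-a", "role/orders-writer"): "orders-service",
    ("cloud-b", "sa-orders@example"): "orders-service",
    ("datacenter", "host-cert:billing-01"): "billing-service",
}

# Hypothetical policy: unified identity -> resources it may access.
POLICY = {
    "orders-service": {"orders-db"},
    "billing-service": {"orders-db", "ledger-db"},
}


def resolve(source: str, native_id: str) -> Optional[str]:
    """Translate a provider-native identity into the unified identity, if known."""
    return NATIVE_TO_UNIFIED.get((source, native_id))


def is_authorized(source: str, native_id: str, resource: str) -> bool:
    """Apply one policy to every caller, whichever platform issued its identity."""
    unified = resolve(source, native_id)
    if unified is None:
        return False  # unknown identities are denied by default
    return resource in POLICY.get(unified, set())


print(is_authorized("cloud-a", "role/orders-writer", "orders-db"))
print(is_authorized("cloud-b", "sa-orders@example", "ledger-db"))
print(is_authorized("cloud-c", "unknown", "orders-db"))
```

The design point of the sketch is that the policy is written once against the unified identity, so the same service running on two platforms is governed by one rule instead of two platform-specific ones.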

By leveraging identity brokerage, we have more flexibility, secure authentication on all levels of our application stack, and better observability across systems. Because brokering allows a many-to-many relationship, the operational complexity is reduced as a result of managing fewer one-to-one relationships, resulting in added value due to more focus on declarative, relational authentication and authorization.
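The reduction in one-to-one relationships can be illustrated with simple counting. The sketch below assumes the worst case, in which every client needs to authenticate to every service, and it counts only trust relationships to be configured. The function names are illustrative, and the policy held by the broker itself is not counted.

```python
def point_to_point_links(clients: int, services: int) -> int:
    """Each client is configured separately against each service."""
    return clients * services


def brokered_links(clients: int, services: int) -> int:
    """Each client and each service is configured once, against the broker."""
    return clients + services


for clients, services in [(3, 3), (10, 20), (50, 200)]:
    direct = point_to_point_links(clients, services)
    brokered = brokered_links(clients, services)
    print(f"{clients} clients, {services} services: "
          f"{direct} direct relationships vs {brokered} via a broker")
```

Under these assumptions the direct approach grows with the product of clients and services, while the brokered approach grows with their sum, which is where the operational saving comes from as an estate scales.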

Should We Broaden Our Identity Scope?

While Gaia-X focuses largely on the human implications and controls of software engineering and data governance (5), the importance of machine identity and the control required to adequately maintain those identities should not be overlooked. Given the complexity and volume of connected devices in an increasingly dynamic environment, the manner in which we control these machines and permit their access becomes more critical. To fully secure our infrastructure and, as a result, ensure data security in a manner that lets us enforce change across entire ecosystems, adequately managing machine identities and their access to resources is a fundamental step.

1. Moorthy V, Henao Restrepo AM, Preziosi MP, Swaminathan S. Data sharing for novel coronavirus (COVID-19). Bull World Health Organ. 2020 Mar 1;98(3):150. doi:10.2471/BLT.20.251561. PMID: 32132744; PMCID: PMC7047033.
2. Satariano, Adam. “Meta Fined $275 Million for Breaking E.U. Data Privacy Law.” Accessed December 9, 2022.
3. Cichonski, Paul, Thomas Millar, Tim Grance, and Karen Scarfone. “Computer Security Incident Handling Guide.” CSRC, August 6, 2012.
4. “Industrial Revolution Key Facts.” Encyclopædia Britannica. Encyclopædia Britannica, inc.
5. Identity Valley. “The Digital Responsibility Goals and Gaia-X,” February 2022.

Sarah Polan

As the Field CTO for EMEA, Sarah Polan joined HashiCorp from the Financial Services Industry where she most recently led a Secrets Management program with a focus on containerized workloads. She aims to elevate strategic conversation surrounding cloud adoption and improve the balance between technical enablement, velocity, and security.