Fundamentals of storage in the cloud age

Despite the many changes in data storage over the decades, some fundamentals remain. One of them is that storage can be accessed by one of three methods: block, file, and object.

This article will define and expand on the features of these three access methods, while examining the on-premises and cloud products you’ll typically find that use file, block, and object storage.

What we’re seeing is that while block, file, and object storage products are (usually) available in on-premises hardware form factors, these types of storage access are also offered in the cloud to serve the workloads that need them.

The rise of the cloud has also led to hybrid (data center and cloud) and distributed forms of file and object storage.

So while files, objects, and blocks have been long-standing fundamentals of storage, the way they are deployed in the cloud age is changing.

File and block: whole and part

The file system has always been a mainstay of storage technology. Block and file access storage offer two ways to interact with the file system.

File access storage means accessing entire files through the file system. This is usually done via network-attached storage (NAS) or a linked grid of scalable NAS nodes. These products come with their own built-in file system, and the storage is presented to applications and users in drive letter format.

In block access, the storage product (typically deployed on-premises in a storage area network, or SAN, for example) only addresses the blocks of storage that make up files, databases, and so on. In other words, the file system that applications communicate with resides higher in the stack.

File systems offer all sorts of advantages. One of the big ones is that most enterprise applications are written to work with a file system, and that is not going away any time soon.

A key feature of file system-based methods is that there are mechanisms, such as those found in the POSIX command set, to lock files so that they cannot be simultaneously overwritten, or at least not in a way that corrupts the file or the processes around it.
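To make the locking idea concrete, here is a minimal Python sketch. It assumes a Unix-like system where the standard library's fcntl module is available, and the helper name append_with_lock is hypothetical. Note that this kind of lock is advisory: it only protects against other processes that also request the lock.

```python
import fcntl
import os


def append_with_lock(path: str, line: str) -> None:
    """Append one line to a file while holding an exclusive advisory lock."""
    with open(path, "a") as f:
        # Blocks until no other process holds a conflicting lock on the file.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Two processes calling append_with_lock on the same path would take turns, so their lines would not interleave mid-write.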

File storage accesses entire files, so it is used for general file storage, as well as more specialized workloads that require file access, such as in media and entertainment. And, in its scalable NAS form, it’s a mainstay of large-scale repositories for analytics and high-performance computing (HPC) workloads.

Block storage allows applications to access the blocks that make up files. This can be database access where many users are working on the same file simultaneously and possibly from the same application – email, enterprise applications such as enterprise resource planning (ERP), for example – but with locking at the subfile level.

Block storage has the great advantages of high performance and of not having to deal with metadata, file system information, and so on.
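The difference from file access can be sketched in a few lines of Python. This is an illustration only: an ordinary file stands in for a block device, the 4,096-byte block size is an assumption, and the helper names read_block and write_block are hypothetical. The point is that the caller addresses data by block number and offset, with no notion of file names or file metadata.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch


def write_block(fd: int, block_no: int, data: bytes) -> None:
    """Write exactly one block at the offset implied by its block number."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    os.pwrite(fd, data, block_no * BLOCK_SIZE)


def read_block(fd: int, block_no: int) -> bytes:
    """Read one block without touching any other part of the device."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)
```

A database engine works in a similar spirit: it updates only the blocks that hold the changed rows rather than rewriting an entire file.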

File and block: cloud and distributed

File storage still exists in standalone NAS format, especially at the entry level, and scalable NAS, intended for on-premises deployment, is common.

But the advent of the cloud, and its tendency to globalize operations, has had a twofold effect on things.

On the one hand, a number of vendors offer global file systems that combine a distributed file system on public cloud and local network hardware, with all data in a single namespace. Vendors here include Ctera, Nasuni, Panzura, Hammerspace, and Peer Software.

On the other hand, all of the major cloud providers (Amazon Web Services, Google Cloud Platform, and Microsoft Azure) offer their own file access storage services, as well as NetApp’s in the case of AWS. IBM also offers file storage through its cloud offering.

Block in the cloud

Some storage providers, such as IBM and Pure, offer instances of their block storage in the cloud. And the big three all offer cloud block storage services, aimed at applications that require the lowest latency, such as database caching and analytics, as well as virtual machine (VM) workloads.

Probably due to the nature of block storage and its performance requirements, no distributed block storage seems to have emerged as it did with files.

Object storage: a world apart

Object storage is based on a “flat” structure with access to objects via unique identifiers, similar to the domain name system (DNS) method of accessing websites.

For this reason, object storage is quite different from the tree-like hierarchical file system structure, and this can be an advantage when data sets grow very large. Some NAS systems feel the pressure when accessing billions of files.
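A toy in-memory model shows what a flat structure means in practice. This is a sketch under simplifying assumptions, not how any particular product is implemented, and the class and method names (FlatObjectStore, put, get, list_keys) are hypothetical. Every object lives in one namespace under a unique key, and anything that looks like a folder is just a shared key prefix.

```python
class FlatObjectStore:
    """Toy object store: one flat namespace of unique keys, no directory tree."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # Lookup goes straight to the key; there is no path to walk.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are emulated by filtering on a key prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Because a lookup is a single key match rather than a walk down a directory tree, the cost of finding an object does not depend on how deeply it is nested.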

Object storage accesses data at the equivalent of the file level, but without file locking, and often multiple users can access an object at the same time. Object storage is not strongly consistent. In other words, it is eventually consistent between the mirrored copies that exist.
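Eventual consistency between mirrored copies can be illustrated with a deliberately simplified Python model. The class name ReplicatedStore and its explicit sync step are assumptions made for this sketch; real systems replicate in the background and are considerably more sophisticated.

```python
class ReplicatedStore:
    """Toy model: writes land on replica 0 and reach the other copies only on sync()."""

    def __init__(self, replicas: int = 2) -> None:
        self._replicas: list[dict] = [{} for _ in range(replicas)]
        self._pending: list[tuple] = []

    def put(self, key, value) -> None:
        self._replicas[0][key] = value
        self._pending.append((key, value))  # queued for later replication

    def get(self, key, replica: int = 0):
        # A read from a replica that has not caught up returns stale data (or None).
        return self._replicas[replica].get(key)

    def sync(self) -> None:
        # Apply queued writes, in order, to every other replica.
        for key, value in self._pending:
            for copy in self._replicas[1:]:
                copy[key] = value
        self._pending.clear()
```

Between a put and the next sync, a reader hitting a different replica may not see the new object. After the sync, all copies agree, which is the "eventual" part.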

Most legacy applications are not written for object storage. But far from that necessarily being a drawback, historically speaking, object storage is in fact the storage access method of choice in the cloud age. That is because the cloud is generally a much more stateless proposition than the legacy enterprise environment, and object storage also likely makes up most of the storage offered by the big cloud providers.

Additionally, objects in object storage carry a richer set of metadata than files in a traditional file system. This also makes data in object storage well suited to analytics.
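As a rough illustration of why richer metadata helps analytics, the Python sketch below attaches free-form key-value metadata to each object and selects objects by those attributes without reading their contents. The names StoredObject and find_by_metadata, and the metadata fields used in the example, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)  # arbitrary user-defined attributes


def find_by_metadata(objects: dict, **criteria) -> list:
    """Return the keys of objects whose metadata matches every given criterion."""
    return sorted(
        key
        for key, obj in objects.items()
        if all(obj.metadata.get(name) == value for name, value in criteria.items())
    )
```

For example, with objects tagged by hypothetical fields such as department or sensor type, find_by_metadata(objects, department="radiology") would pick out the relevant data set for analysis.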

Object in the cloud – and on-premises with file

The cloud has been the natural home of object storage. Most storage services offered by cloud providers are based on object storage, and it is here that new de facto standards, such as S3, have emerged.

With its ready access to data that can happily exist in a largely stateless and eventually consistent way, object is the bulk storage of the cloud era.

You can obtain object storage for on-premises deployment, such as Dell EMC’s Elastic Cloud Storage, which is only intended for deployment in a data center. Meanwhile, Hitachi Vantara’s Hitachi Content Platform, IBM’s Cloud Object Storage, and NetApp’s StorageGrid can operate in hybrid and multicloud scenarios.

Some vendors specializing in object storage, such as Cloudian and Scality, offer on-premises and hybrid deployments.

And in the case of Scality, along with Pure Storage (and NetApp, to some extent), converged file and object storage is possible. The rationale here is that customers increasingly want to access large amounts of unstructured data that may be in file or object storage formats.