
Building Big Data Analytics Application on AWS with NetApp Cloud Volumes

1 Nov 2018
Let’s take a look at how NetApp Cloud Volumes can help us set up a Big Data analytics application on Amazon Web Services (AWS).

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

To derive real business value from enterprise data, you need the right tools and computing power to capture and organize large amounts of data from many types and sources. NetApp’s data management solutions make it possible to do this across a range of clouds, and NetApp’s integrations let you run enterprise data alongside other AWS services in a scalable, automated, and flexible way that suits specific enterprise needs.

To show how easy it is, let’s take a look at how NetApp Cloud Volumes can help us set up a Big Data analytics application on Amazon Web Services (AWS).

Data Analysis on AWS

Big Data is made up of high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and model creation. AWS provides easily scalable infrastructure on demand, making it a cost-effective way to analyze large datasets and build statistical models in real time. There are several use cases to explore; one example is Amazon Rekognition, a service that lets you quickly integrate computer vision features directly into your applications. A typical architecture imports media files from a collection (for example, files uploaded to an Amazon S3 bucket), and each import triggers an AWS Lambda function that calls the Rekognition API based on criteria you set.

Amazon Rekognition provides an API to which you submit images and/or videos. You then instruct the Rekognition service to perform a specific analysis of the media. The analysis can be anything from detecting faces within an image to extracting labels from a video in an asynchronous manner.
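The Lambda side of that architecture can be sketched in a few lines. This is a minimal sketch, assuming images land in an Amazon S3 bucket whose upload events trigger the function; it uses the boto3 Rekognition `detect_labels` call. The `MaxLabels` and `MinConfidence` values standing in for "criteria you set", and the optional `rekognition` parameter (added only so a stub client can be injected in tests), are illustrative assumptions.

```python
import urllib.parse


def handler(event, context, rekognition=None):
    """Label every S3 image named in an upload event (illustrative sketch).

    Assumes an S3 "ObjectCreated" trigger. The `rekognition` argument is a
    hypothetical test hook; Lambda itself calls handler(event, context).
    """
    if rekognition is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        rekognition = boto3.client("rekognition")

    results = {}
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded (a space arrives as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,      # assumed criterion: at most ten labels
            MinConfidence=80,  # assumed criterion: confident labels only
        )
        results[key] = [label["Name"] for label in response["Labels"]]
    return results
```

Returning a plain dictionary keeps the sketch easy to test; a real deployment would more likely write the labels to a database or index for later analysis.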

Big Data applications require a robust, secure, and scalable data management system. NetApp’s native cloud solutions redefine how enterprise data and analysis can be used in Big Data projects on public and hybrid cloud platforms. NetApp Cloud offers solutions in three categories: data volumes, data integration, and data protection.

Data Volumes

NetApp Cloud Volumes Service for AWS is a fully managed service that supports both Linux and Windows Amazon Elastic Compute Cloud (EC2) instances, so users can manage Cloud Volumes and run demanding workloads. The subscription process provides all of the initial setup and configuration required to use the service. Cloud Volumes Service supports multiprotocol NFSv3, NFSv4, and SMB volumes. Volumes scale from 1 TB to 100 TB to match application performance needs, with the flexibility to grow and shrink automatically as needed.

The most exciting feature is the native integration with the AWS Marketplace and Console, which means users do not have to add separate processes or put up with disruptions when using NetApp Cloud Volumes Service.

NetApp provides three-tier pricing for access to Cloud Volumes Service. Each tier provides a balance between performance and capacity:

  • Standard provides 1,000 IOPS per TB (at 16 KB I/O size) and 16 MB/s of throughput per TB. List price: $0.10 per GB per month (as of October 10, 2018).
  • Premium provides 4,000 IOPS per TB (at 16 KB I/O size) and 64 MB/s of throughput per TB. List price: $0.20 per GB per month (as of October 10, 2018).
  • Extreme provides 8,000 IOPS per TB (at 16 KB I/O size) and 128 MB/s of throughput per TB. List price: $0.30 per GB per month (as of October 10, 2018).
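Because IOPS scale with provisioned capacity, the cheapest tier for a workload is not always Standard: a small dataset with a high IOPS target can cost less on a higher tier than on a lower tier provisioned far beyond its data size. The sketch below compares the three tiers using the list prices above. It assumes 1 TB = 1,000 GB for billing and applies the 1 TB to 100 TB volume range quoted for the service; `cheapest_option` is a hypothetical helper, not a NetApp API.

```python
import math
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Tier:
    name: str
    iops_per_tb: int            # IOPS per provisioned TB at 16 KB I/O size
    usd_per_gb_month: Decimal   # list price per GB per month


# List figures quoted above (as of October 10, 2018).
TIERS = [
    Tier("Standard", 1_000, Decimal("0.10")),
    Tier("Premium", 4_000, Decimal("0.20")),
    Tier("Extreme", 8_000, Decimal("0.30")),
]

GB_PER_TB = 1_000           # assumption: decimal units for billing
MIN_TB, MAX_TB = 1, 100     # volume size range quoted for the service


def cheapest_option(data_tb, required_iops):
    """Return (tier name, provisioned TB, monthly USD) for the cheapest fit.

    Performance scales with provisioned capacity, so a lower tier can meet
    an IOPS target by over-provisioning; all three tiers are compared.
    """
    best = None
    for tier in TIERS:
        # Provision enough whole TB for both the data and the IOPS target.
        tb = max(MIN_TB, math.ceil(data_tb),
                 math.ceil(required_iops / tier.iops_per_tb))
        if tb > MAX_TB:
            continue  # this tier cannot fit the workload within 100 TB
        monthly = tier.usd_per_gb_month * GB_PER_TB * tb
        if best is None or monthly < best[2]:
            best = (tier.name, tb, monthly)
    if best is None:
        raise ValueError("no tier fits the data and IOPS target within 100 TB")
    return best
```

For example, 1 TB of data that needs 8,000 IOPS would require 8 TB on Standard ($800 per month) or 2 TB on Premium ($400), but only 1 TB on Extreme ($300), so `cheapest_option(1, 8_000)` picks Extreme.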

NetApp positions Cloud Volumes as less expensive than Amazon EBS at large scale, and because it works with data service APIs, data management in the cloud can be fully automated and scalable.

Data Integration

NetApp aims to turn complex enterprise data management processes into simple SaaS applications and APIs, making integration, migration, and synchronization easier.

NetApp Cloud Sync

Cloud Sync powers cloud data migration between different platforms, servers, and formats. This software-as-a-service (SaaS) offering lets you transfer and synchronize NAS data to and from cloud or on-premises object storage. Cloud Sync supports any NFS or CIFS server, Amazon or private S3 buckets, Azure Blob, IBM Cloud Object Storage, Amazon EFS, and more.

Cloud Sync links your source server to the Cloud Sync Data Broker instance that runs in your AWS account or on-premises, and updates the target of your choice with the data from your source.

The Data Broker controls the sync relationships between your sources and targets. After you identify your source and target (S3 bucket) and select Create Relationship, Cloud Sync analyzes your source system and breaks it up into multiple replication streams to push to your selected target.
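As a purely illustrative sketch of how a source might be split into parallel replication streams, the hypothetical `split_into_streams` function below assigns files to a fixed number of streams so that each carries a similar number of bytes, using the classic greedy largest-first heuristic. It is an assumption for illustration, not Cloud Sync's actual algorithm.

```python
import heapq
from typing import Dict, List


def split_into_streams(file_sizes: Dict[str, int], streams: int) -> List[List[str]]:
    """Greedy largest-first partition of files into `streams` balanced groups.

    `file_sizes` maps each path to its size in bytes (hypothetical input).
    """
    if streams < 1:
        raise ValueError("need at least one stream")
    # Min-heap of (bytes assigned so far, stream index): the lightest stream pops first.
    heap = [(0, i) for i in range(streams)]
    groups: List[List[str]] = [[] for _ in range(streams)]
    # Place the biggest files first; ties are broken by path for determinism.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return groups
```

Placing large files first keeps one oversized file from landing on an already busy stream late in the process, which is why this heuristic balances well in practice.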

Cloud Sync is priced per sync relationship per hour, with lower rates at higher volumes:

  • Relationships 1 to 5: $0.15 per relationship per hour
  • Relationships 6 to 20: $0.10 per relationship per hour
  • Relationships 21 and above: $0.085 per relationship per hour

Example: If you establish seven sync relationships between NFS servers and Amazon S3 buckets, each of the first five costs $0.15 per hour and each of the remaining two costs only $0.10 per hour, for a total of $0.95 per hour.
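The tiers are graduated: each relationship is billed at the rate of the band it falls into, as in the seven-relationship example. A minimal sketch of that calculation follows; it uses `Decimal` to avoid floating-point rounding in currency, and the function and constant names are hypothetical.

```python
from decimal import Decimal

# (upper bound of the band, hourly rate per relationship); None = no upper bound.
PRICE_TIERS = [
    (5, Decimal("0.15")),
    (20, Decimal("0.10")),
    (None, Decimal("0.085")),
]


def hourly_cost(relationships: int) -> Decimal:
    """Total hourly charge for a number of sync relationships (graduated bands)."""
    if relationships < 0:
        raise ValueError("relationship count cannot be negative")
    total = Decimal("0")
    lower = 0
    for upper, rate in PRICE_TIERS:
        # Count how many relationships fall inside this band.
        top = relationships if upper is None else min(relationships, upper)
        if top > lower:
            total += (top - lower) * rate
        if upper is None or relationships <= upper:
            break
        lower = upper
    return total
```

`hourly_cost(7)` returns `Decimal('0.95')`, matching the example, and `hourly_cost(25)` returns `Decimal('2.675')` (5 × $0.15 + 15 × $0.10 + 5 × $0.085).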

NetApp states that Cloud Sync outperforms in-house tools in setup, performance, cost, ease of use, and feature set. Cloud Sync was designed to move data from any source to any target. It supports a wide range of data formats and has built-in mechanisms for tracking and logging errors, recovering from failures, and running continuous sync schedules. Cloud Sync lets you perform data migration, data transformation, and data synchronization in a fast, efficient, and secure way.

Key features of Cloud Sync are:

Data Migration: Cloud data migration between different platforms, servers, and formats.

Data Replication: Ensure archival information is stored properly in case you ever need it.

Data Synchronization: Rapid and secure data synchronization. Whether you need to transfer files between on-premises NFS or CIFS file shares or between on-premises storage and the cloud, you can move files where you need them quickly and securely.

Benefits

  • According to NetApp, Cloud Sync transfers data up to 10x faster than in-house developed or traditional tools.
  • Cloud Sync pricing is low and flexible, and based on hourly usage.
  • Cloud Sync tracks changes and does not re-replicate unchanged files.
  • Cloud Sync tracks and logs errors and failures, and can recover or resume from where it stopped.
  • Cloud Sync is a service-based solution: users don’t need to write and maintain scripts, interact with a cloud provider, schedule updates, track progress, validate each step, handle failure scenarios, and so on. Everything is built into the service.
  • Cloud Sync provides a friendly, intuitive web GUI where users can create relationships, change sync schedules, and monitor operations.
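Skipping unchanged files is what keeps repeated sync passes cheap. The sketch below shows one common way to make that decision, comparing each file's size and modification time between source and target listings. It is an assumption for illustration, not a description of how Cloud Sync detects changes, and `FileInfo` and `plan_sync` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def plan_sync(
    source: Dict[str, FileInfo], target: Dict[str, FileInfo]
) -> Tuple[List[str], List[str], List[str]]:
    """Split paths into (to_copy, unchanged, orphaned) for one incremental pass.

    A file is copied only if it is new on the source or its size/mtime differ
    from the target copy; files present only on the target are reported.
    """
    to_copy = sorted(p for p, info in source.items() if target.get(p) != info)
    unchanged = sorted(p for p, info in source.items() if target.get(p) == info)
    orphaned = sorted(p for p in target if p not in source)
    return to_copy, unchanged, orphaned
```

Size and modification time are a cheap proxy for "changed"; a stricter tool could compare checksums at the cost of reading every file.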

Data Protection

NetApp Data Backup provides an efficient safeguard for cloud data and restores it in cases of data loss, no matter where the data is managed, accessed, or stored. Customers expect instant recovery: enterprise data demands bulletproof protection and failsafe operations, and enterprise data management systems need the speed, agility, and reliability to deliver all of the above.

Cloud Volume Snapshot

Snapshots capture an image of a volume’s contents at a particular point in time. Users can use snapshots both to protect against and to recover from accidental or malicious loss or corruption of data. Each Cloud Volume supports up to 255 snapshot copies, which are created instantly and serve as online backups for user-driven recovery.

Snapshots are extremely flexible, letting users set the exact number of snapshots a project requires. Snapshot policies define when and how often snapshots are taken and how many are kept. Snapshot images can later be restored or mounted on other Cloud Volumes.
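A snapshot policy of this kind reduces to two decisions on each run: whether a new snapshot is due, and which old copies to delete so the retained count stays within the configured limit (never more than 255 per volume). The hypothetical `apply_policy` function below sketches that logic over a list of snapshot timestamps; it does not call any NetApp API, and the "newest N" retention rule is an assumed example of a policy.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

MAX_SNAPSHOTS_PER_VOLUME = 255  # per-volume limit quoted above


def apply_policy(
    snapshots: List[datetime], now: datetime, every: timedelta, keep: int
) -> Tuple[bool, List[datetime]]:
    """Hypothetical policy: one snapshot per `every` interval, newest `keep` retained.

    Returns (take_new, to_delete): whether a snapshot is due now, and the
    oldest existing snapshots to delete so at most `keep` remain afterwards.
    """
    if not 1 <= keep <= MAX_SNAPSHOTS_PER_VOLUME:
        raise ValueError(f"keep must be between 1 and {MAX_SNAPSHOTS_PER_VOLUME}")
    ordered = sorted(snapshots)
    # A snapshot is due if none exist yet or the newest is at least `every` old.
    take_new = not ordered or now - ordered[-1] >= every
    # Count the snapshot about to be taken when working out the excess.
    excess = len(ordered) + (1 if take_new else 0) - keep
    return take_new, ordered[:max(0, excess)]
```

With hourly snapshots and `keep=3`, a volume holding four snapshots deletes its two oldest when the next one is due, leaving three after the new snapshot is taken.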

Snapshot helps you:

  • Reduce the risk of data loss through automation.
  • Recover from anywhere.
  • Lower your costs versus traditional methods.
  • Extend your investment in existing backup software.

Conclusion

NetApp Cloud Volumes Service for AWS not only manages data for your Big Data applications but also offers a fully automated management system that removes several layers of complexity, with support for major protocols, software, and APIs. By integrating with AWS’ compute-ready data analytics platform and tools, NetApp reduces complex data management processes to simple APIs and software, taking Big Data projects to a new dimension.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
