Big Data

What is Big Data?

Data sets are growing at a staggering pace, expected to grow by 100% every year for at least the next 5 years. Most of this data is unstructured or semi-structured, generated by servers, network devices, social media, and distributed sensors. “Big Data” refers to such data because its volume (petabytes and exabytes), its type (semi-structured and unstructured, distributed), and its speed of growth (exponential) make traditional data storage and analytics tools insufficient and cost-prohibitive. An entirely new set of processing and analytic systems is required for Big Data, with Apache Hadoop being one example of a Big Data processing system that has gained significant popularity and acceptance.

According to a recent McKinsey Big Data report, Big Data can provide up to $300 billion in annual value to the US healthcare industry, and can increase US retail operating margins by up to 60%. It’s no surprise that Big Data analytics is quickly becoming a critical priority for large enterprises across all verticals.

Big Data Drives Big Traffic

Cheap compute and storage, together with highly effective Big Data analytics capabilities, are encouraging enterprises to collect, transmit, and store more data than ever before. This results in significantly higher WAN traffic, as more data must be moved back and forth between storage systems and, in many cases, replicated. Additionally, Big Data tends to be distributed across large geographies. Bringing it together in centralized locations for efficient processing on Hadoop or other Big Data systems is another factor driving Big Traffic over the WAN. The results of Hadoop processing are then sent to different data centers for integration with traditional data analytics tools for further processing, resulting in even more WAN traffic. Finally, as Hadoop becomes mainstream and enterprises start using HDFS as a storage tier, replication of this data across clusters for disaster recovery will also result in higher traffic.

Big Traffic is driving significant growth in inter-data center bandwidth requirements for enterprises. 10Gbps of WAN connectivity is just the starting point, with many large enterprises already deploying up to 100Gbps of WAN connectivity between their data centers. Adding bandwidth is expensive, and it does not guarantee higher transfer rates: the throughput of a single TCP flow is bounded by its window size and the round-trip time, so a long-distance link can sit underused no matter how much capacity is provisioned. Clearly, WAN optimization needs to scale up to meet these demands and prevent adverse effects of Big Data on the WAN.
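The TCP limitation mentioned above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 50 ms round-trip time is an assumed figure for a long-haul link, not a number from this document. It uses the standard bound that a single TCP flow can have at most one window of data in flight per round trip.

```python
def max_tcp_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on one TCP flow's throughput: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return link_bps * rtt_seconds / 8


# Assumed values, for illustration only.
RTT = 0.050                 # 50 ms round trip between two distant data centers
CLASSIC_WINDOW = 65_535     # largest TCP window without window scaling, in bytes
LINK = 10e9                 # a 10Gbps WAN link

per_flow_bps = max_tcp_throughput_bps(CLASSIC_WINDOW, RTT)
fill_window = window_needed_bytes(LINK, RTT)

print(f"Per-flow ceiling: {per_flow_bps / 1e6:.1f} Mbps")
print(f"Window needed to fill the link: {fill_window / 1e6:.1f} MB")
```

Under these assumptions a single flow tops out at about 10.5 Mbps on a 10Gbps link, and filling the link would take roughly 62.5 MB of data in flight. That is why adding raw bandwidth alone does not raise transfer rates.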

A Networking Approach to Big Traffic

Traditional x86-based software WAN optimization solutions cannot scale to tens of Gbps in a single device while guaranteeing low port-to-port latencies. They can add up to 20ms of port-to-port latency, which makes them impractical for accelerating inter-data center traffic.

Infineta’s Data Mobility Switch (DMS) is the first and only WAN optimization solution designed to accelerate Big Data traffic traversing multi-Gigabit WANs. Much like data center switches and routers, Infineta DMS performs packet processing and acceleration in specialized hardware (based on merchant silicon). This approach allows the DMS to support 10Gbps of throughput on a single device while introducing no more than 50 microseconds of port-to-port latency on average (2-3 orders of magnitude lower than traditional WAN optimization solutions).
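The "2-3 orders of magnitude" comparison can be checked against the two latency figures quoted in this document: up to 20ms for traditional solutions and 50 microseconds on average for the DMS. The short sketch below only reproduces that arithmetic; the variable and function names are hypothetical, and comparing a worst-case figure with an average one is a simplification.

```python
import math


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio between two positive quantities."""
    if larger <= 0 or smaller <= 0:
        raise ValueError("both values must be positive")
    return math.log10(larger / smaller)


TRADITIONAL_LATENCY_S = 20e-3   # up to 20 ms, as quoted for traditional solutions
DMS_LATENCY_S = 50e-6           # 50 microseconds on average, as quoted for the DMS

ratio = TRADITIONAL_LATENCY_S / DMS_LATENCY_S
orders = orders_of_magnitude(TRADITIONAL_LATENCY_S, DMS_LATENCY_S)

print(f"Latency ratio: {ratio:.0f}x")
print(f"Orders of magnitude: {orders:.1f}")
```

With these inputs the ratio is 400x, or about 2.6 orders of magnitude, which matches the stated range.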

Infineta DMS Benefits for Big Data Deployments

  • Seamlessly accelerate latency-sensitive traffic over Data Center Interconnects at speeds of 10Gbps and higher
  • Remove the inter-data center bandwidth bottleneck by increasing effective capacity by 5x or more
  • Extract the full potential of your Big Data deployments by transferring more data, more quickly, while utilizing less WAN bandwidth