Big Data News Digest for November, 2016

DataArt follows the news closely, and we are happy to share the most important Big Data updates with you.

Serverless Architecture in the Cloud

AWS Lambda, Google Cloud Functions (GCF), Azure Functions, and Bluemix OpenWhisk are all services used to build serverless architectures in the cloud. This is a new movement that lets developers focus on writing their applications while consuming the runtime environment as a service. Take a look at this introductory video to learn the basics of how serverless applications are built.
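To give a feel for what "focusing on the application" means, here is a minimal sketch of a function written in the style of an AWS Lambda Python handler. The `handler(event, context)` signature follows the Lambda convention; the `name` field in the event and the `statusCode`/`body` response shape are assumptions made for illustration and are not taken from the video.

```python
import json


def handler(event, context):
    """Entry point the platform would call once per request.

    `event` carries the request payload; `context` carries runtime
    metadata and is unused here. The "name" key is a hypothetical field.
    """
    name = (event or {}).get("name", "world")
    # The platform takes care of servers, scaling, and routing;
    # the function only turns an input event into a response.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke run; in the cloud the platform invokes handler() itself.
    print(handler({"name": "DataArt"}, None))
```

The function holds no server or deployment logic at all, which is the point of the approach: everything outside the handler body is the provider's responsibility.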

Optimizing Cassandra Performance

Cassandra is widely considered to be one of the best ways to collect data from millions of sources. A Cassandra cluster can typically write data at high speed while keeping read latency low. These qualities are available out of the box, but the team at SignalFx managed to make Cassandra even faster. Take a look at what they were able to achieve by following this link, or watch a recording of their webinar here.

Make Hadoop Great Again!

Developers from Sweden have sped up Hadoop. Given how hard it is to locate bottlenecks in distributed systems, this was quite a feat. Even the speed of the metadata nodes had to be considered!

Debugging Apache Spark Code Faster

Developing and debugging distributed computations has always been a difficult process. While some of the techniques described in this article might seem trivial to anyone with experience in big data (or data in general), this collection of tips is still an interesting read.

Developing Your Own Data Storage

The team at Mail.ru share their experience of developing their own data storage. Special attention was paid to memory handling when creating a data snapshot. Share their pain here.

Detecting Financial Crime Patterns with Linkurious

An overview of a solution to a typical anti-money-laundering problem, one of the most common issues facing modern financial organizations, this time leveraging the power of graph databases.

Neo4j’s New Architecture

While Neo4j 3.1 is getting ready for release, find out more about the changes to its security model and distributed storage architecture from this video.

New Record with Apache Spark

Using GPUs in Spark, and a new record in the cost of data processing:

https://databricks.com/blog/2016/10/27/gpu-acceleration-in-databricks.html and https://databricks.com/blog/2016/11/14/setting-new-world-record-apache-spark.html.

Company Behind RethinkDB Is Shutting Down

After over seven years of development, the company behind RethinkDB is shutting down. Read more about the reasons here.

Projects:

https://github.com/tidwall/summitdb – in-memory NoSQL database with ACID transactions, Raft consensus, and a Redis-like API.

https://github.com/JoeriHermans/dist-keras – distributed deep learning with Keras and Apache Spark.

https://github.com/plum-umd/kvolve – an extension to the Redis database that supports the online evolution of high-availability applications and their data.

https://github.com/pingcap/tidb – a distributed NewSQL database compatible with the MySQL protocol.

DevOps Minute

https://www.jduv.me/devops/2016/10/19/ansible-stacks-2/ – Creating a Mongo cluster in AWS with Ansible

Clouds

https://cloud.google.com/free-trial/docs/map-aws-google-cloud-platform – Mapping AWS services to Google Cloud Platform products

