Microsoft recently presented its roadmap for “Big Data” at SQLPass. The Apache project Hadoop will be a central part of this program.
Hadoop is a framework made up of several components. Its aim is to process and analyze huge (and also unstructured) files.
The project includes these subprojects:
· Hadoop Common: The common utilities that support the other Hadoop subprojects.
· Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
· Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
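To give a rough idea of what “MapReduce” means, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are my own illustration, and everything runs in a single process instead of on a cluster. It only shows the idea: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step collapses each group.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (a real framework does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data on a cluster"]))
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the input lines coming from files stored in HDFS; the sketch above only mirrors the data flow.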
Other Hadoop-related projects at Apache include:
· Avro™: A data serialization system.
· Cassandra™: A scalable multi-master database with no single points of failure.
· Chukwa™: A data collection system for managing large distributed systems.
· HBase™: A scalable, distributed database that supports structured data storage for large tables.
· Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
· Mahout™: A scalable machine learning and data mining library.
· Pig™: A high-level data-flow language and execution framework for parallel computation.
· ZooKeeper™: A high-performance coordination service for distributed applications.
Hadoop is written in Java and is at home in the Linux world, which is why I was surprised to hear this announcement.
Hadoop & Windows Azure/Server
According to the announcement, Hadoop should be able to run on Windows Server and should be integrated into Windows Azure. The first beta is planned for the end of the year, with the go-live to follow next year.
What does “huge files” mean? Who uses this?
Facebook probably has the biggest Hadoop cluster. In this blog post you will find some numbers and facts. Impressive.
One small detail that makes me laugh as a developer (and that may drive a lot of DBAs crazy):
There are supposed to be connectors for SQL Server that handle the communication between the NoSQL and SQL worlds. Even Excel and co. should be able to use the new possibilities. You will find more technical details here.
Even though this news doesn't really affect me, I think it is smart of Microsoft to look beyond the end of its own nose. At least that has been the right decision in the past.