I need a bit of architecture advice. I have a Java-based webapp with a JPA-based ORM backed by a MySQL relational database. As part of the application, I have a batch job that compares thousands of database records with each other. This job has become too time-consuming and needs to be parallelized. I'm looking at using MapReduce and Hadoop to do this, but I'm not sure how to integrate them into my current architecture. I think the easiest initial solution is to find a way to push data from MySQL into Hadoop jobs. I have done some initial research and found the following relevant information and possibilities:
1) https://issues.apache.org/jira/browse/HADOOP-2536: this gives an interesting overview of some built-in JDBC support.
2) This article, http://architects.dzone.com/articles/tools-moving-sql-database, describes some third-party tools for moving data from MySQL to Hadoop.
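To give a rough idea of what the built-in JDBC support involves: as far as I understand it, the `DBInputFormat` that came out of that issue counts the rows in the table, divides them among the map tasks, and has each task read its own slice with a `LIMIT`/`OFFSET` query. The plain-Java sketch below only mimics that splitting step and does not use Hadoop. `buildSplitQueries`, the table `records` and the key column `id` are hypothetical names, not part of Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that divides a table into contiguous row ranges,
 * one per parallel task, using LIMIT/OFFSET. This is NOT Hadoop's API;
 * it only illustrates the splitting idea.
 */
public class SplitQueries {

    /**
     * Builds one SELECT per split. Each split gets rowCount / numSplits rows;
     * the last split also absorbs the remainder. ORDER BY on a unique key
     * keeps the ranges stable and non-overlapping between queries.
     */
    static List<String> buildSplitQueries(String table, String orderByKey,
                                          long rowCount, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<String> queries = new ArrayList<>();
        long chunk = rowCount / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunk;
            // The last split takes whatever rows are left over.
            long length = (i == numSplits - 1) ? rowCount - start : chunk;
            queries.add("SELECT * FROM " + table
                    + " ORDER BY " + orderByKey
                    + " LIMIT " + length + " OFFSET " + start);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table "records" with primary key "id" and 10,000 rows.
        for (String q : buildSplitQueries("records", "id", 10_000, 4)) {
            System.out.println(q);
        }
    }
}
```

Each generated query could then be handed to a separate worker. Note that large `OFFSET` values get progressively slower in MySQL because the server still has to step past the skipped rows, so splitting on key ranges (`WHERE id >= ? AND id < ?`) is a common alternative.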
To be honest, I'm just starting out with HBase and Hadoop, and I really don't know how to integrate them into my webapp.
Any advice is greatly appreciated.

Cheers,
Brian
asked Jan 8 '11 at 22:01
DataNucleus supports JPA persistence to HBase. JPA is obviously designed for RDBMSs, so full JPA support will never be possible, but you can do basic persistence and querying.
Brian,

In this case, you can use HBase, Hive, or plain MapReduce jobs.

1. HBase is a column-oriented database. It is best suited to column-based computations, for example the average employee salary (assuming salary is a column). Thanks to its scalability, nodes can be added on the fly.
2. Hive resembles a traditional database in that it supports SQL-like queries, which it converts internally into MapReduce jobs. It suits row-based computations.
3. Finally, you can write your own MapReduce jobs. Using Sqoop, you can migrate data from relational databases into HDFS (the Hadoop Distributed File System) and then write MapReduce jobs that work directly on the underlying flat files.

These are some of the possible options. Let me know if you need more details about any of them.
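To make option 3 a little more concrete, here is a plain-Java, in-memory simulation of the map, shuffle and reduce phases computing the average salary per department (a grouped version of the option 1 example) over comma-delimited text lines, which is the kind of file a Sqoop text import produces by default. It does not use Hadoop; the record layout `id,name,department,salary` and the class and method names are hypothetical. In a real job, `map` and `reduce` would live in `Mapper` and `Reducer` subclasses, and the framework would perform the shuffle across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation of map -> shuffle -> reduce for
 * "average salary per department" over comma-delimited lines.
 * No Hadoop involved; the record layout is hypothetical.
 */
public class AverageSalary {

    /** Map phase: parse one line and emit (department, salary). */
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        // Assumed layout: id,name,department,salary
        return Map.entry(fields[2].trim(), Double.parseDouble(fields[3].trim()));
    }

    /** Shuffle phase: group emitted values by key, as the framework does between map and reduce. */
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: collapse all salaries for one department into their mean. */
    static double reduce(List<Double> salaries) {
        double sum = 0;
        for (double s : salaries) {
            sum += s;
        }
        return sum / salaries.size();
    }

    /** Runs the three phases over the given lines and returns department -> average salary. */
    static Map<String, Double> run(List<String> lines) {
        List<Map.Entry<String, Double>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.add(map(line));
        }
        Map<String, Double> result = new TreeMap<>();
        shuffle(mapped).forEach((dept, salaries) -> result.put(dept, reduce(salaries)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1,Alice,Engineering,100000",
                "2,Bob,Engineering,80000",
                "3,Carol,Sales,60000");
        System.out.println(run(lines)); // {Engineering=90000.0, Sales=60000.0}
    }
}
```

Because an average of averages is not the overall average, this reducer could not simply be reused as a combiner; a combiner would have to emit (sum, count) pairs instead.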