Mahout 0.7 no pudo obtener la recomendación con una gran cantidad de datos usando MysqlJdbcDataModel

I am using Mahout to build an Item-based Cf recommendation engine. I create an MahoutHelper class which has a constructor:

    public MahoutHelper(String serverName, String user, String password,
        String DatabaseName, String tableName) {


    source = new MysqlConnectionPoolDataSource();

    source.setServerName(serverName);
    source.setUser(user);
    source.setPassword(password);
    source.setDatabaseName(DatabaseName);
    source.setCachePreparedStatements(true);
    source.setCachePrepStmts(true);
    source.setCacheResultSetMetadata(true);
    source.setAlwaysSendSetIsolation(true);
    source.setElideSetAutoCommits(true);
    DBmodel = new MySQLJDBCDataModel(source, tableName, "userId", "itemId",
            "value", null);

    similarity = new TanimotoCoefficientSimilarity(DBmodel);

}

and the recommend method is:

   public List<RecommendedItem> recommendation() throws TasteException {

    Recommender recommender = null;
    recommender = new GenericItemBasedRecommender(DBmodel, similarity);
    List<RecommendedItem> recommendations = null;
    recommendations = recommender.recommend(userId, maxNum);
    System.out.println("query completed");
    return recommendations;
}

It's using datasource to build datamodel but the problem is that when mysql has only a few data (less than 100) the program works fine for me, while when the scale turns to be over 1,000,000, the program stacks at doing recommendation and never goes forward. I have no idea how it happens. By the way I used the same data to build a FileDataModel with a .dat file, and it takes only 2~3 second to complete analysis. I am confused.

preguntado el 27 de agosto de 12 a las 04:08

1 Respuestas

Using the database directly will only work for tiny data sets, like maybe a hundred thousand data points. Beyond that the overhead of such data-intensive applications will never run quickly; a query takes thousands of SQL queries or more.

Instead you must load and re-load into memory. You can still pull from the database; look at ReloadFromJDBCDataModel como envoltorio.

Respondido 27 ago 12, 11:08

Muchísimas gracias. Eso ayudo. - Omer Sonmez

No es la respuesta que estás buscando? Examinar otras preguntas etiquetadas or haz tu propia pregunta.