Hello everyone
I have a PHP website that should use some cached data (stored in Memcache, for example). The data should be stored in cache by daemons that fetch it from web services, and some of it should also be stored in a MySQL server.
The daemons should do the following:
- Fetch foreign exchange rates, parse them, and store them in the database as well as in two separate Memcache instances on separate machines.
- Fetch financial indices and store them in separate Memcache instances.
- Fetch large XML data and store it in two separate Memcache instances.
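The fan-out step these daemons share (write once to durable storage, then to each cache replica) can be sketched as follows. This is a minimal illustration, not the poster's code: the cache clients and the database are stood in by plain Python objects, where a real daemon would use memcached and MySQL client libraries.

```python
import json

class DictCache:
    """Stand-in for a memcached client; real code would use a client library."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, ttl=0):
        self._data[key] = value          # a real client would honour the TTL
    def get(self, key):
        return self._data.get(key)

def store_rates(rates, db, caches):
    """Persist parsed FX rates to the database, then fan out to every cache."""
    db["fx_rates"] = dict(rates)         # durable store first, caches second
    payload = json.dumps(rates)
    for cache in caches:                 # e.g. two instances on separate machines
        cache.set("fx_rates", payload, ttl=300)

db = {}                                  # stands in for the MySQL table
caches = [DictCache(), DictCache()]
store_rates({"EUR/USD": 1.29, "GBP/USD": 1.55}, db, caches)
```

Writing to the database before the caches means a crash mid-run can leave the caches stale, but never holding data that was lost before reaching durable storage.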
I am capable of writing these daemons in C/C++/Perl/PHP/Python.
I have to decide which language/script to use to implement these daemons. The advantage of using PHP is that I can reuse the API the website application itself uses. Another advantage is that PHP is easy and everyone knows it, so I won't be stuck maintaining these daemons alone; on the other hand, PHP is slower and consumes much more resources.
The main disadvantage of using a language other than PHP is that code written in C/C++/Perl is harder to maintain. Nowadays, I guess it's not common to do this kind of task in C/C++/Perl. Am I wrong in saying that?
What would you recommend in this case?
asked Jan 8 '11 at 20:01
Perl and Python are the default answers for writing such scripts. But it doesn't matter (much) which language you use if you write good code. The more important thing is how you handle your script on failure.
In the long run you may see your scripts fail now and then for arbitrary reasons, and it may not be worth it to debug them, because they usually do a fair job and it would be difficult to find where they went wrong.
I have a few Perl scripts doing the same kind of thing you are doing. To me the tricky part was making sure my scripts didn't stay down for long, because I didn't want to miss a chunk of live streamed data.
And for that I used Monit. A great tool.
The best choice would probably be PHP for simplicity/code reuse.
From what I can tell it's just passing data around, so there's no performance to worry about. As for resource usage, just make sure you don't run out of max_memory (by streaming, perhaps, or by configuring plenty). Abort and log operations that take too long, reconnect to the database in a loop when an SQL operation fails, etc.
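The "reconnect in a loop" advice can be made concrete with a small retry wrapper. A minimal Python sketch, where the failing operation and the backoff parameters are illustrative (real code would catch the database driver's specific error class, not bare `Exception`):

```python
import logging
import time

def with_retries(op, attempts=5, base_delay=0.1):
    """Run op(); on failure, log it and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:  # real code: catch the driver's error class
            logging.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt == attempts - 1:
                raise             # out of attempts: let the supervisor see it
            time.sleep(base_delay * 2 ** attempt)

# Demo: an operation that fails twice (a dropped connection, say) then succeeds.
calls = {"n": 0}
def flaky_sql_op():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("MySQL server has gone away")
    return "ok"

result = with_retries(flaky_sql_op, base_delay=0.0)
```

Raising after the last attempt (rather than swallowing the error) is deliberate: it lets a supervisor like the Monit setup mentioned above notice that the daemon is genuinely stuck.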
NOTE OF CAUTION
Daemon programming is tricky and a lot of things can go wrong. Take into consideration all points of failure.
Also, note that Perl is a lot better versed in daemons than PHP. I left out C/C++ because performance (passing data around) is not an issue, and daemon programming is hard enough as it is; why add worries about memory leaks, segfaults, etc.?
The best practice is to use whatever technology you know the best. You will:
- implement the solution faster
- be better able to debug problems you run into
- more easily evaluate libraries (or even know which ones exist) that can offload some of the work for you
- have an easier time maintaining and extending the code
Realistically, speed and resource usage are going to be relatively unimportant unless you actually have real performance requirements.
short: I would use Python.
longer: I have tried PHP in CLI mode and experienced a lot of memory leaks, certainly because of bad PHP libs, or PHP libs that were never designed for anything other than dying quickly at the end of a web request (I'm suspicious of PDO, for example).
In a Python magazine I recently saw a portion of code from Shinken, a nice Nagios rewrite as Python daemons, very clever. See http://www.shinken-monitoring.org/the-global-architecture/ & http://www.shinken-monitoring.org/wiki/official/development-hackingcode . As it's a monitoring tool, you can certainly find some very good ideas there for daemons repeating tasks.
Now, can I make a proposition? Why not use Shinken or Centreon as the scheduler for the data-fetching tasks? (And maybe soon Centreon with a Shinken engine instead of a Nagios engine, I hope.) This could be useful to detect changes in external data, issues in fetches, etc.
Then the tasks themselves (fetch data, transform data, store data, etc.) are the job of an ETL. One nice open-source tool is Talend ETL (Java). There are some scheduling and monitoring tools for Talend, but they're not open source (sort-of-open-source-where-you-must-pay-a-license). Still, adding an external scheduler like Nagios for the tasks should be easy (I hope). You'll need to check that memcached is available as a storage engine for Talend ETL, or code your own plugin.
So, this is to say that instead of the language you should maybe think about the tools. Or not, depending on the complexity you can take on; each tool adds its own complexity. However, if you want to build everything from scratch, Python is fast and efficient.
You should use the same language that the rest of your application is written in. That way you can reuse code and developer skills more easily.
However, as others have noted, PHP is bad for long-running daemons because it handles memory in a way which is liable to leak.
So I would run these tasks as "cron" jobs which are periodically (re-)started, but make sure you don't run more copies of a task than you intend.
Cron jobs are more robust than daemons.
- A cron job which fails and quits will start again next time it is scheduled
- A cron job which contains memory leaks will release its memory when it ends its run anyway
- A cron job which has its software updated (libraries etc.) automatically picks up the new versions on its next run without any special effort.
- "cron" already provides startup/shutdown scripts which your Ops team can use to control it, so you don't need to rewrite these. Your Ops team already know how to operate "cron", and know how to comment out crontab entries if they want to temporarily disable it.
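One common way to guarantee "no more copies than you intend" is to take a non-blocking lock on a file at the top of the cron job and exit immediately if it is already held. A Unix-only Python sketch (the lock path is hypothetical, one lock file per job):

```python
import fcntl
import sys

LOCK_PATH = "/tmp/fetch_rates.lock"  # hypothetical path for this particular job

def acquire_lock(path):
    """Return a locked file handle, or None if another run already holds it."""
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh  # keep the handle open for the whole run; the lock dies with it
    except BlockingIOError:
        fh.close()
        return None

lock = acquire_lock(LOCK_PATH)
if lock is None:
    sys.exit(0)  # a previous run is still going; skip this scheduled run

# ... fetch, parse, and store the data here ...

# A concurrently started copy attempting the same lock would get None:
second = acquire_lock(LOCK_PATH)
```

Because `flock` locks are released automatically when the process exits, a job that crashes or is killed never leaves a stale lock behind, which fits the "cron jobs are more robust" point above.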