Which sorting algorithm fits this 'stream-like' condition?

I have a buffer receiving data, so the data arrive as a stream and there is I/O latency. Currently, when the buffer is full, I sort it with qsort and write the result to disk. The qsort step causes noticeable latency, so I am looking for a sorting algorithm that can start sorting while data are still being added to the buffer, in order to reduce the overall time.

I'm not sure I have made myself clear; please leave a comment if anything needs clarifying. Thanks.

asked Mar 9 '12 at 13:03

Insertion sort. Really ;-) However, an O(n log n) sort can sort large amounts of data quite quickly... and is not necessarily quicker if the data is "mostly sorted" (quicksort can actually be very degenerate in this case!)... so it might be worthwhile to set up a quick performance analysis. -
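To make the insertion sort suggestion concrete, here is a minimal sketch of inserting one item at a time into a buffer that is kept sorted. The function name `sorted_insert` and the use of `int` items are assumptions for illustration; the caller is assumed to guarantee there is room for one more element.

```c
#include <stddef.h>

/* Insert value into buf[0..*len), which is already sorted ascending.
 * Assumes the caller has checked that the buffer has room.
 * Cost is O(n) per insert in the worst case (shifting elements). */
void sorted_insert(int *buf, size_t *len, int value)
{
    size_t i = (*len)++;
    /* Shift larger elements one slot to the right. */
    while (i > 0 && buf[i - 1] > value) {
        buf[i] = buf[i - 1];
        i--;
    }
    buf[i] = value;
}
```

Called once per arriving item, this leaves the buffer fully sorted the moment it is full, at the price of O(n²) total work in the worst case.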

3 Answers

Heap sort keeps the data permanently in a partially sorted condition and so is comparable to insertion sort. But it is substantially quicker and has a worst case of O(n log n), compared with O(n²) for insertion sort.

How is this going to work? Presumably at some point you have to stop reading from the stream, store what you have sorted, and start reading a new set of data?

answered Mar 9 '12 at 13:03

+1 for heapsort, you don't need it to be fully sorted for buffering between writes - ciber-monje

Yes, in my case I have to stop reading from the stream, sort the buffer, write the result to disk, and then start reading again, repeating until the stream ends - Mickey brillo

Then heap sort is what you want. Read data from the stream into the heap until you have to stop, and then read from the heap and write to disk until it is empty. Data read from the heap comes out in sorted order. - Borodin
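The push-then-drain approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the `MinHeap` type, the function names, and the use of `int` items are hypothetical, and the storage is a fixed-capacity array supplied by the caller (for example, the existing buffer).

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-capacity binary min-heap of ints (illustrative names). */
typedef struct {
    int *data;   /* caller-supplied storage */
    size_t len;  /* number of items currently stored */
    size_t cap;  /* capacity of data */
} MinHeap;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Add one item as it arrives from the stream: O(log n). */
void heap_push(MinHeap *h, int value)
{
    assert(h->len < h->cap);
    size_t i = h->len++;
    h->data[i] = value;
    /* Sift up until the parent is no larger than the new item. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->data[parent] <= h->data[i]) break;
        swap_int(&h->data[parent], &h->data[i]);
        i = parent;
    }
}

/* Remove and return the smallest item: O(log n). */
int heap_pop(MinHeap *h)
{
    assert(h->len > 0);
    int top = h->data[0];
    h->data[0] = h->data[--h->len];
    /* Sift the moved item down to restore the heap property. */
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->len && h->data[l] < h->data[m]) m = l;
        if (r < h->len && h->data[r] < h->data[m]) m = r;
        if (m == i) break;
        swap_int(&h->data[m], &h->data[i]);
        i = m;
    }
    return top;
}
```

In use, `heap_push` would be called for each item while reading from the stream, and `heap_pop` would be called in a loop while `len > 0` to write items to disk in ascending order.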

Why not use a binary search tree? Polling from the heap is O(log n), right? - Haiyang

I think merge sort or tree sort can be of great help. Wikipedia explains why.

  • When you can cut the huge input in reasonable large blocks, merge-sort is more appropriate.
  • When you insert small pieces at a time, tree-sort is more appropriate.

You want to implement an online sorting algorithm, i.e. an algorithm that runs while receiving the data in a streaming fashion. Search the web for "online algorithms" and you may find other nice algorithms.

In your case I would use tree sort. It doesn't have a better complexity than quicksort (both are O(n log n) most of the time and O(n²) in a few bad cases). But it amortizes the cost over each input, which means the delay you have to wait after the last item is added is not of order O(n log n), but O(log n).
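A minimal sketch of tree sort with a plain (unbalanced) binary search tree is shown below. The names `Node`, `tree_insert`, and `tree_drain` are hypothetical, items are assumed to be `int`, and each node is allocated with `malloc`. Because the tree is not balanced, already-sorted input degrades it to the O(n²) bad case mentioned above, and the recursive walk can then go as deep as the number of items.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Insert one key as it arrives. Duplicates go to the right subtree.
 * Returns 0 on success, -1 if allocation fails. */
int tree_insert(Node **root, int key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->key = key;
    n->left = n->right = NULL;
    /* Walk down to the empty link where the new node belongs. */
    while (*root != NULL)
        root = (key < (*root)->key) ? &(*root)->left : &(*root)->right;
    *root = n;
    return 0;
}

/* In-order walk: copies keys to out in ascending order, frees every
 * node, and returns the number of keys written. */
size_t tree_drain(Node *node, int *out)
{
    if (node == NULL) return 0;
    size_t n = tree_drain(node->left, out);
    out[n++] = node->key;
    n += tree_drain(node->right, out + n);
    free(node);
    return n;
}
```

Each arriving item costs one `tree_insert`; when the buffer is due to be flushed, `tree_drain` produces the sorted output in a single linear pass, after which the root pointer should be reset to `NULL`.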

answered Mar 9 '12 at 13:03

You can try my Link Array structure. It should be fine for sequentially adding random data while keeping it sorted (look at the numbers in the table). It is a variation of the skip list approach, but with a simpler implementation and logic (although the performance of a skip list should be better).

answered Mar 10 '12 at 12:03
