Minimizar la distancia a una cuadrícula ponderada

Lets suppose you have a 1000x1000 grid of positive integer weights W.

We want to find the cell that minimizes the average weighted distance.to each cell.

The brute force way to do this would be to loop over each candidate cell and calculate the distance:

int best_x, best_y, best_dist;

for x0 = 1:1000,
    for y0 = 1:1000,

        int total_dist = 0;

        for x1 = 1:1000,
            for y1 = 1:1000,
                total_dist += W[x1,y1] * sqrt((x0-x1)^2 + (y0-y1)^2);

        if (total_dist < best_dist)
            best_x = x0;
            best_y = y0;
            best_dist = total_dist;

This takes ~10^12 operations, which is too long.

Is there a way to do this in or near ~10^8 or so operations?

preguntado el 29 de junio de 12 a las 19:06

¿Qué quieres decir exactamente con distance y average weighted distance? Your brute force seems to use the euclidean distance, but you're also multiplying it by w[x1,y1] for some reason. Also, you seem to be finding the shortest possible distance from one point to all the rest - is this what you mean? -

^ I meant smallest possible sum of distances from one point to all the rest, sorry. Simulated annealing might work for you, but I'm not clear on what exactly you want so I can't detail, at least yet. -

A simple optimization is after each total_dist += ... check if it's larger than the best_dist, if yes do a break from the 2 fors -

@IVlad: As the total of W remains constant (the denominator), minimizing the average weighted distance is the same as minimizing the total.of W[i,j]*dist. I need an exact solution, sim annealing is approximate. -

Please see if the implementation I posted below helps explain the image filtering method. -

1 Respuestas

Teoría

This is possible using Filters in O(n m log nm ) time where n, m are the grid dimensions.

You need to define a filter of size 2n + 1 x 2m + 1, and you need to (centered) embed your original weight grid in a grid of zeros of size 3n x 3m. The filter needs to be the distance weighting from the origin at (n,m):

 F(i,j) = sqrt((n-i)^2 + (m-j)^2)

Asegúrate de que W denote the original weight grid (centered) embedded in a grid of zeros of size 3n x 3m.

Then the filtered (correlación cruzada) result

 R = F o W

Te regalaré total_dist grid, simply take the min R (ignoring the extra embedded zeros you put into W) to find your best x0, y0 posiciones.

Image (i.e. Grid) filtering is very standard, and can be done in all sorts of different existing software such as matlab, with the filtrar mando.

I should note, though I explicitly made use of cross-correlation above, you would get the same result with circunvolución only because your filter F is symmetric. In general, image filter is cross-correlation, not convolution, though the two operations are very analogous.

The reason for the O(nm log nm ) runtime is because image filtering can be done using 2D FFT's.

Implementación

Here are both implementations in Matlab, final result is the same for both methods and they are benchmarked in a very simple way:

m=100;
n=100;
W0=abs(randn(m,n))+.001;

tic;

%The following padding is not necessary in the matlab code because
%matlab implements it in the imfilter function, from the imfilter
%documentation:
%  - Boundary options
% 
%        X            Input array values outside the bounds of the array
%                     are implicitly assumed to have the value X.  When no
%                     boundary option is specified, imfilter uses X = 0.

%W=padarray(W0,[m n]);

W=W0;
F=zeros(2*m+1,2*n+1);

for i=1:size(F,1)
    for j=1:size(F,2)
        %This is matlab where indices start from 1, hence the need
        %for m-1 and n-1 in the equations
        F(i,j)=sqrt((i-m-1)^2 + (j-n-1)^2);
    end
end
R=imfilter(W,F);
[mr mc] = ind2sub(size(R),find(R == min(R(:))));
[mr, mc]
toc;

tic;
T=zeros([m n]);
best_x=-1;
best_y=-1;
best_val=inf;
for y0=1:m
    for x0=1:n

        total_dist = 0;

        for y1=1:m
            for x1=1:n
                total_dist = total_dist + W0(y1,x1) * sqrt((x0-x1)^2 + (y0-y1)^2);
            end
        end

        T(y0,x0) = total_dist;
        if ( total_dist < best_val ) 
            best_x = x0;
            best_y = y0;
            best_val = total_dist;
        end

    end
end
[best_y best_x]
toc;

diff=abs(T-R);
max_diff=max(diff(:));
fprintf('The max difference between the two computations: %g\n', max_diff);

Performance

For an 800x800 grid, on my PC which is certainly not the fastest, the FFT method evaluates in just over 700 seconds. The brute force method doesn't complete after several hours and I have to kill it.

In terms of further performance gains, you can attain them by moving to a hardware platform like GPUs. For example, using CUDA's FFT library you can compute 2D FFT's in a fraction of the time it takes on a CPU. The key point is, that the FFT method will scale as you throw more hardware to do the computation, while the brute force method will scale much worse.

Observaciones

While implementing this, I have observed that almost every time, the best_x,bext_y values are one of floor(n/2)+-1. This means that most likely the distance term dominates the entire computation, therefore, you could get away with computing the value of total_dist for only 4 values, making this algorithm trivial!

Respondido 03 Jul 12, 19:07

Nice! Although, wouldn't it be O(n m log(n m)) instead, from the 2D FFT used to do the convolutions? - tormenta venidera

In 2D, you need to compute an FFT along each dimension hence the n log(n) y m log(m) terms multiplied together (see stackoverflow.com/questions/6514861/…). If m = n then it becomes O( n^2 log(n)^2) = O ( 2 n^2 log(n) ) = O ( n^2 log(n) ) - lperro

What is the operator here: R = F o W ? Is this matrix multiplication? I have not heard of filtering in this context. Can you provide a wikipedia reference to the technique? - Andrés Tomazos

@ldog: The FFT's along each dimension can be performed independently along each row and column. So, one set will take O(n m log m), and the other set will take O(m n log n), para un total de O(n m (log n + log m)) = O(n m log(n m)). - tormenta venidera

@AndrewTomazos-Fathomling -- it is circunvolución, and it is more commonly written with an asterisk: [F * W](x) = integral[y](F(x-y) W(y)) - tormenta venidera

No es la respuesta que estás buscando? Examinar otras preguntas etiquetadas or haz tu propia pregunta.