Using the iOS Accelerate Framework for 2D signal processing on non-power-of-two images?

//EDIT...

I'm editing my question slightly to address working specifically with non-power-of-two images. I've got a basic structure that works with square grayscale images at sizes like 256x256 or 1024x1024, but I can't see how to generalize it to arbitrarily sized images. The FFT functions seem to want you to pass the log2 of the width and height, but then it's unclear how to unpack the resulting data, or whether the data isn't just getting scrambled. I suppose the obvious thing to do would be to center the non-power-of-two image within a larger, all-black image and then ignore any values in those positions when looking at the data. But I'm wondering if there's a less awkward way to work with non-power-of-two data.
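For what it's worth, the "center it in a larger black image" idea can be sketched in plain C (no Accelerate calls; `next_pow2`, the `src` byte layout, and the 0..1 float scaling are just assumptions for illustration):

```c
#include <stdlib.h>

/* Round up to the next power of two (returns n unchanged if it
   already is one). */
static unsigned next_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Copy a w x h grayscale image (one byte per pixel) into the center
   of a zero-filled padW x padH float buffer, scaling to 0..1.
   Caller frees the result. */
static float *pad_and_center(const unsigned char *src, unsigned w, unsigned h,
                             unsigned *padW, unsigned *padH) {
    *padW = next_pow2(w);
    *padH = next_pow2(h);
    float *dst = (float *)calloc((size_t)(*padW) * (*padH), sizeof(float));
    unsigned offX = (*padW - w) / 2, offY = (*padH - h) / 2;
    for (unsigned y = 0; y < h; y++)
        for (unsigned x = 0; x < w; x++)
            dst[(y + offY) * (*padW) + (x + offX)] = src[y * w + x] / 255.0f;
    return dst;
}
```

The black border does change the spectrum (it convolves it with the window's transform), which is why the padded positions have to be ignored, or a window applied, when reading the results back out.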

//...END EDIT

I'm having a bit of trouble with the Accelerate Framework documentation. I would normally use FFTW3, but I'm having trouble getting that to compile on an actual iOS device (see this question). Can anybody point me to a super simple implementation using Accelerate that does something like the following:

1) Turns image data into an appropriate data structure that can be passed to Accelerate's FFT methods.
In FFTW3, at its simplest, using a grayscale image, this involves placing the unsigned bytes into an "fftw_complex" array, which is simply a pair of doubles, one holding the real value and the other the imaginary (and where the imaginary is initialized to zero for each pixel).

2) Takes this data structure and performs an FFT on it.

3) Prints out the magnitude and phase.

4) Performs an IFFT on it.

5) Recreates the original image from the data resulting from the IFFT.

Although this is a very basic example, I am having trouble using the documentation on Apple's site. The SO answer by Pi here is very helpful, but I am still somewhat confused about how to use Accelerate for this basic functionality with a grayscale (or color) 2D image.

Anyhow, any pointers or especially some simple working code that processes a 2D image would be extremely helpful!

//EDIT...

Okay, after taking some time to dive into the documentation and some very helpful code on SO as well as in pkmital's GitHub repo, I've got some working code that I thought I'd post, since 1) it took me a while to figure out and 2) I have a couple of remaining questions...

Initialize FFT "plan". Assuming a square power-of-two image:

#include <Accelerate/Accelerate.h>
...
UInt32 N = log2(length * length);            // total bits: 2 * log2(length)
UInt32 log2nr = N / 2;                       // log2 of the number of rows
UInt32 log2nc = N / 2;                       // log2 of the number of columns
UInt32 numElements = 1 << (log2nr + log2nc); // total pixel count
float SCALE = 1.0 / numElements;             // forward + inverse gains a factor of numElements
SInt32 rowStride = 1;
SInt32 columnStride = 0;                     // 0 = default stride (tightly packed rows)
FFTSetup setup = create_fftsetup(MAX(log2nr, log2nc), FFT_RADIX2);

Pass in a byte array for a square power-of-two grayscale image and turn it into a COMPLEX_SPLIT:

COMPLEX_SPLIT in_fft;
in_fft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
in_fft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );

for ( UInt32 i = 0; i < numElements; i++ ) {
    if (i < t->width * t->height) {
        in_fft.realp[i] = t->data[i] / 255.0;
    } else {
        in_fft.realp[i] = 0.0;  // zero-fill any padding past the image data
    }
    in_fft.imagp[i] = 0.0;
}

Run the FFT on the transformed image data, then grab the magnitude and phase:

COMPLEX_SPLIT out_fft;
out_fft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
out_fft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );

fft2d_zop ( setup, &in_fft, rowStride, columnStride, &out_fft, rowStride, columnStride, log2nc, log2nr, FFT_FORWARD );

magnitude = (float *) malloc(numElements * sizeof(float));
phase = (float *) malloc(numElements * sizeof(float));

for (int i = 0; i < numElements; i++) {
   magnitude[i] = sqrtf(out_fft.realp[i] * out_fft.realp[i] + out_fft.imagp[i] * out_fft.imagp[i]);
   phase[i] = atan2f(out_fft.imagp[i], out_fft.realp[i]);
}

Now you can run an IFFT on the out_fft data to get the original image...

COMPLEX_SPLIT out_ifft;
out_ifft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
out_ifft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );
fft2d_zop (setup, &out_fft, rowStride, columnStride, &out_ifft, rowStride, columnStride, log2nc, log2nr, FFT_INVERSE);   

vsmul( out_ifft.realp, 1, SCALE, out_ifft.realp, 1, numElements );
vsmul( out_ifft.imagp, 1, SCALE, out_ifft.imagp, 1, numElements );

Or you can run an IFFT on the power spectrum (the squared magnitude) to get an autocorrelation...

COMPLEX_SPLIT in_ifft;
in_ifft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
in_ifft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );
for (int i = 0; i < numElements; i++) {
  // autocorrelation = IFFT of the power spectrum, not of the raw magnitude
  in_ifft.realp[i] = magnitude[i] * magnitude[i];
  in_ifft.imagp[i] = 0.0;
}

fft2d_zop ( setup, &in_ifft, rowStride, columnStride, &out_ifft, rowStride, columnStride, log2nc, log2nr, FFT_INVERSE );

vsmul( out_ifft.realp, 1, SCALE, out_ifft.realp, 1, numElements );
vsmul( out_ifft.imagp, 1, SCALE, out_ifft.imagp, 1, numElements );

Finally, you can put the ifft results back into an image array:

for ( UInt32 i = 0; i < numElements; i++ ) {
  t->data[i] = (int) (out_ifft.realp[i] * 255.0);
}     

I haven't figured out how to use the Accelerate framework to handle non-power-of-two images. If I allocate enough memory in the setup, then I can do an FFT followed by an IFFT to get my original image back. But if I try to do an autocorrelation (with the magnitude of the FFT), then my image gets wonky results. I'm not sure of the best way to pad the image appropriately, so hopefully someone has an idea of how to do this. (Or share a working version of the vDSP_conv method!)

asked May 22 '12 at 20:05

It looks like you're just doing autocorrelation here? I thought Accelerate/vDSP already had autocorrelation functions, so you wouldn't have to roll your own with FFT/IFFT, etc.? -

Hi, vDSP_acor doesn't seem to actually exist in the Accelerate library. vDSP_conv exists, but it's giving me strange results... perhaps I'm using it incorrectly for doing an image autocorrelation. If you (or anyone) has a working example of using vDSP_conv for autocorrelation, it would be great to see it. Part of the issue is simply that it's confusing as to what data it expects and outputs, etc. -

1 Answer

I would say that in order to work on arbitrary image sizes, all you have to do is size your input array up to the next power of 2.
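Sketching that suggestion: the next power of two, along with its log2 (which is what the vDSP setup and 2D FFT calls want), can be computed like this (plain C, purely illustrative):

```c
/* Round n up to the next power of two and report its log2,
   e.g. 900 -> 1024 with log2n = 10. */
static unsigned log2_next_pow2(unsigned n, unsigned *log2n) {
    unsigned p = 1, l = 0;
    while (p < n) { p <<= 1; l++; }
    *log2n = l;
    return p;
}
```

For a 900x900 image this gives 1024 in each dimension, i.e. log2nr = log2nc = 10 for the setup and transform calls.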

The hard part is deciding where to put your original image data and what to fill the rest with. What you are really trying to do to the image, or mine from it, is crucial.

In the linked PDF below, pay particular attention to the paragraph just above 12.4.2 http://www.mathcs.org/java/programs/FFT/FFTInfo/c12-4.pdf

While the above speaks about manipulation along two axes, we could potentially apply a similar idea along the first dimension and then carry it into the second. If I'm correct, then this example could apply (and this is by no means an exact algorithm yet):

Say we have an image that is 900 by 900. First we could split the image into vertical strips of 512, 256, 128, and 4 pixels. We would then process four 1D FFTs for each row: one for the first 512 pixels, the next for the following 256 pixels, the next for the following 128, and the last for the remaining 4. Since the output of the FFT is essentially the popularity of each frequency, these could simply be added (from the frequency-only perspective, ignoring the angular offset). We could then push this same technique into the second dimension. At that point we would have taken every input pixel into consideration without actually having to pad.

This is really just food for thought; I have not tried this myself and should research it further. If you are truly doing this kind of work right now, you may have more time for it than I do at this point.

answered May 30 '12 at 22:05

This isn't like processing a 20-second audio signal in 5-second chunks. It's more like processing an audio signal by splitting it into 0-10kHz and 10-20kHz bands. You're getting different bands from each, so you'd be adding 2kHz to 12kHz, or something along those lines... - glenn maynard

This is more of a question than anything, because I'm still developing an intuition for all of this, but doesn't the separability of the FFT allow for doing strips of the image 512 at a time, etc.? I watched a video of someone doing a 2D FFT with nested loops of 1D FFTs, one row of pixels at a time. - Dios de las galletas
