Uploading files to Amazon Cloudfront: is it possible?

I've been reading up about pull and push CDNs. I've been using Cloudfront as a pull CDN for resized images:

  • Receive image from client
  • Put image in S3

Later on, when a client requests a URL from Cloudfront, Cloudfront does not have the image, so it forwards the request to my server, which has to:

  • Receive the request
  • Pull image from S3
  • Resize the image
  • Push image back to Cloudfront
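
A cache-miss handler on the origin has to start by recognizing which size is being requested. As a minimal sketch (the URL scheme, regex, and function name here are assumptions for illustration, not the poster's actual code):

```python
import re

# Hypothetical URL scheme: /images/<name>_<width>x<height>.jpg
# (an assumption for illustration -- the real scheme isn't shown in the question)
SIZE_RE = re.compile(r"^/images/(?P<name>[\w-]+)_(?P<w>\d+)x(?P<h>\d+)\.jpg$")

def parse_resize_request(path):
    """Return (original_key, width, height), or None if the path has no size suffix."""
    m = SIZE_RE.match(path)
    if m is None:
        return None
    return (m.group("name") + ".jpg", int(m.group("w")), int(m.group("h")))
```

Given that, the origin can pull `original_key` from S3, resize to `(width, height)`, and return the result to Cloudfront.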

However, this takes a few seconds, which is a really annoying wait when you first upload your beautiful image and want to see it. The delay appears to be mostly the download/reuploading time, rather than the resizing, which is pretty fast.

Is it possible to pro-actively push the resized image to Cloudfront and attach it to a URL, such that future requests can immediately get the prepared image? Ideally I would like to

  • Receive image from client
  • Put image in S3
  • Resize image for common sizes
  • Pre-emptively push these sizes to cloudfront
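
The upload-time flow above might look roughly like this. The bucket, key scheme, and size list are assumptions; the actual resizing and the boto3 upload call are hedged (boto3 is imported lazily so the key logic stands alone):

```python
COMMON_SIZES = [(128, 128), (256, 256), (512, 512)]  # assumed common sizes

def variant_keys(original_key, sizes=COMMON_SIZES):
    """S3 keys for the pre-rendered variants of an uploaded image."""
    stem, _, ext = original_key.rpartition(".")
    return ["%s_%dx%d.%s" % (stem, w, h, ext) for (w, h) in sizes]

def upload_variants(bucket, original_key, resized_bytes_by_size):
    """Push pre-rendered variants to S3 so the CDN can pull them immediately.

    resized_bytes_by_size maps (w, h) -> JPEG bytes; the resizing itself
    (Pillow or similar) is omitted here.
    """
    import boto3  # lazy import: only needed when actually uploading
    s3 = boto3.client("s3")
    stem, _, ext = original_key.rpartition(".")
    for (w, h), body in resized_bytes_by_size.items():
        key = "%s_%dx%d.%s" % (stem, w, h, ext)
        s3.put_object(Bucket=bucket, Key=key, Body=body,
                      ContentType="image/jpeg")
```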

This avoids the whole download/reupload cycle, making the common sizes really fast, but the less-common sizes can still be accessed (albeit with a delay the first time). However, to do this I'd need to push the images up to Cloudfront. This:

seems to suggest it can be done, but everything else I've seen makes no mention of it. My question is: is it possible? Or are there any other solutions to this problem that I am missing?

asked May 2, 2012 at 19:05

3 Answers

We have tried similar things with different CDN providers, and for CloudFront I don't think there is any existing way to push (what we call pre-feeding) your specific content to the nodes/edges if the CloudFront distribution is using your custom origin.

One way I can think of, also mentioned by @Xint0, is to set up another S3 bucket specifically to host the files you would like to push (in your case, the resized images). Basically you would have two CloudFront distributions: one to pull the rarely accessed files, and another for the frequently accessed files and the images you expect to be resized. This sounds a little complex, but I believe that's the trade-off you have to make.

Another option I can recommend looking at is EdgeCast, another CDN provider. They provide a function called load_to_edge (which I spent quite a lot of time integrating with our service last month, which is why I remember it clearly) that does exactly what you expect. They also support custom origin pull, so you could give it a trial there.

answered May 8, 2012 at 02:05

Pity; I'm using cloudfront as a cache of sorts, which greatly simplified storing multiple versions of each image on s3. Maintaining a separate dist for common files would render that advantage moot. I suppose my next step would be to try to pre-render-and-cache the common image sizes on the origin server, so the first cloudfront hit would only have to pay the upload time and not the download-resize-upload time - Li Haoyi

A further note on my previous answer: CloudFront has recently started to support multiple origins for the same distribution. This means my two-distribution solution can be collapsed into one, and you can configure which set of files points to which origin URL. - yudong li

Any recommendations on which CDN supports push/pre-feeding? - 1a1a11a

The OP asks for a push CDN solution, but it sounds like he's really just trying to make things faster. I'd venture that you probably don't need to implement a CDN push; you just need to optimize your origin server pattern.

So, OP, I'm going to assume you're supporting at most a handful of image sizes--let's say 128x128, 256x256 and 512x512. It also sounds like you have your original versions of these images in S3.

This is what currently happens on a cache miss:

  1. CDN receives request for a 128x128 version of an image
  2. CDN does not have that image, so it requests it from your origin server
  3. Your origin server receives the request
  4. Your origin server downloads the original image from S3 (presumably a larger image)
  5. Your origin resizes that image and returns it to the CDN
  6. CDN returns that image to user and caches it

What you should be doing instead:

There are a few options here depending on your exact situation.

Here are some things you could fix quickly, with your current setup:

  1. If you have to fetch your original images from S3, you're basically making it so that every cache miss takes at least as long as downloading the original-sized image. If at all possible, you should stash those original images somewhere your origin server can access quickly. There are a million different options here depending on your setup, but fetching them from S3 is about the slowest of them all. At least you aren't using Glacier ;).
  2. You aren't caching the resized images. That means every edge node Cloudfront uses is going to request this image, which triggers the whole resizing process. Cloudfront may have hundreds of individual edge node servers, meaning hundreds of misses and resizes per image. Depending on what Cloudfront does for tiered distribution, and how you set your file headers, it may not actually be that bad, but it won't be good.
  3. I'm going out on a limb here, but I'm betting you aren't setting custom expiration headers, which means Cloudfront is only caching each of these images for 24 hours. If your images are immutable once uploaded, you'd really benefit from returning expiration headers telling the CDN not to check for a new version for a long, long time.
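
For that third point, if uploaded images really are immutable, the origin can return far-future caching headers. A sketch (the one-year lifetime is an arbitrary choice, not a recommendation from the answer):

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 3600  # assumed "long, long time"

def immutable_image_headers(now=None):
    """Headers telling the CDN (and browsers) not to revalidate for a year."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": "public, max-age=%d, immutable" % ONE_YEAR,
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),  # RFC 1123 date
    }
```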

Here are a couple ideas for potentially better patterns:

  1. When someone uploads a new image, immediately transcode it into all the sizes you support and upload those to S3. Then just point your CDN at that S3 bucket. This assumes you have a manageable number of supported image sizes. However, I would point out that if you support too many image sizes, a CDN may be the wrong solution altogether. Your cache hit rate may be so low that the CDN is really getting in the way. If that's the case, see the next point.
  2. If you are supporting something like continuous resizing (ie, I could request image_57x157.jpg or image_315x715.jpg, etc and the server would return it) then your CDN may actually be doing you a disservice by introducing an extra hop without offloading much from your origin. In that case, I would probably spin up EC2 instances in all the available regions, install your origin server on them, and then swap image URLs to regionally appropriate origins based on client IP (effectively rolling your own CDN).
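
That second idea, picking a regionally close origin per client, could start as a static lookup. The region table and hostnames below are entirely made up for illustration (real client-to-region mapping would use a GeoIP database or DNS-based routing):

```python
# Hypothetical per-region origin hosts (illustrative names only)
REGION_ORIGINS = {
    "us": "img-us.example.com",
    "eu": "img-eu.example.com",
    "ap": "img-ap.example.com",
}
DEFAULT_REGION = "us"

def origin_for(client_region):
    """Pick the origin host for a client's region, with a fallback default."""
    return REGION_ORIGINS.get(client_region, REGION_ORIGINS[DEFAULT_REGION])

def image_url(client_region, path):
    """Build the regionally appropriate image URL to hand to the client."""
    return "https://%s%s" % (origin_for(client_region), path)
```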

And if you reeeeeally want to push to Cloudfront:

You probably don't need to, but if you simply must, here are a couple options:

  1. Write a script to use the APIs to fetch your image from a variety of different places around the world. In a sense, you'd be pushing a pull command to all the different edge locations. This isn't guaranteed to populate every edge location, but you could probably get close. Note that I'm not sure how thrilled they would be about being used this way, but I don't see anything in their terms of use about it (IANAL).
  2. If you don't want to use a third party or risk irking them, just spin up a micro EC2 instance in every region, and use those to fetch the content, same as in #1.
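
Either way, the warming step itself is just issuing a GET for every (image, size) pair from each vantage point. A sketch of the URL list such a script would work through (the distribution hostname and path scheme are placeholders):

```python
from itertools import product

SIZES = [(128, 128), (256, 256), (512, 512)]  # assumed supported sizes

def warm_urls(base, names, sizes=SIZES):
    """URLs to fetch from each region so the edge caches get populated."""
    return ["%s/images/%s_%dx%d.jpg" % (base, name, w, h)
            for name, (w, h) in product(names, sizes)]
```

Each regional fetcher would then simply GET every URL in the list once.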

answered June 20, 2020 at 10:06

Just to follow up on this. Here we are in The Future and there are better solutions available for this problem now. You could consider a product focused specifically on this problem (e.g. Cloudinary). Or if you want to continue to manage it yourself, you could look into AWS Lambda or Route53 aliases to distribute the computation to the edge nodes. - dougw

AFAIK CloudFront uses S3 buckets as the datastore. So, after resizing the images you should be able to save the resized images to the S3 bucket used by CloudFront directly.

answered May 2, 2012 at 19:05

I have it set up to use my origin server as a datastore rather than S3. The point being that now I won't need to worry about the amount of space I'm taking up on S3 or about expiring the images: if someone wants a particular size, they can get it, and in the future they can continue getting it (without re-rendering) for the 24hrs until it expires. Essentially I'm using cloudfront as a cache for resized images (whose original copies are themselves stored on S3) and would like to pre-populate it, rather than doing all this on S3 itself. - Li Haoyi
