Frequent NoHttpResponseException with AmazonS3.getObject(request).getObjectContent()
I have a helper routine that tries to do threaded downloading from S3. Very often (about 1% of the requests) I get a log message about a NoHttpResponseException, which after a while causes a SocketTimeoutException when reading from the S3ObjectInputStream.

Am I doing something wrong, or is it just my router/internet? Or is this to be expected from S3? I don't notice problems elsewhere.
// Requires: import static java.nio.file.StandardOpenOption.*;
public void fastRead(final String key, Path path) throws StorageException
{
    final int pieceSize = 1 << 20; // 1 MiB per ranged GET
    final int threadCount = 8;

    try (FileChannel channel = (FileChannel) Files.newByteChannel(path, WRITE, CREATE, TRUNCATE_EXISTING))
    {
        final long size = s3.getObjectMetadata(bucket, key).getContentLength();
        final long pieceCount = (size - 1) / pieceSize + 1;
        ThreadPool pool = new ThreadPool(threadCount);
        final AtomicInteger progress = new AtomicInteger();

        // The loop variable must be a long: an int would overflow for objects over 2 GiB.
        for (long i = 0; i < size; i += pieceSize)
        {
            final long start = i;
            final long end = Math.min(i + pieceSize, size);
            pool.submit(() ->
            {
                boolean retry;
                do
                {
                    retry = false;
                    try
                    {
                        GetObjectRequest request = new GetObjectRequest(bucket, key);
                        request.setRange(start, end - 1); // range bounds are inclusive
                        S3Object piece = s3.getObject(request);
                        ByteBuffer buffer = ByteBuffer.allocate((int) (end - start));
                        try (InputStream stream = piece.getObjectContent())
                        {
                            IOUtils.readFully(stream, buffer.array());
                        }
                        channel.write(buffer, start);
                        double percent = (double) progress.incrementAndGet() / pieceCount * 100.0;
                        System.err.printf("%.1f%%\n", percent);
                    }
                    catch (java.net.SocketTimeoutException | java.net.SocketException e)
                    {
                        System.err.println("Read timed out. Retrying...");
                        retry = true;
                    }
                } while (retry);
            });
        }
        pool.<IOException>await();
    }
    catch (AmazonClientException | IOException | InterruptedException e)
    {
        throw new StorageException(e);
    }
}
2014-05-28 08:49:58 INFO com.amazonaws.http.AmazonHttpClient executeHelper Unable to execute HTTP request: The target server failed to respond
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:713)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:518)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:385)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:233)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3569)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1130)
at com.syncwords.files.S3Storage.lambda$fastRead$0(S3Storage.java:123)
at com.syncwords.files.S3Storage$$Lambda$3/1397088232.run(Unknown Source)
at net.almson.util.ThreadPool.lambda$submit$8(ThreadPool.java:61)
at net.almson.util.ThreadPool$$Lambda$4/1980698753.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
2 Answers
UPDATE: There have been updates to the AWS SDK in response to the issues I filed on GitHub. I'm not sure how the situation has changed. The second part of this answer (criticizing getObject) is likely (hopefully?) wrong.
S3 is designed to fail, and it fails often.
Fortunately, the AWS SDK for Java has built-in facilities for retrying requests. Unfortunately, they do not cover the case of SocketExceptions while downloading S3 objects (they do work when uploading and doing other operations). So, code similar to that in the question is necessary (see below).
When the mechanism works as desired, you will still see messages in your log. You may choose to hide them by filtering INFO log events from com.amazonaws.http.AmazonHttpClient. (The AWS SDK uses Apache Commons Logging.)
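For example, assuming Log4j 1.x is the Commons Logging backend (an assumption; your logging setup may differ, and the class and method names here are mine), a one-line logger override hides the retry chatter:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

class QuietAwsHttpClient
{
    static void hideRetryChatter()
    {
        // Raise the threshold for the AWS HTTP client logger only, so the
        // "Unable to execute HTTP request" INFO messages from successful
        // retries are hidden while warnings and errors still come through.
        // Equivalent log4j.properties line:
        //   log4j.logger.com.amazonaws.http.AmazonHttpClient=WARN
        Logger.getLogger("com.amazonaws.http.AmazonHttpClient").setLevel(Level.WARN);
    }
}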
Depending on your network connection and the health of Amazon's servers, the retry mechanism may fail. As pointed out by lvlv, the way to configure the relevant parameters is through ClientConfiguration. The parameter I suggest changing is the number of retries, which is 3 by default. Other things you may try are increasing or decreasing the connection and socket timeouts (the default is 50 s, which is not only long enough, it is probably too long given that you are going to time out often no matter what) and enabling TCP keep-alive (off by default).
ClientConfiguration cc = new ClientConfiguration()
        .withMaxErrorRetry(10)
        .withConnectionTimeout(10_000)
        .withSocketTimeout(10_000)
        .withTcpKeepAlive(true);
AmazonS3 s3Client = new AmazonS3Client(credentials, cc);
The retry mechanism can even be overridden by setting a RetryPolicy (again, in the ClientConfiguration). Its most interesting element is the RetryCondition, which by default:
checks for various conditions in the following order:
- Retry on AmazonClientException exceptions caused by IOException;
- Retry on AmazonServiceException exceptions that are either 500 internal server errors, 503 service unavailable errors, service throttling errors or clock skew errors.
See the SDKDefaultRetryCondition javadoc and source. A sketch of plugging in a custom condition follows.
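For illustration, here is a minimal sketch of supplying your own RetryCondition (the class and method names are mine; this condition just delegates to the SDK default and logs its decision):

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonWebServiceRequest;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.retry.PredefinedRetryPolicies;
import com.amazonaws.retry.RetryPolicy;

class RetryPolicyExample
{
    static ClientConfiguration withLoggingRetryCondition()
    {
        RetryPolicy.RetryCondition condition = new RetryPolicy.RetryCondition()
        {
            @Override public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
                                                 AmazonClientException exception,
                                                 int retriesAttempted)
            {
                // Delegate to the SDK's default condition, but log what it decides.
                boolean retry = PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION
                        .shouldRetry(originalRequest, exception, retriesAttempted);
                System.err.println("attempt " + retriesAttempted + ", will retry: " + retry);
                return retry;
            }
        };
        return new ClientConfiguration()
                .withRetryPolicy(new RetryPolicy(condition,
                        PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
                        10,      // maxErrorRetry
                        true));  // honor maxErrorRetry set on the ClientConfiguration
    }
}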
The Half-assed Retry Facilities Hidden Elsewhere in the SDK
What the built-in mechanism (which is used across the whole AWS SDK) does not handle is reading S3 object data.
AmazonS3Client uses its own retry mechanism if you call AmazonS3.getObject(GetObjectRequest getObjectRequest, File destinationFile). The mechanism is inside ServiceUtils.retryableDownloadS3ObjectToFile (source), which uses sub-optimal hard-wired retry behavior (it will retry only once, and never on a SocketException!). All of the code in ServiceUtils seems poorly engineered.
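For reference, a minimal sketch of the overload in question (the helper name is mine; internally this call routes through ServiceUtils.retryableDownloadS3ObjectToFile):

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

class FileDownloadExample
{
    // Downloads the object straight to a file; this is the call that uses
    // the hard-wired single-retry mechanism described above.
    static ObjectMetadata downloadToFile(AmazonS3 s3, String bucket, String key, File dest)
    {
        return s3.getObject(new GetObjectRequest(bucket, key), dest);
    }
}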
I use code similar to:
public void read(String key, Path path) throws StorageException
{
    GetObjectRequest request = new GetObjectRequest(bucket, key);

    for (int retries = 5; retries > 0; retries--)
        try (S3Object s3Object = s3.getObject(request))
        {
            if (s3Object == null)
                return; // occurs if we set GetObjectRequest constraints that aren't satisfied

            try (OutputStream outputStream = Files.newOutputStream(path, WRITE, CREATE, TRUNCATE_EXISTING))
            {
                byte[] buffer = new byte[16_384];
                int bytesRead;
                while ((bytesRead = s3Object.getObjectContent().read(buffer)) > -1)
                {
                    outputStream.write(buffer, 0, bytesRead);
                }
            }
            catch (SocketException | SocketTimeoutException e)
            {
                // We retry exceptions that happen during the actual download.
                // Errors that happen earlier are retried by AmazonHttpClient.
                try { Thread.sleep(1000); } catch (InterruptedException i) { throw new StorageException(i); }
                log.log(Level.INFO, "Retrying...", e);
                continue;
            }
            catch (IOException e)
            {
                // There must have been a filesystem problem.
                // We call `abort` to save bandwidth.
                s3Object.getObjectContent().abort();
                throw new StorageException(e);
            }
            return; // Success
        }
        catch (AmazonClientException | IOException e)
        {
            // Either we couldn't connect to S3,
            // or AmazonHttpClient ran out of retries,
            // or s3Object.close() threw an exception.
            throw new StorageException(e);
        }

    throw new StorageException("Ran out of retries.");
}
Answered Jun 3, 2016 at 16:06
I previously had similar problems. I found that every time you finish with an S3Object, you need to close() it to release the resource back to the pool, according to the official example from AWS S3:
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream objectData = object.getObjectContent();
// Process the objectData stream.
objectData.close();
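Note that in recent SDK versions S3Object implements Closeable, so a try-with-resources form (a sketch; bucketName and key are unbound, as in the example above) guarantees the connection goes back to the pool even when processing throws:

AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
try (S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
     InputStream objectData = object.getObjectContent())
{
    // Process the objectData stream; both resources are closed automatically,
    // returning the HTTP connection to the pool.
}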
Thanks for adding the link. BTW, I guess increasing the max connections, retries, and timeouts in ClientConfiguration (by default the max connections is 50) may also help solve the problem, like this:
AmazonS3 s3Client = new AmazonS3Client(aws_credential,
        new ClientConfiguration().withMaxConnections(100)
                                 .withConnectionTimeout(120 * 1000)
                                 .withMaxErrorRetry(15));
Answered Sep 2, 2014 at 22:09
It looks like closing the S3Object is the same as closing the stream returned by getObjectContent(). Adjusting ClientConfiguration is an excellent idea. Often, though, I get other errors like "cannot find host s3.amazonaws.com," even when running on EC2. O.o - Alejandro Dubinsky
If you're not happy with the AWS SDK's retry mechanisms, check out Recurrent. It should work well for this use case. - Jonathan