Check whether a URL is a download link using WebClient in C#

I am reading URLs from the history database, and for every URL I read, I download it and store the data in a string. I want to be able to determine whether the link is a download link, for example one that points to an .exe or .zip file. I assume I need to read the headers to determine this, but I don't know how to do it with WebClient. Any suggestions?

while (sqlite_datareader.Read())
{
    noIndex = false;

    string url = (string)sqlite_datareader["url"];

    try
    {
        // Skip HTTPS URLs, obvious PDF/JPEG links, and blacklisted URLs.
        if (url.Contains("http") && !url.Contains(".pdf") && !url.Contains(".jpg") && !url.Contains("https") && !isInBlackList(url))
        {
            using (WebClient client = new WebClient())
            {
                client.Headers.Add("user-agent", "Only a test!");

                // Downloads the whole response body into memory.
                string htmlCode = client.DownloadString(url);
            }
        }
    }
    catch (WebException)
    {
        // Skip URLs that cannot be downloaded.
    }
}

asked May 10, 2011 at 13:05

4 Answers

Instead of loading the complete content behind the link, I would issue a HEAD request.

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

Quoted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

See these questions for C# examples
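
As a rough illustration of the HEAD approach, here is a minimal sketch using HttpWebRequest. The class name, the helper LooksLikeDownload, and the list of MIME types are my own assumptions for illustration, not something from the question; adjust the list to whatever you count as a download.

```csharp
using System;
using System.Net;

public static class DownloadLinkProbe
{
    // Assumed list of content types that count as "downloads"; extend as needed.
    private static readonly string[] DownloadTypes =
    {
        "application/zip",
        "application/x-msdownload",
        "application/x-rar-compressed",
        "application/octet-stream"
    };

    // Hypothetical helper: decides from the header values alone.
    public static bool LooksLikeDownload(string contentType, string contentDisposition)
    {
        // "Content-Disposition: attachment" asks the browser to save the response.
        if (contentDisposition != null &&
            contentDisposition.IndexOf("attachment", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return true;
        }

        if (contentType == null)
        {
            return false;
        }

        foreach (string type in DownloadTypes)
        {
            if (contentType.StartsWith(type, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }

    // Sends a HEAD request so that only the headers are transferred.
    // Assumes an http:// or https:// URL; throws WebException on failure.
    public static bool IsDownloadLink(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        request.UserAgent = "Only a test!";

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return LooksLikeDownload(response.ContentType,
                                     response.Headers["Content-Disposition"]);
        }
    }
}
```

In the loop from the question you could call DownloadLinkProbe.IsDownloadLink(url) before DownloadString(url) and skip the URL when it returns true. Keep in mind that some servers reject HEAD or answer it with different headers than GET, so treat the result as a hint.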

answered May 23, 2017 at 15:05

You're on the right track; you'll need to examine the ResponseHeaders after a successful request:

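var someType = "application/zip";
string contentType = client.ResponseHeaders["Content-Type"];
// The header can be missing, so guard against null.
if (contentType != null && contentType.Contains(someType)) {
    // this was a "download link"
}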

The tricky part will be in determining what constitutes a download link since there are so many content types possible. For example, how would you decide whether XML data is a download link or not?

answered May 10, 2011 at 17:05

That's true. Perhaps there is a way to check the size of the data before downloading? However, seeing as I don't have much time, .exe, .zip and .rar will suffice. Thank you - michelle

OK, but I will still need to download the string or get the response stream. The reason I want to filter out .exe etc. is so that I won't need to download them. Unfortunately, not all links contain .exe in their URL, so I will need to look at the response header :/ - michelle

You could try using DownloadStringAsync() instead. Then as soon as you have the headers you can determine what to do with the content and either cancel or allow the download to complete. - ¡Qué asco!

Try checking WebClient's ResponseHeaders collection to validate the response's file type.
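
One way to do that without pulling the whole body into a string first is WebClient.OpenRead, which returns a stream once the response headers have arrived, so ResponseHeaders can be inspected before anything is read. This is only a sketch: the class name and the rule "only text/* is worth reading" are assumptions of mine, and how much of the body is still transferred when the stream is closed early depends on the runtime.

```csharp
using System;
using System.IO;
using System.Net;

public static class HeaderFirstDownloader
{
    // Hypothetical rule: only HTML and other text responses are worth reading.
    public static bool IsTextContent(string contentType)
    {
        return contentType != null &&
               contentType.StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the page text, or null when the headers suggest a file download.
    public static string DownloadIfText(string url)
    {
        using (var client = new WebClient())
        {
            client.Headers.Add("user-agent", "Only a test!");

            // OpenRead returns after the headers are received; the body is unread.
            using (Stream body = client.OpenRead(url))
            {
                if (!IsTextContent(client.ResponseHeaders["Content-Type"]))
                {
                    // Returning here means the body is never read into a string.
                    return null;
                }

                using (var reader = new StreamReader(body))
                {
                    return reader.ReadToEnd();
                }
            }
        }
    }
}
```

Note that StreamReader defaults to UTF-8 here, whereas DownloadString uses the client's Encoding property, so the decoded text may differ for pages in other encodings.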

answered May 10, 2011 at 17:05

In case anyone has the same problem: I used an attribute in the places.sqlite history database, which came in very handy!

places.sqlite contains a table called moz_historyvisits, which has a column named visit_type. According to [1], a visit_type of 7 is a download link. Therefore, reading this value determines whether a URL is a download link without reading the response header or even sending a HEAD request.

[1] http://www.firefoxforensics.com/research/moz_historyvisits.shtml
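
For anyone who wants to try this, here is a minimal sketch of the lookup. It assumes the usual Places schema, where moz_historyvisits.place_id refers to moz_places.id and the URL is stored in moz_places.url; the class and method names are made up for illustration. The method takes any already opened ADO.NET connection to places.sqlite, so it is not tied to a particular SQLite provider.

```csharp
using System.Collections.Generic;
using System.Data;

public static class FirefoxHistory
{
    // visit_type value that marks a download, according to [1].
    public const int DownloadVisitType = 7;

    // Assumed schema: moz_historyvisits.place_id -> moz_places.id.
    public const string DownloadUrlsQuery =
        "SELECT DISTINCT p.url " +
        "FROM moz_places p " +
        "JOIN moz_historyvisits v ON v.place_id = p.id " +
        "WHERE v.visit_type = 7";

    // Returns every URL that has at least one visit recorded as a download.
    public static List<string> ReadDownloadUrls(IDbConnection openConnection)
    {
        var urls = new List<string>();
        using (IDbCommand command = openConnection.CreateCommand())
        {
            command.CommandText = DownloadUrlsQuery;
            using (IDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    urls.Add(reader.GetString(0));
                }
            }
        }
        return urls;
    }
}
```

The returned URLs can be put in a set and skipped in the download loop, or the WHERE clause can be inverted (v.visit_type <> 7) to select only the pages worth fetching.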

answered May 11, 2011 at 15:05
