Storing and searching approximate dates

I'm looking to make a database to hold metadata about a set of pictures, and one field I want is the date taken. I'd like to be able to store this with coarse or fine-grained accuracy: for a digital picture the exact timestamp down to the second will be available, but I'd also like to be able to tag a picture as just being taken in a particular year, or even a particular decade (a decade being the coarsest I'd go). I'd also like to be able to search in the same way, requesting for example all pictures from the 90s, all pictures from 1992, or all pictures from a particular day.

I was wondering if there is a built-in way to do this with SQL, or if there is another way that would be better. I thought about breaking the date up and storing each piece separately, e.g. a decade field, a year field, a month field and so on, but this seemed like it might be a slightly clumsy way of doing things.

I'm not fussed about which SQL technology I use as long as it's free. I'm looking at H2 at the moment.

asked Jul 28 '12 at 22:07

3 Answers

You can do that with just two columns: one for the timestamp and another for the level of precision. You then have to define a precision scale, and a convention for encoding lower-precision dates in a timestamp.

For example, a precision scale could be:

0   full timestamp
1   day
2   month
3   year
4   decade

With that you could store the dates like this:

timestamp                 |  precision   | notes
2012-07-05 14:00:00       |  0           | full precision
2012-07-05 00:00:00       |  1           | precision up to day
2012-07-01 00:00:00       |  2           | month and year
2012-01-01 00:00:00       |  3           | year
2010-01-01 00:00:00       |  4           | decade
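To search rows stored this way, each (timestamp, precision) pair can be expanded into the interval it stands for. The following is a minimal Python sketch of that idea, not something the answer itself specifies: the function name `covered_interval` and the constant names are hypothetical, and it assumes full timestamps are accurate to one second.

```python
from datetime import datetime, timedelta

# Precision codes taken from the scale above: 0 = full timestamp ... 4 = decade.
FULL, DAY, MONTH, YEAR, DECADE = range(5)

def covered_interval(ts, precision):
    """Return the half-open interval [start, end) that a stored value stands for."""
    if precision == FULL:
        start = ts.replace(microsecond=0)
        return start, start + timedelta(seconds=1)  # assumption: one-second accuracy
    if precision == DAY:
        start = datetime(ts.year, ts.month, ts.day)
        return start, start + timedelta(days=1)
    if precision == MONTH:
        start = datetime(ts.year, ts.month, 1)
        if ts.month == 12:
            return start, datetime(ts.year + 1, 1, 1)
        return start, datetime(ts.year, ts.month + 1, 1)
    if precision == YEAR:
        return datetime(ts.year, 1, 1), datetime(ts.year + 1, 1, 1)
    if precision == DECADE:
        first_year = ts.year - ts.year % 10
        return datetime(first_year, 1, 1), datetime(first_year + 10, 1, 1)
    raise ValueError(f"unknown precision code: {precision}")
```

With this, the row `2010-01-01 00:00:00 | 4` expands to the interval from 2010-01-01 up to, but not including, 2020-01-01, which can then be compared against a search range.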

answered Jul 28 '12 at 23:07

For fuzzy searches on exact dates you don't need to store each part separately. You can adjust your WHERE clause instead. For everything from 2012:

SELECT * FROM yourtable
WHERE yourtime >= '2012-01-01' AND yourtime < '2013-01-01'

If you want a specific day:

SELECT * FROM yourtable
WHERE yourtime >= '2012-07-28' AND yourtime < '2012-07-29'

Or a specific hour:

SELECT * FROM yourtable
WHERE yourtime >= '2012-07-28 13:00:00' AND yourtime < '2012-07-28 14:00:00'

To make all these queries efficient you can add an index to your timestamp column.
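As a rough, runnable illustration of the half-open range queries above together with an index, here is a sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in here (the question mentions H2), the index name `idx_yourtime` and the helper `taken_between` are hypothetical, and it assumes timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, yourtime TEXT NOT NULL)")
# The index lets the range predicates below avoid a full table scan.
conn.execute("CREATE INDEX idx_yourtime ON yourtable (yourtime)")
conn.executemany(
    "INSERT INTO yourtable (yourtime) VALUES (?)",
    [
        ("2011-12-31 23:59:59",),
        ("2012-07-28 13:30:00",),
        ("2012-07-28 14:00:00",),
        ("2013-01-01 00:00:00",),
    ],
)

def taken_between(start, end):
    """Return timestamps in the half-open range [start, end), oldest first."""
    rows = conn.execute(
        "SELECT yourtime FROM yourtable "
        "WHERE yourtime >= ? AND yourtime < ? ORDER BY yourtime",
        (start, end),
    )
    return [row[0] for row in rows]

year_2012 = taken_between("2012-01-01", "2013-01-01")
hour_13 = taken_between("2012-07-28 13:00:00", "2012-07-28 14:00:00")
```

Because the upper bound is exclusive, the row stamped exactly `2013-01-01 00:00:00` is not counted as part of 2012, and the row at `14:00:00` is not counted as part of the 13:00 hour.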

Regarding how to store fuzzy dates, one option is to have a range of dates:

id    taken_from            taken_to               title
1     2011-01-01 00:00:00   2012-01-01 00:00:00    a pic of my car last year

For fuzzy searches on fuzzy dates you could do something like this:

[Image: fuzzy date search]

In pseudo-SQL:

SELECT
    (LEAST(@to, taken_to) - GREATEST(@from, taken_from)) /
    (GREATEST(@to, taken_to) - LEAST(@from, taken_from)) AS relevancy
FROM yourtable
WHERE taken_to >= @from AND taken_from < @to
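The pseudo-SQL can be turned into something runnable. The sketch below uses Python's built-in `sqlite3` module as a stand-in database and is only one possible reading of the idea: SQLite's two-argument `MIN`/`MAX` replace `LEAST`/`GREATEST`, `julianday()` converts timestamps to day counts so they can be subtracted, and the parameter names `q_from`/`q_to`, the helper `search`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    "id INTEGER PRIMARY KEY, taken_from TEXT, taken_to TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable (taken_from, taken_to, title) VALUES (?, ?, ?)",
    [
        ("2012-01-01 00:00:00", "2013-01-01 00:00:00", "some time in 2012"),
        ("2012-06-01 00:00:00", "2012-07-01 00:00:00", "June 2012"),
        ("1990-01-01 00:00:00", "2000-01-01 00:00:00", "the 90s"),
    ],
)

# relevancy = (length of the overlap) / (length of the span covering both ranges)
QUERY = """
SELECT title,
       (julianday(MIN(:q_to, taken_to)) - julianday(MAX(:q_from, taken_from))) /
       (julianday(MAX(:q_to, taken_to)) - julianday(MIN(:q_from, taken_from)))
           AS relevancy
FROM yourtable
WHERE taken_to >= :q_from AND taken_from < :q_to
ORDER BY relevancy DESC
"""

def search(q_from, q_to):
    """Return (title, relevancy) pairs for photos whose range overlaps the search."""
    return conn.execute(QUERY, {"q_from": q_from, "q_to": q_to}).fetchall()

# Search for "all photos from June 2012".
results = search("2012-06-01 00:00:00", "2012-07-01 00:00:00")
```

Here the photo tagged with exactly June 2012 scores 1, the one tagged "some time in 2012" scores 30/366 (about 0.08), and the photo from the 90s is filtered out by the WHERE clause.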

You probably want to order by the relevancy, and you may want to include other factors such as the relevancy returned by a full text search for some search terms.

answered Jul 28 '12 at 23:07

That approach works for searching if you know exactly when a picture was taken, but is it possible to store a date with a year only, or just a decade only? I might have pictures that I know were taken in the 1920s, but I can't say more accurately than that, and I don't want to store them as being taken on the first of January 1920 if that's not accurate - user1111284

@user1111284: Hmmm... you can store a range of dates during which you think the photo could have been taken, e.g. from - to. But what should happen if you have a photo taken "some time in 2012" and your search is for "all photos from June 2012"? Should that photo be included in the search or not? It could have been taken in June, but most likely it wasn't. Perhaps you could sort by the percentage of overlap between the range on the photo and the range of the search. - Mark Byers

Yes, I'm not sure if I want to include such items or not. Perhaps have them in a slightly separate list below the definite results, ordered by accuracy. - user1111284

I have used CHAR and VARCHAR in the past, replacing the missing pieces with question marks or dashes. Question marks meant "not known", and dashes meant "not applicable". This proved to be intuitive enough for the users (secretaries and paralegals in complex litigation), flexible enough for the lawyers, and it sorted sensibly.

This does mean your "dates" are no longer SQL dates. That is, date/time arithmetic and interval compatibility are a lot less robust, when they work at all. (What is "The 1960s plus 20 days?" Is it a longer decade, or a shifted decade?) Whether that matters is application-dependent. I don't think it would be a problem for your application.
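To make the string approach concrete, here is a small Python sketch. The answer does not give an exact layout, so the fixed-width `YYYY-MM-DD` form with `?` for unknown digits, and the helper name `partial_date`, are assumptions made purely for illustration.

```python
def partial_date(year=None, month=None, day=None, decade=None):
    """Format a possibly incomplete date as fixed-width text, '?' marking unknown digits."""
    if year is not None:
        yyyy = f"{year:04d}"
    elif decade is not None:
        yyyy = f"{decade // 10:03d}?"  # e.g. decade=1920 becomes "192?"
    else:
        yyyy = "????"
    mm = f"{month:02d}" if month is not None else "??"
    dd = f"{day:02d}" if day is not None else "??"
    return f"{yyyy}-{mm}-{dd}"

dates = [
    partial_date(year=1992),           # only the year is known
    partial_date(decade=1990),         # only the decade is known
    partial_date(1992, 7, 28),         # fully known day
    partial_date(decade=1920),
    partial_date(year=1989, month=12),
]

# '?' sorts after the digits in ASCII, so less precise entries follow the
# more precise ones that share the same prefix.
in_order = sorted(dates)

# Prefix matching covers "all pictures from 1992" and "all pictures from the 90s".
from_1992 = [d for d in dates if d.startswith("1992")]
from_1990s = [d for d in dates if d.startswith("199")]
```

In a database the same prefix search could be written with `LIKE '1992%'`, keeping in mind that `?` is a literal character in the stored value, not a SQL wildcard.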

Details and caveats are on

answered Apr 13 '17 at 13:04
