The Filenames
Over the past 20 years, we obviously haven’t taken all our pictures with a single device; the image files come from several cameras and smartphones. Even though the iPhone 15 Pro Max takes great pictures, we still like to bring along our Sony NEX-5R, a forerunner of the well-known Sony Alpha 6400. Unsurprisingly, the filenames are not only barely descriptive but also inconsistent. From the classic DSCxxxx.jpg to WhatsAppPhoto-xxx.jpg, pretty much everything is there.
So sorting by filename is not an option. Browsing through the pictures in an image viewer? Most viewers sort by filename, not by the time the photo was taken. And the filename itself reveals neither when nor in what context the picture was taken. Over the years, I’ve occasionally looked for tools to help me bring some order to the filenames, but I never found one that worked without issues.
But now we have Large Language Models, and thanks to Telekom, I have a free annual subscription to Perplexity.ai. After a brief description, a few tests, and some adjustments, I had a Python script ready that digs through the folders and renames the image files uniformly according to the scheme (Sequence Number) (Capture Date)-(Capture Time) (Event).(File Extension). Since the folders were already named in the format Year-Month Event, I could simply take the event from the folder name. To get the order right, the script first sorts the images by capture date and then numbers them.
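A renaming script along these lines could look roughly like the sketch below. It is not the Perplexity-generated script from this post, just a minimal illustration under several assumptions: the folder pattern "YYYY-MM Event", the date/time format YYYY-MM-DD-HHMMSS (colons aren’t allowed in Windows filenames), four-digit zero padding, lowercase extensions, the list of file types, the fallback to the file’s modification time, and all function names are hypothetical choices. Reading EXIF uses the third-party Pillow library (a reasonably recent version with `Exif.get_ifd`).

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed folder naming: "YYYY-MM Event", e.g. "2023-07 Summer Vacation".
FOLDER_PATTERN = re.compile(r"^\d{4}-\d{2} (?P<event>.+)$")
# Hypothetical choice of file types to rename.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def event_from_folder(folder_name: str) -> str:
    """Return the event part of a 'YYYY-MM Event' folder name."""
    match = FOLDER_PATTERN.match(folder_name)
    if match is None:
        raise ValueError(f"unexpected folder name: {folder_name!r}")
    return match.group("event")


def parse_exif_datetime(raw: str):
    """Parse EXIF's 'YYYY:MM:DD HH:MM:SS' format; return None if malformed."""
    try:
        return datetime.strptime(raw.strip("\x00 "), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None


def read_capture_time(path: Path):
    """Read DateTimeOriginal (tag 36867), falling back to DateTime (tag 306)."""
    from PIL import Image  # third-party: pip install Pillow
    try:
        with Image.open(path) as img:
            exif = img.getexif()
            # DateTimeOriginal lives in the Exif sub-IFD (pointer tag 0x8769).
            raw = exif.get_ifd(0x8769).get(36867) or exif.get(306)
    except OSError:  # unreadable or unsupported file
        return None
    return parse_exif_datetime(raw) if isinstance(raw, str) else None


def plan_renames(folder_name: str, photos, width: int = 4):
    """Map old names to '<seq> <date>-<time> <event><ext>', ordered by capture time."""
    event = event_from_folder(folder_name)
    # Identical timestamps are tie-broken by the old filename, so the order is stable.
    ordered = sorted(photos, key=lambda item: (item[1], item[0]))
    renames = {}
    for seq, (old_name, taken) in enumerate(ordered, start=1):
        suffix = Path(old_name).suffix.lower()
        renames[old_name] = f"{seq:0{width}d} {taken:%Y-%m-%d}-{taken:%H%M%S} {event}{suffix}"
    return renames


def rename_folder(folder: Path, dry_run: bool = True):
    """Rename all images in one event folder; only prints the plan by default."""
    photos = []
    for path in folder.iterdir():
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Files without EXIF fall back to the modification time (an assumption).
            taken = read_capture_time(path) or datetime.fromtimestamp(path.stat().st_mtime)
            photos.append((path.name, taken))
    renames = plan_renames(folder.name, photos)
    for old_name, new_name in renames.items():
        print(f"{old_name} -> {new_name}")
    if not dry_run:
        # Two passes via temporary names, so no new name can overwrite
        # a file that has not been renamed yet.
        for i, old_name in enumerate(renames):
            (folder / old_name).rename(folder / f".rename-tmp-{i}")
        for i, new_name in enumerate(renames.values()):
            (folder / f".rename-tmp-{i}").rename(folder / new_name)
    return renames
```

With the default `dry_run=True`, `rename_folder` only prints the planned names, which makes it easy to check the result before touching any files. Because every file gets its own sequence number, two pictures with exactly the same capture time (for example from burst mode) still end up with distinct names.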
This way, the pictures in each folder are clearly named and, thanks to the sequence number at the beginning, also clearly sorted and identifiable. A four- or five-digit number is easier to find than one of several similar times on the same day. It also prevents duplicate names when two pictures share exactly the same capture date and time.
The Folders
The folder names are actually mostly fine, but some folders don’t fit the scheme. First, there are collection folders for older pictures that were digitized from albums later on; these cover a span of time rather than a single month. Then there are collection folders whose pictures haven’t been sorted yet. These include photos backed up from my smartphone to the NAS, but also stray pictures that couldn’t be assigned right away or might be duplicates.
Here, it’s really up to me to go through the older collection folders and sort the pictures properly. The photos backed up from my smartphone, in particular, are still unsorted. Sorting them is especially tedious because many of them are everyday snapshots that can’t be assigned to a specific event; these end up in a monthly “everyday” folder. But there’s no way around it: I have to go through the pictures and move them to the right folders to finally get rid of these annoying collection folders. I’ll get to it someday…
The Subjects
This is the hardest part. You can store a description, a title, and tags in every picture’s embedded metadata, such as its EXIF data. But cameras don’t fill in these fields automatically, and with over 100,000 pictures, I don’t want to do it by hand either.
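The post doesn’t say which metadata field the keywords end up in. As one illustration (an assumption, not necessarily the author’s choice), the Windows-specific EXIF tag XPKeywords (0x9C9E), which Windows displays as tags, stores keywords as UTF-16LE text separated by semicolons and terminated by a null character. The hypothetical helpers below only convert between a keyword list and those raw bytes; actually writing them into a JPEG would be left to a library such as Pillow or piexif, or to ExifTool.

```python
# EXIF tag ID of the Windows-specific XPKeywords field.
XP_KEYWORDS_TAG = 0x9C9E


def encode_xp_keywords(keywords):
    """Join keywords with ';' and encode them as null-terminated UTF-16LE bytes."""
    text = ";".join(word.strip() for word in keywords if word.strip())
    return text.encode("utf-16-le") + b"\x00\x00"


def decode_xp_keywords(raw: bytes):
    """Inverse of encode_xp_keywords: raw bytes -> list of keywords."""
    text = raw.decode("utf-16-le").rstrip("\x00")
    return [word for word in text.split(";") if word]


raw = encode_xp_keywords(["beach", "church", "cloudy"])
print(decode_xp_keywords(raw))  # ['beach', 'church', 'cloudy']
```

Storing the keywords in a standard field means any photo tool that reads it can search for them, without a separate database next to the image files.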
This would actually be a perfect use case for an AI. But who wants to give their data to a third-party AI? Just as I was typing that last sentence, an idea came to me. You can also run LLMs locally. My Nvidia RTX 3080 should have enough power for this task. I was familiar with Ollama from a project at work, and a quick search on Kagi and two hours of tinkering with Perplexity.ai resulted in a Python script that pushes my graphics card to its limits. Luckily, it’s not summer yet.
Currently, a Python script is going through all the folders, sending the images via Ollama to the local Llama model. The model analyzes each image and generates up to 15 English keywords, which are then saved in the EXIF data. I focused on objects and the weather, so I get keywords like: beach, church, cloudy, cat, …
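The request to the local model can be sketched with nothing but the standard library, since Ollama exposes a REST API on `http://localhost:11434` by default and its `/api/generate` endpoint accepts base64-encoded images for vision models. The model tag `llama3.2-vision`, the prompt wording, and the helper names `ask_ollama` and `parse_keywords` are assumptions for illustration, not details from the author’s script. Calling `ask_ollama` requires a running Ollama instance with that model pulled.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2-vision"  # assumption: any vision-capable model pulled into Ollama

# Hypothetical prompt, focused on objects and weather as described above.
PROMPT = (
    "List up to 15 English keywords describing the objects and the weather "
    "in this photo. Answer with a comma-separated list only."
)


def ask_ollama(image_path, model=MODEL):
    """Send one image to the local model and return its raw text answer."""
    with open(image_path, "rb") as fh:
        image_b64 = base64.b64encode(fh.read()).decode("ascii")
    payload = json.dumps(
        {"model": model, "prompt": PROMPT, "images": [image_b64], "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    # Generous timeout: image analysis on a consumer GPU can take a while.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read())["response"]


def parse_keywords(answer, limit=15):
    """Turn the model's answer into a clean, de-duplicated keyword list."""
    keywords = []
    for item in answer.replace("\n", ",").split(","):
        # Drop surrounding whitespace, list bullets, and trailing periods.
        word = item.strip(" \t-*.").lower()
        if word and word not in keywords:
            keywords.append(word)
    return keywords[:limit]
```

Models don’t always follow the output format exactly, so cleaning up the answer (case, bullets, duplicates, the 15-keyword cap) before writing anything into the file is the more robust design.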
By my rough estimate, my computer will need about six full days to generate the keywords. At a power draw of 300 watts, that will cost about 15 euros in electricity. With the OpenAI API, I paid 0.80 euros for 300 test images, so all the pictures would cost me around 270 euros. Alternatively, I could stick to the free daily tokens. These are enough for 250 pictures a day, which means I’d be done in a little over 1.5 years. I’d rather invest the 15 euros in electricity and not have to give my data away.
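The cost comparison can be worked through explicitly with the figures from the paragraph above. Note that the electricity price isn’t stated in the text; the script only derives the price implied by 15 euros for six days at 300 watts, as well as the number of pictures that 270 euros would cover at the test rate.

```python
# Figures from the paragraph above
RUNTIME_HOURS = 6 * 24      # "about six full days"
POWER_WATTS = 300           # average power draw while the GPU is busy
LOCAL_COST_EUR = 15         # estimated electricity cost

API_TEST_COST_EUR = 0.80    # paid for the OpenAI test run
API_TEST_IMAGES = 300
API_TOTAL_EUR = 270         # estimate for the whole collection

# Watts times hours gives watt-hours; divide by 1000 for kWh.
energy_kwh = POWER_WATTS * RUNTIME_HOURS / 1000
implied_price_per_kwh = LOCAL_COST_EUR / energy_kwh   # derived, not stated in the text
api_cost_per_image = API_TEST_COST_EUR / API_TEST_IMAGES
implied_image_count = API_TOTAL_EUR / api_cost_per_image

print(f"Energy: {energy_kwh:.1f} kWh")
print(f"Implied electricity price: {implied_price_per_kwh:.2f} EUR/kWh")
print(f"API cost per image: {api_cost_per_image:.4f} EUR")
print(f"Pictures covered by {API_TOTAL_EUR} EUR: {implied_image_count:,.0f}")
```

This gives 43.2 kWh, an implied price of about 0.35 EUR/kWh, and roughly 101,000 pictures for 270 euros, which fits the "over 100,000 pictures" mentioned earlier.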
Sometimes It Just Helps to Write About It
I originally wanted this article to be about how poorly organized my photo collection is. But while thinking about the idea and writing it, I solved two of my three problems.
All that’s left now are the unsorted pictures. I’m already wondering if I can use a Python script or something with AI for that too. But I haven’t figured out how that could work in a meaningful way yet. The boundaries between the individual topics are too blurry, both in terms of content and time. I guess I’ll just have to sit down and work my way through it.
How do you sort your pictures and keep track of them?