Hey Linux community,
I’m struggling with a file management issue and hoping you can help. I have a large media collection spread across multiple external hard drives. Often, when I’m looking for a specific file, I can’t remember which drive it’s on.
I’m looking for a file indexing and search tool that meets the following requirements:
- Ability to scan multiple locations
- Option to exclude specific folders or subfolders from both scan and search
- File indexing for quicker searches
- Capability to search indexed files even when the original drive is disconnected
- Real-time updates as files change
Any recommendations for tools that meet most or all of these criteria? It would be a huge help in organizing and finding my media files.
Thanks in advance for any suggestions!
Not indexing, but you can make find faster through parallelization if your xargs has the GNU -P extension.
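A rough sketch of what I mean (untested; the mount points are made up, and -P needs GNU xargs):

```sh
# Run one find per drive in parallel; output from the drives may interleave.
printf '%s\n' /mnt/drive1 /mnt/drive2 /mnt/drive3 \
  | xargs -P3 -I{} find {} -iname '*vacation*'
```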
I don’t know what the use case for this is, but you can do something like write a cron script that periodically dumps the names of the files under a mount point to a path like ~/var/log/something, or use a domain-specific unmount script that dumps the paths before unmounting.
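Something in this direction (untested sketch; /media/$USER and the ~/var/log layout are just guesses at your setup):

```sh
#!/bin/sh
# Cron job: dump a file listing for each mounted drive so it can be
# searched later, even when the drive is unplugged.
mkdir -p "$HOME/var/log"
for mnt in /media/"$USER"/*; do
    [ -d "$mnt" ] || continue
    find "$mnt" -type f > "$HOME/var/log/$(basename "$mnt").filelist"
done
```

Then a plain grep over the .filelist dumps tells you which drive a file lives on.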
That would require a non-portable script that stores each file’s mtime in an array, compares the old mtime against the new one using stat, and then loops. Maybe implement it as a daemon.
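Roughly like this, non-portable as advertised (GNU stat -c, bash 4 associative arrays; the watched path and poll interval are placeholders):

```sh
#!/bin/bash
# Poll a tree and report files whose mtime changed since the last pass.
declare -A mtime
while true; do
    while IFS= read -r -d '' f; do
        new=$(stat -c %Y "$f")    # mtime in seconds since the epoch
        if [[ "${mtime[$f]-}" != "$new" ]]; then
            echo "changed: $f"
            mtime[$f]=$new
        fi
    done < <(find /mnt/media -type f -print0)
    sleep 60
done
```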
I think the idea is to store the search index in a separate place from the files. For indexing text, though, I’ve found that the index ends up comparable in size to the text itself. It’s not entirely clear to me what OP wants to search. Something like email? Obviously if it’s just metadata for media files (a kilobyte text description of a gigabyte video) then the search index can be tiny.
That is what inotify is for.
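Concretely, the usual user-space front end is inotifywait from the inotify-tools package:

```sh
# Watch a tree and print events as files are created, changed, or moved.
inotifywait -m -r -e create,modify,delete,move /mnt/media
```

Pipe that into whatever keeps the index up to date.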
I realize your overall answer was mostly snark, but the problems mentioned really do take some work to solve. For example, if you want to index email, you want the indexer to understand email headers so it can do the right things with the timestamps and other fields. You can’t just chuck everything into a big generic search engine and press “blend”.
I will mention git-annex, which is for sort of a different problem, but it can help you manually track where your offline files are, more or less.
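For flavor, a minimal sketch (the repo description and file names here are made up; in practice each drive gets its own clone):

```sh
cd ~/media
git init
git annex init "laptop"             # describe this repository
git annex add movies/               # check files into the annex
git annex whereis movies/film.mkv   # list which repositories hold a copy
```

git annex sync keeps the location metadata current across clones, so whereis still works while the drive itself sits on the shelf.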
Sorry, I have .world blocked, so I didn’t see your reply until now (wish I could block instances without blocking replies from them, but whatever).
Yeah, I amended my post earlier to recommend logging with a domain-specific unmount script, but I don’t know why they’d want to do this.
Apparently I’m so good at trolling I troll people even when I’m not trying to troll. :<
If inotify works for you, that’s fine. I don’t have any experience with it; maybe I’ll look into it after this, if the use case ever comes up.
Eh, regex (EREs) is good enough for 99% of use cases, honestly. For the remaining 1%, consider using an easier-to-parse file format.
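For instance, against one of those saved file listings from upthread (file name is made up):

```sh
# ERE: match common video extensions at the end of each indexed path.
grep -E '\.(mkv|mp4|avi)$' ~/var/log/drive1.filelist
```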
They have umpty jillion terabytes of video on a shelf full of external HDDs and they want to know what files are on which drives. In the old days we had racks full of mag tapes and had the same issue. It’s not something new.
For info about inotify, try a web search.
For text search, you start needing real indexing once you’re over maybe a GB of text. Before that, you can live with grep or SQL tables or whatever.
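E.g., assuming the per-drive listing dumps suggested upthread (hypothetical names), grep even answers the “which drive is it on” question:

```sh
# -l prints the listing files that match, i.e. the drives holding the file.
grep -il 'holiday' ~/var/log/*.filelist
```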