this post was submitted on 21 Jul 2025
589 points (98.7% liked)
Technology
73094 readers
2450 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Batch process turning unstructured free form text data into structured outputs.
As a crappy example imagine if you wanted to download metadata about your albums but they’re all labelled “Various Artists”. You can use an LLM call to read the album description and fix the track artists for the tracks, now you can properly organize your collection.
I’m using the same idea, different domain and a complex set of inputs.
It can be much more cost effective than manually spending days tagging data and writing custom importers.
You can definitely go lighter than LLMs. You can use gensim to do category matching, you can use sentence transformers and nearest neighbours (this is basically what Semantle does), but LLM performed the best on more complex document input.
Thanks.
That's pretty much what google says they use AI for, for structuring.
Thanks for your insight.