Found out that pgs-to-srt can export images, so you can see what it's looking at.
Starting to make sense why it's so bad. Wonder if I can add a preprocessor to do something like this:
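Something along these lines, maybe: a minimal preprocessing sketch with Pillow that flattens, upscales, and binarizes the exported subtitle bitmaps before handing them to the OCR. The function name and defaults are my own guesses, not anything pgs-to-srt provides.

```python
from PIL import Image

def preprocess_for_ocr(img: Image.Image, scale: int = 3, threshold: int = 128) -> Image.Image:
    """Upscale and binarize a subtitle bitmap before OCR.

    PGS bitmaps are small and anti-aliased; Tesseract generally does
    better on large, high-contrast input. scale/threshold are guesses
    to tune per source.
    """
    # Flatten any alpha channel onto a white background first.
    if img.mode in ("RGBA", "LA", "P"):
        base = Image.new("RGBA", img.size, (255, 255, 255, 255))
        base.alpha_composite(img.convert("RGBA"))
        img = base
    gray = img.convert("L")
    # Upscale with a smooth filter so edges stay usable after thresholding.
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Binarize: anything darker than the threshold becomes black text on white.
    return gray.point(lambda p: 0 if p < threshold else 255, mode="1")
```

You'd run the exported PNGs through this and feed the results back to Tesseract to see if the error rate moves at all.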
I've never had great results with Tesseract if the image has compression, so a mixed background sounds like a nightmare. There's probably a Java (BD-J) stream in there somewhere, but good luck accessing it. Blu-ray is hot garbage for a standard.
That's the thing. There isn't a background. The PGS layer is separate which is why it's so surprising the error rate is so high.
OCR 5 from F-Droid was really good for me 2+ years ago, but when I tried it more recently it was garbage. It really stood out around two years ago because, around five years ago, I'd tried translating a Chinese datasheet for one of the Atmel clone microcontrollers, and OCR was not fun back then.
Maybe have a look at Huggingface spaces and see if anyone has a better methodology setup as an example. Or look at the history of the models and see if one of the older ones is still available.
I think I spoke too soon when I said the text didn't have a background or was otherwise clean. SubtitleEdit always shows it on a white background, but it looks like the text itself actually has a white border, which I'm sure is confusing the OCR. See my other comment for examples.
I'm going to start by seeing if I can clean up the text, and if not, I'll look into huggingface and whatnot. Thanks for the tips.
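For the cleanup step, one idea (assuming the exported PNGs keep the PGS alpha channel): ignore the colors entirely and silhouette the glyphs from alpha, then erode by roughly the outline width so the white border doesn't fatten the letters into each other. A rough sketch; the cutoff and erosion radius are assumptions to tune.

```python
from PIL import Image, ImageFilter

def strip_outline(img: Image.Image, alpha_cutoff: int = 64, erode_px: int = 1) -> Image.Image:
    """Silhouette glyphs from the alpha channel, then erode the border off.

    Any sufficiently opaque pixel is treated as glyph (black), the rest
    as background (white), so the border/fill colors stop mattering.
    """
    alpha = img.convert("RGBA").getchannel("A")
    # Black glyphs on white wherever the pixel is opaque enough.
    mask = alpha.point(lambda a: 0 if a >= alpha_cutoff else 255)
    size = 2 * erode_px + 1
    # MaxFilter grows the white background, i.e. erodes the black glyphs,
    # shaving off roughly erode_px pixels of outline.
    return mask.filter(ImageFilter.MaxFilter(size)).convert("1")
```

If the erosion thins strokes too much, it may hurt more than the border helps; worth comparing both variants against the same OCR run.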