java - Java 8, Tess4J: Optimize image for OCR with Tesseract
I am working with Tesseract and already have the OCR functionality working. I want to optimize the image so that the OCR results are better. Currently I make the image monochrome and scale it to double its size. Even after that, I am having issues with smaller fonts.
I tried looking this up, and here is one of the top answers I could find. Unfortunately, it works with Bitmap, and I cannot find a native class in Java that works with Bitmap. There is another answer with Java code, but it again uses Bitmap and doesn't specify the package for it.
Anything more would be nice. Thank you.
Code:
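On the Bitmap point: the Bitmap in those answers is presumably Android's class, and the closest counterpart in desktop Java is java.awt.image.BufferedImage. Below is a minimal sketch, using only JDK classes, of the grayscale and 2x scaling steps described above. The class name OcrPreprocess and the method toGrayscaleScaled are hypothetical names chosen for illustration, and bicubic interpolation is an assumption, not a setting known to be best for Tesseract.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrPreprocess {

    /**
     * Draws the source image into an 8-bit grayscale image that is
     * `factor` times larger in each dimension (hypothetical helper).
     */
    public static BufferedImage toGrayscaleScaled(BufferedImage src, int factor) {
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        // TYPE_BYTE_GRAY makes the target grayscale; drawing into it converts the colors.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Assumption: bicubic interpolation for smoother upscaled glyph edges.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The returned image can be written with ImageIO.write(..., "png", file) in the same way as in the code in this post.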
private String testOcr(String fileLocation, int attachId) {
    try {
        File imageFile = new File(fileLocation);
        BufferedImage img = ImageIO.read(imageFile);
        String identifier = String.valueOf(new BigInteger(130, random).toString(32));
        String blackAndWhiteImage = previewPath + identifier + ".png";
        File outputFile = new File(blackAndWhiteImage);
        // Convert to grayscale at the original size, then scale to double size.
        BufferedImage bufferedImage = BitmapImageUtil.convertToGrayscale(img, new Dimension(img.getWidth(), img.getHeight()));
        bufferedImage = Scalr.resize(bufferedImage, img.getWidth() * 2, img.getHeight() * 2);
        ImageIO.write(bufferedImage, "png", outputFile);
        ITesseract instance = Tesseract.getInstance();
        // Point one folder above the tessdata directory; it must contain the training data.
        instance.setDatapath("/usr/share/tesseract-ocr/");
        // ISO 639-3 language code
        instance.setLanguage("deu");
        String result = instance.doOCR(outputFile);
        // Result processing with regex.
        return result;
    } catch (Exception e) {
        // Assumed: the post omits its error handling, so this catch is a placeholder.
        throw new RuntimeException(e);
    }
}
Thank you.
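If "monochrome" is meant as strictly black and white rather than grayscale, one common way to get there is thresholding. The following is a sketch only, using Otsu's method on a TYPE_BYTE_GRAY image with JDK classes alone. OtsuBinarizer and its method names are hypothetical, and whether this helps with small fonts is untested here.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public class OtsuBinarizer {

    /** Returns the gray level (0..255) that maximizes between-class variance. */
    public static int otsuThreshold(int[] histogram, long totalPixels) {
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * histogram[i];
        }
        long sumBackground = 0;
        long weightBackground = 0;
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            weightBackground += histogram[t];
            if (weightBackground == 0) continue;      // no dark pixels yet
            long weightForeground = totalPixels - weightBackground;
            if (weightForeground == 0) break;         // all pixels consumed
            sumBackground += (long) t * histogram[t];
            double meanB = (double) sumBackground / weightBackground;
            double meanF = (double) (sumAll - sumBackground) / weightForeground;
            double diff = meanB - meanF;
            double between = (double) weightBackground * weightForeground * diff * diff;
            if (between > bestVariance) {
                bestVariance = between;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Converts a TYPE_BYTE_GRAY image to 1-bit: pixels above the threshold become white. */
    public static BufferedImage binarize(BufferedImage gray) {
        if (gray.getType() != BufferedImage.TYPE_BYTE_GRAY) {
            throw new IllegalArgumentException("expected a TYPE_BYTE_GRAY image");
        }
        int w = gray.getWidth();
        int h = gray.getHeight();
        Raster in = gray.getRaster();
        int[] histogram = new int[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                histogram[in.getSample(x, y, 0)]++;
            }
        }
        int threshold = otsuThreshold(histogram, (long) w * h);
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // In the default 1-bit palette, index 0 is black and index 1 is white.
                dst.setSample(x, y, 0, in.getSample(x, y, 0) > threshold ? 1 : 0);
            }
        }
        return out;
    }
}
```

The input has to be grayscale already, so this would run after the grayscale conversion and scaling, before the image is written to disk for OCR.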