The Donut model in Python is a model you can use to extract text from a given image. This can be useful in various scenarios, like scanning receipts, for example.
You can easily download the Donut model from GitHub. But as is common with AI models, you have to fine-tune the model for your specific needs.
I wrote this tutorial because I didn't find any resources showing me exactly how to fine-tune the Donut model with my dataset. So I had to learn this from other tutorials (which I'll share throughout this guide) and figure out issues myself.
These issues were especially prevalent since I didn't have a GPU on my local computer. So to simplify the process for others, I made this tutorial.
Here's what we'll cover:
- How to find a dataset to fine-tune with
- Fine-tuning with Google Colab
- How to change parameters
- Fine-tuning locally
How to Find a Dataset to Fine-tune with
Finding a dataset online
To fine-tune the model, we need a dataset to fine-tune with. If you want a simple solution, you can find a ready dataset in this folder on Google Drive.
You should then copy this dataset over to your own Google Drive. Note that this was taken from this tutorial under the "Downloading and parsing SROIE" headline. That tutorial is a great read which inspired this article, as I wanted to create a more in-depth tutorial for fine-tuning the Donut model in Google Colab. So if you want a more in-depth look at generating the dataset, I recommend reading the tutorial above.
The dataset linked above may not necessarily fit your specific purpose. If you want to fine-tune a model for your specific needs, you either have to find a fitting dataset online, or create a dataset yourself.
Annotating your own dataset
This is another option if you can't or don't want to find a dataset online (so if you did that, you can ignore this subsection).
Annotating your own dataset is a surefire way to create a dataset that perfectly matches your needs.
There are many annotation tools online, but a free one I recommend is the Sparrow UI data annotation tool. Here you can upload your image, draw bounding boxes on the image, and label each bounding box. You can then extract the labeled data in JSON format, and use it following the rest of the tutorial.
Make sure your dataset is in the same format as the dataset I provided earlier. For more details on annotating data with Sparrow UI, you can check out my article on using the Donut model with self-annotated data. Note that that article assumes you're already able to fine-tune the Donut model (which you'll learn in this article).
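To get a feel for that format, here is a minimal sketch of building one line of a Donut-style `metadata.jsonl` ground-truth file. The field names (`company`, `total`) and the file name are hypothetical examples; match them to the labels in the dataset you downloaded or annotated.

```python
import json

def to_donut_line(file_name: str, fields: dict) -> str:
    """Build one metadata.jsonl line in the Donut ground-truth format.

    `fields` maps label names (e.g. "total", "company") to the text
    found in the corresponding bounding box. Donut expects the
    ground truth as a JSON string nested under "gt_parse".
    """
    ground_truth = {"gt_parse": fields}
    return json.dumps({
        "file_name": file_name,
        "ground_truth": json.dumps(ground_truth),
    })

# Each image in the dataset gets one such line in metadata.jsonl
line = to_donut_line("receipt_001.jpg", {"company": "ACME", "total": "12.50"})
print(line)
```

Note that `ground_truth` is itself a JSON string (double-encoded), which is easy to miss when converting annotations by hand.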
Fine-Tuning with Google Colab
To make the fine-tuning process as simple as possible, I provided a Google Colab file you can use here. (Some code is taken from this GitHub page.)
Note that package versions have to be exactly as provided in the Drive, as wrong package versions were the root of a lot of the problems I faced fine-tuning the Donut model myself.
Before fine-tuning using the Google Colab file, there are 2 things you need to do:
Upload data to your Google Drive.
Upload the dataset I provided earlier to your Google Drive under a parent folder called readyFinetuneData (see the file structure in the image below).
Make sure to add the parent folder in the root folder of your Google Drive. Also, download this config file and upload it to the root folder of your Google Drive.
Link your Google Drive to your Google Colab.
When you run the cell which mounts the Google Drive, you might get a prompt, in which case you can simply accept it and ignore the rest of this paragraph.
If you don't get a prompt, press the files icon (red in the image below), and the Mount Drive icon (blue in the image below). Then you will get a code snippet that you can run, and now your Google Drive is connected.
Note that if you have not connected Google Colab to Google Drive before, you have to log into your Google Drive after pressing the Drive icon, and give permission for Colab to access Drive (prompts for this should appear automatically when you try to link the Drive).
Finally, restart your runtime. After changing files on Google Colab, you always have to restart your runtime to see the latest updates.
How to Change Parameters
Great! Now you can run the cells in the notebook, and you should receive a fine-tuned model. Remember you can also change the Config parameters to, for example, train for longer, use more workers, and so on.
Note that I'm working with the Donut model fine-tuned on the CORD dataset, as I want to be able to read receipts. You can also find other Donut models here, with the other options being document parsing, document classification, and document visual question answering (DocVQA).
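For reference, the kind of parameters you might tweak look roughly like this. The names follow the style of the train_cord.yaml config in the Donut repository; the values shown are illustrative, not recommendations:

```yaml
max_epochs: 30         # train for longer by increasing this
max_steps: -1          # or cap training by total steps instead
num_workers: 8         # more workers can speed up data loading
train_batch_sizes: [8]
lr: 3.0e-5
val_check_interval: 1.0
```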
Fine-tuning Locally
Fine-tuning can also be run locally, which will mostly be relevant for you if you have a GPU, as CPU training takes a very long time.
To run locally you have to:
1. First, clone this GitHub repository.
2. Add the ready fine-tuning dataset to the root folder.
3. If you want to save the fine-tuned model, add the line below to train.py at line 164, right below trainer.fit(…):
# ...
trainer.save_checkpoint(f"{Path(config.result_path)}/{config.exp_name}/{config.exp_version}/model_checkpoint.ckpt")
# ...
4. You then have to comment out the GPU settings in the PyTorch Lightning Trainer, and add the line accelerator="cpu":
# train.py file
# ...
trainer = pl.Trainer(
    # Comment out the lines below
    # num_nodes=config.get("num_nodes", 1),
    # devices=torch.cuda.device_count(),
    # strategy="dp",
    # accelerator="gpu",
    accelerator="cpu",  # TODO add this line
    plugins=custom_ckpt,
    max_epochs=config.max_epochs,
    max_steps=config.max_steps,
    val_check_interval=config.val_check_interval,
    check_val_every_n_epoch=config.check_val_every_n_epoch,
    gradient_clip_val=config.gradient_clip_val,
    precision=16,
    num_sanity_val_steps=0,
    logger=logger,
    callbacks=[lr_callback, checkpoint_callback, bar],
)
# ...
5. Make sure the max_epochs parameter in your Config file is set to -1 (if not, you will get a division by 0 error). You can control training time by setting the parameter max_steps.
6. You can then run fine-tuning with the following command in the terminal:
python train.py --config config/train_cord.yaml
where train_cord.yaml is the configuration file you want to use.
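In the Config file that means, for example (values illustrative):

```yaml
max_epochs: -1   # must be -1 here to avoid the division-by-zero error
max_steps: 2000  # training length is then controlled by steps instead
```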
Running on CPU
If you're running on CPU after all, you'll encounter some problems unless you make a few modifications:
- In donut/train.py, change the accelerator parameter to "cpu" (from "gpu"), and remove the parameters num_nodes, devices, and strategy.
- Then in your Config file (for example train_cord.yaml), set max_epochs to -1, and specify the parameter max_steps instead. This is because you'll encounter a division by 0 error if max_epochs is greater than 0.
After these modifications, running on a CPU should work as well.
Conclusion
In this article, I've shown you how to easily fine-tune the Donut model using your own data, something which will hopefully result in improved accuracy for your fine-tuned Donut model.
The Donut model has many applications, and this is just one way to use it, which I hope is useful.
If you are interested and want to learn more about similar topics, you can find me on: