Pythia is a modular framework for vision and language multimodal research. Built on top of PyTorch, it features:
- Model Zoo: Reference implementations for state-of-the-art vision and language models, including LoRRA (SoTA on VQA and TextVQA), the Pythia model (VQA 2018 challenge winner) and BAN.
- Multi-Tasking: Support for multi-tasking, which allows training on multiple datasets together.
- Datasets: Built-in support for various datasets, including VQA, VizWiz, TextVQA and VisualDialog.
- Modules: Provides implementations of many commonly used layers in the vision and language domain.
- Distributed: Support for distributed training based on DataParallel as well as DistributedDataParallel (see the sketch at the end of this section).
- Unopinionated: Unopinionated about the dataset and model implementations built on top of it.
- Customization: Custom losses, metrics, scheduling, optimizers, and TensorBoard logging; suits all your custom needs.
Pythia can also act as a starter codebase for challenges around vision and language datasets (the TextVQA challenge, the VQA challenge).
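As a rough illustration of the two distributed modes mentioned above, here is a minimal, Pythia-agnostic PyTorch sketch; the toy model and process-group setup are placeholders, not Pythia's actual training entry point:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DataParallel, DistributedDataParallel

# A toy model standing in for a Pythia model.
model = nn.Linear(2048, 3129).cuda()

# Option 1: DataParallel -- a single process drives all visible GPUs.
# Simplest to set up, but outputs/gradients are gathered on one device.
dp_model = DataParallel(model)

# Option 2: DistributedDataParallel -- one process per GPU, launched e.g.
# via `python -m torch.distributed.launch`. Scales better and also works
# across multiple machines.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)
ddp_model = DistributedDataParallel(model, device_ids=[local_rank])
```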