Welcome to the marksheet-information-extraction-api! This is a backend service built with FastAPI that extracts structured information from academic marksheets in image or PDF format. It combines Optical Character Recognition (OCR) with a large language model (LLM) and returns the results as JSON, along with a confidence score for each extracted piece of information.
Here's how you can get started:
python -m venv myenv
source myenv/bin/activate # On Windows use: myenv\Scripts\activate
pip install -r requirements.txt
docker-compose up
The API is then available at http://localhost:8000. Open this address in your web browser to check its status and see the documentation.
You can get the latest version of the marksheet-information-extraction-api from the Releases page. Follow the instructions on that page to download the appropriate file for your operating system.
Once you have downloaded the application, follow the setup instructions outlined in the "Getting Started" section.
To run the application smoothly, ensure your system meets the following requirements:
Once the API is running, you can access its documentation at http://localhost:8000/docs. This interface will guide you through the available endpoints, request parameters, and response formats.
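If you would rather check the available endpoints from a script than from the browser, the following is a minimal sketch using only the Python standard library. It assumes the service keeps FastAPI's default schema path, /openapi.json, which this README does not confirm; the function names fetch_openapi_schema and list_endpoints are illustrative and not part of the project.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # address given in this README

# Keys of an OpenAPI path item that are HTTP methods (others, such as
# "parameters" or "summary", are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi_schema(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema.

    Assumption: the app uses FastAPI's default schema path, /openapi.json.
    """
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_endpoints(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    pairs = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                pairs.append((method.upper(), path))
    return sorted(pairs)


# Usage, with the API running locally:
#     for method, path in list_endpoints(fetch_openapi_schema()):
#         print(method, path)
```

If the service is up and the schema path matches the assumption, the printed list should include the POST /extract endpoint described in this README.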
To use the API, send a POST request to the /extract endpoint with your marksheet image or PDF. You can use tools like Postman or curl for this purpose.
Example using curl:
curl -X POST "http://localhost:8000/extract" -F "file=@path_to_your_marksheet.pdf"
The API will return a JSON response with extracted data and confidence scores.
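The same request can be sent from Python. The sketch below mirrors the curl command above using only the standard library: it uploads the file under the form field "file" to /extract and parses the JSON reply. The helper names build_multipart and extract_marksheet and the 120-second timeout are illustrative choices, not part of the project, and the structure of the returned JSON is not documented here, so the example returns it without interpreting it.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

EXTRACT_URL = "http://localhost:8000/extract"  # endpoint named in this README


def build_multipart(field: str, filename: str, content: bytes) -> tuple:
    """Encode one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract_marksheet(path: str, url: str = EXTRACT_URL, timeout: float = 120.0) -> dict:
    """POST a marksheet image or PDF and return the parsed JSON response.

    The form field name "file" is taken from the curl example; the timeout
    is an arbitrary illustrative value.
    """
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with the API running locally:
#     result = extract_marksheet("path_to_your_marksheet.pdf")
#     print(json.dumps(result, indent=2))
```

Building the multipart body by hand avoids a third-party dependency. If the requests library is already installed, passing the open file through its files argument achieves the same upload with less code.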
If you have questions or need help, feel free to open an issue in the GitHub repository. Our community and maintainers will be happy to assist you.
Thanks to the contributors of FastAPI, EasyOCR, and everyone involved in the development of this project. We appreciate your support!