
🎓 marksheet-information-extraction-api - Extract Information Effortlessly from Marksheet Images

👋 Introduction

Welcome to the marksheet-information-extraction-api! This is a backend service built with FastAPI that extracts structured information from academic marksheets supplied as images or PDFs. It combines Optical Character Recognition (OCR) with a large language model (LLM) and returns the result as JSON, with a confidence score for each extracted field.

🚀 Getting Started

Here’s how you can easily get started:

Download the Application
- To begin, you will need to download the application. Just visit the Releases page to get the latest version.
Install Required Software
- Install the software that matches the way you plan to run the application:
  - Python 3.6 or newer, if you run it natively.
  - Docker, if you prefer to run it in a containerized environment.
Setting Up the Environment
- If you are using Python, create a virtual environment for easy management of dependencies:
```
python -m venv myenv
source myenv/bin/activate  # On Windows use: myenv\Scripts\activate
```
- Then, install the required packages:
```
pip install -r requirements.txt
```
Run the Application
- If you downloaded a standalone version:
  - Follow the provided instructions in the zip or installer package.
- If you are using Docker, you can run the following command:
```
docker-compose up
```
Accessing the API
- By default, the API runs on http://localhost:8000. Open http://localhost:8000/docs in your web browser to confirm that it is running and to view the interactive documentation.
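
If you would rather check from a script than from a browser, the following minimal sketch polls the documentation page. It assumes the default address above and that the `/docs` page is enabled. The function name `is_api_up` is made up for this example and is not part of the project.

```python
import urllib.error
import urllib.request


def is_api_up(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/docs answers with HTTP 200.

    Assumption: the service exposes its documentation at /docs.
    """
    url = base_url.rstrip("/") + "/docs"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP status.
        return False
```

Calling `is_api_up()` right after `docker-compose up` may return `False` for a few seconds while the service is still starting, so retrying a few times is reasonable.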

📦 Download & Install

You can get the latest version of the marksheet-information-extraction-api from the Releases page. Follow the instructions on that page to download the appropriate file for your operating system.

Once you have downloaded the application, follow the setup instructions outlined in the “Getting Started” section.

🔍 Features

- Image and PDF Support: Extract information from both images and PDFs of marksheets.
- OCR plus LLM Extraction: Text is read with OCR and then structured by a large language model, rather than by OCR alone.
- Structured JSON Output: Receive your data in a normalized JSON format that is easy to use.
- Confidence Scores: Get a confidence score for each extracted data point, so you can decide which values need manual review.
- Multi-Platform Compatibility: Runs on Windows, macOS, and Linux.
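
One way to use the confidence scores is to flag low-scoring fields for manual review. The sketch below is only an illustration: this README does not specify the response schema, so the `{"value": ..., "confidence": ...}` shape, the field names in `SAMPLE_RESPONSE`, and the helper `low_confidence_fields` are all hypothetical. Check the actual response format at `/docs` and adapt the key names.

```python
from typing import Any, List, Tuple


def low_confidence_fields(
    node: Any, threshold: float = 0.8, path: str = ""
) -> List[Tuple[str, Any, float]]:
    """Collect (path, value, confidence) for every field scored below threshold.

    Assumption: each extracted field is a dict with "value" and "confidence" keys.
    """
    flagged: List[Tuple[str, Any, float]] = []
    if isinstance(node, dict):
        if "value" in node and isinstance(node.get("confidence"), (int, float)):
            # Leaf field: compare its score and stop descending.
            if node["confidence"] < threshold:
                flagged.append((path, node["value"], node["confidence"]))
            return flagged
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            flagged.extend(low_confidence_fields(child, threshold, child_path))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            flagged.extend(low_confidence_fields(child, threshold, f"{path}[{index}]"))
    return flagged


# HYPOTHETICAL payload, for illustration only; the real schema may differ.
SAMPLE_RESPONSE = {
    "candidate": {
        "name": {"value": "A. Student", "confidence": 0.97},
        "roll_no": {"value": "12345", "confidence": 0.62},
    },
    "subjects": [
        {
            "subject": {"value": "Mathematics", "confidence": 0.91},
            "marks": {"value": 88, "confidence": 0.74},
        }
    ],
}

for field_path, value, score in low_confidence_fields(SAMPLE_RESPONSE):
    print(f"review {field_path}: {value!r} (confidence {score})")
```

The threshold of 0.8 is arbitrary. A sensible value depends on how costly a wrong field is in your workflow.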

💻 System Requirements

To run the application smoothly, ensure your system meets the following requirements:

- Operating System: Windows, macOS, or Linux.
- Memory: At least 4 GB of RAM.
- Storage: Minimum of 200 MB free space for the application and data.
- Python Version: 3.6 or newer, if running natively.

🔧 API Documentation

Once the API is running, you can access its documentation at http://localhost:8000/docs. This interface will guide you through the available endpoints, request parameters, and response formats.
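
FastAPI applications normally publish a machine-readable OpenAPI schema alongside the `/docs` page, by default at `/openapi.json`. Assuming this project keeps that default, the sketch below lists the available endpoints from the schema. The helper names `fetch_openapi` and `list_endpoints` are made up for this example.

```python
import json
import urllib.request
from typing import List, Tuple

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema. Assumes FastAPI's default /openapi.json location."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs found in the schema's "paths" object."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda entry: (entry[1], entry[0]))
```

With the service running, `list_endpoints(fetch_openapi())` should include the `/extract` endpoint described under Example Usage.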

📄 Example Usage

To use the API, send a POST request to the /extract endpoint with your marksheet image or PDF. You can use tools like Postman or curl for this purpose.

Example using curl:

curl -X POST "http://localhost:8000/extract" -F "file=@path_to_your_marksheet.pdf"

The API will return a JSON response with extracted data and confidence scores.
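
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command: one multipart form field named `file`, posted to `/extract`. The helper names `build_multipart` and `extract_marksheet` are made up for this example, and the code assumes the response body is a JSON object.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_marksheet(path: str, url: str = "http://localhost:8000/extract") -> dict:
    """POST a marksheet image or PDF and return the decoded JSON response."""
    file_path = Path(path)
    # "file" is the form field name used in the curl example.
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    # OCR plus LLM processing can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

For example, `extract_marksheet("path_to_your_marksheet.pdf")` sends the same upload as the curl command. If the `requests` package is installed, `requests.post(url, files={"file": open(path, "rb")})` is a shorter equivalent.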

👥 Community & Support

If you have questions or need help, feel free to open an issue in the GitHub repository. Our community and maintainers will be happy to assist you.

🚧 Known Issues

- OCR may misread handwritten marksheets. If you encounter issues, please report them on the GitHub issues page.
- Low-quality scans reduce accuracy. Use clear, well-lit, high-resolution images for the best extraction results.

🎉 Acknowledgements

Thanks to the contributors of FastAPI, EasyOCR, and everyone involved in the development of this project. We appreciate your support!