A self-hosted search engine for documents.
Updated Jun 12, 2024 (Java)
RAG with LM Studio, local LLMs, and scientific PDF text extraction.
AI Media and Misinformation Content Analysis Tool: Analyze text and images
A very simple news crawler with a funny name
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Text extraction is the process of automatically extracting text from images or documents. Optical Character Recognition (OCR) is a technology that enables computers to convert images of text into machine-readable text.
NLP pre-/post-processing tools.
PDF text data extraction web app with OCR for scanned documents
Fan translation tools for SCUMM engine games
This repository contains code for a simple application that detects text in images using Python, Optical Character Recognition (OCR), and Streamlit, which provides a user-friendly web interface. The application allows users to upload images or capture them via camera input and extracts text present
Extract embedded metadata from HTML markup
Golang PDF library for creating and processing PDF files (pure Go)
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.
Get text content from any file
Translate visual novels in real time
Module for automatic summarization of text documents and HTML pages.
This GitHub repository hosts the notebooks and tools developed as part of this thesis to automate the extraction, processing, and analysis of data from the MICCAI 2023 conference, aiding in the systematic review and providing a structured foundation for further research in this crucial area.
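Several of the projects above (the web text gatherer, the HTML metadata extractor, and "Get text content from any file") share a first step: separating visible text and embedded metadata from markup. A minimal sketch of that step for HTML follows. It uses only Python's standard-library `html.parser`. The names `TextAndMetaExtractor` and `extract` are hypothetical and do not come from any listed project. Real tools also handle boilerplate removal, character encodings, structured data such as JSON-LD, and OCR for scanned images, none of which this sketch attempts.

```python
from html.parser import HTMLParser


class TextAndMetaExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, the <title>, and <meta> tags."""

    # Elements whose contents are not rendered as visible page text.
    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.title = ""
        self.meta = {}
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Covers both <meta name=...> and Open Graph <meta property=...>.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script>, <style>, ...
        if self._in_title:
            self.title += data
        elif data.strip():
            # Collapse runs of whitespace inside each text node.
            self.chunks.append(" ".join(data.split()))


def extract(html):
    """Return the title, meta tags, and visible text (one text node per line)."""
    parser = TextAndMetaExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "text": "\n".join(parser.chunks),
    }
```

Usage: `extract(open("page.html", encoding="utf-8").read())` returns a dict with `title`, `meta`, and `text` keys. A streaming parser keeps the sketch dependency-free. Libraries that build a full DOM tree make it easier to drop navigation and footer boilerplate.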