Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured data (CSV, XLS, SQL) for easy browsing and search. It is built with investigative reporting as a primary use case. Aleph allows cross-referencing mentions of well-known entities (such as people and companies) against watchlists, e.g. from prior research or public datasets.

Here are some key features:

  • Web-based search across large document and data sets.
  • Imports many file formats, including popular office formats, spreadsheets, email and zipped archives. Processing includes optical character recognition, language and encoding detection and named entity extraction.
  • Loads structured entity graph data from databases and CSV files. This allows navigation of complex datasets like company registries, sanctions lists or procurement data. Import tools for OpenSanctions are included.
  • Sends notifications for new search matches against a personal watchlist.
  • OAuth authorization and access control on a per-source and per-watchlist basis.

Download & Tutorial

Source: https://github.com/alephdata/

Categories: Tools