La Coctelera

what do networks do?

e-[communication research]

Free tools for text retrieval and analysis

The author, Lisa Spiro, welcomes suggestions and changes. From her excellent wiki, Digital Research Tools (DiRT), I have selected only the free applications that run on, and work with, free software.

AntConc 3.2.1: concordance program that "can generate KWIC concordance lines and concordance distribution plots...has tools to analyze word clusters (lexical bundles, n-grams, collocates, word frequencies, and keywords)" (Free, Windows/Mac OS X/Linux)
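
To show what a KWIC (keyword-in-context) concordance line contains, here is a minimal Python sketch. It is not AntConc's code. The function name `kwic`, the `width` parameter (tokens of context on each side), and the naive regular-expression tokenizer are all illustrative assumptions.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` tokens of left context (right-aligned),
    the hit in brackets, and up to `width` tokens of right context.
    """
    # Naive tokenizer (assumption): runs of word characters and apostrophes.
    tokens = re.findall(r"[\w']+", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    for line in kwic("The cat sat on the mat. A cat is a cat.", "cat", width=2):
        print(line)
```

Right-aligning the left context lines up every hit in one column, which is the visual convention of a KWIC display.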

CATMA (Computer Aided Textual Markup and Analysis): "an open source software with a focus on textual markup and analysis." (Open source)

Coding Analysis Toolkit (CAT): a platform-independent, web-based suite of tools custom-built for efficient and effective analysis of text datasets that have been coded either with CAT's internal coding module or with the commercial off-the-shelf package ATLAS.ti. CAT computes interrater reliability and supports adjudication and validity measurement. (Free, web-based)
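
Interrater reliability is usually reported as a chance-corrected agreement statistic. One common example is Cohen's kappa for two coders: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's code frequencies. The Python sketch below computes it. Whether CAT uses this particular statistic is not stated here, and the function name and input format (one code label per segment, per coder) are assumptions.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments where both coders chose the same code.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both coders used one identical code for every segment.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a = ["pos", "pos", "neg", "neg"]
    b = ["pos", "neg", "neg", "neg"]
    print(cohens_kappa(a, b))
```

In this example the coders agree on 3 of 4 segments (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. That is noticeably lower than the raw 75% agreement.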

Cypher: a "software program available which generates the RDF graph and SPARQL/SeRQL query representation of a plain language input, allowing users to speak plain language to update and query semantic databases...With robust definition languages, Cypher's grammar and lexicon can quickly and easily be extended to process highly complex sentences and phrases of any natural language, and can cover any vocabulary" (Free, Windows/Mac OS X/Linux)

Data for Research (DfR): Mine and analyze JSTOR's collections. Supports fielded searching; provides ngrams, word frequencies, citations, and tag clouds of key terms; offers API for "content selection and retrieval." (Free, web-based)
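
DfR delivers n-gram and word-frequency data precomputed from JSTOR. To make concrete what such a count contains, here is a small Python sketch. The lowercase regex tokenizer and the helper names `tokenize` and `ngram_counts` are assumptions, and DfR's own tokenization may differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of word characters and apostrophes (a simplifying assumption)."""
    return re.findall(r"[\w']+", text.lower())


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams, represented as tuples of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = tokenize("To be, or not to be.")
    print(ngram_counts(tokens, 1).most_common(3))  # word frequencies (unigrams)
    print(ngram_counts(tokens, 2).most_common(1))  # most frequent bigram
```

Word frequencies are simply the n = 1 case. A text of k tokens yields k - n + 1 overlapping n-grams.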

HyperPo: "a user-friendly text exploration and analysis program"; supports word frequencies, KWIC (Keyword in Context), cooccurrence and distribution lists, comparison, etc. (Free, web-based)
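
A co-occurrence list records which words appear near a chosen word. The Python sketch below counts neighbours within a fixed window of tokens on either side of each occurrence. The window-based definition, the default window size, and the function name are assumptions and do not describe HyperPo's internals.

```python
import re
from collections import Counter


def cooccurrences(text: str, keyword: str, window: int = 2) -> Counter:
    """Count the words found within `window` tokens of each occurrence of `keyword`."""
    tokens = re.findall(r"[\w']+", text.lower())  # naive tokenizer (assumption)
    target = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Left and right neighbours, clipped at the text boundaries.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


if __name__ == "__main__":
    print(cooccurrences("the old man and the old sea", "old", window=1))
```

A larger window picks up looser topical association, while a window of 1 approximates adjacent pairs (bigrams containing the keyword).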

ICTA: "a web-based system for Automated Text Analysis and Discovery of Social Networks from text. It was originally designed to work with email-based and forum-based data. But it can also be used to analyze other types of electronic communication such as blogs and chats." (Free, web-based)

JGAAP: "Java-based, modular program for textual analysis, text categorization, and authorship attribution" (Free)
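
Authorship attribution generally compares a feature profile of a disputed text with profiles of texts by candidate authors. The Python sketch below uses one simple combination: character trigram frequencies compared by cosine similarity, where the most similar candidate wins. This is an illustrative assumption, not JGAAP's implementation or API, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Frequency profile of overlapping, lowercased character n-grams."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown: str, candidates: dict[str, str], n: int = 3) -> str:
    """Return the candidate author whose known text is most similar to `unknown`."""
    target = char_ngrams(unknown, n)
    return max(candidates, key=lambda author: cosine(target, char_ngrams(candidates[author], n)))


if __name__ == "__main__":
    known = {
        "A": "the quick brown fox jumps over the lazy dog",
        "B": "lorem ipsum dolor sit amet consectetur",
    }
    print(attribute("a quick brown dog jumps over a lazy fox", known))
```

Character n-grams are a common stylometric baseline because they capture spelling, morphology, and punctuation habits without needing a language-specific tokenizer. Real attribution studies use much longer texts than this toy example.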

MALLET: "a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text." (Open source under the Common Public License, Java-based)

MONK: "a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study. It supports both micro analyses of the verbal texture of an individual text and macro analyses that let you locate texts in the context of a large document space consisting of hundreds or thousands of other texts" (Free, web-based)

MorphAdorner: "a Java command-line program which acts as a pipeline manager for processes performing morphological adornment of words in a text...Currently MorphAdorner provides methods for adorning text with standard spellings, parts of speech and lemmata. MorphAdorner also provides facilities for tokenizing text, recognizing sentence boundaries, and extracting names and places." (Free, cross-platform)

PAIR (Pairwise Alignment for Intertextual Relations): "a simple implementation of a sequence alignment algorithm for humanities text analysis designed to identify 'similar passages' in large collections of texts. These may include direct quotations, plagiarism and other forms of borrowings, commonplace expressions and the like." (Open source, Mac/Linux)
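
The idea of finding similar passages by sequence alignment can be sketched with textbook Smith-Waterman local alignment, run over word tokens rather than characters. The scoring values (match +2, mismatch -1, gap -1) and the function name are arbitrary assumptions. PAIR's own algorithm, scoring, and indexing for large collections may differ.

```python
def local_align(a: list[str], b: list[str], match: int = 2,
                mismatch: int = -1, gap: int = -1) -> tuple[int, int, int, int, int]:
    """Smith-Waterman local alignment of two token lists.

    Returns (score, a_start, a_end, b_start, b_end). The end indices are
    exclusive, so the aligned passages are a[a_start:a_end] and b[b_start:b_end].
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best local score of an alignment ending at a[i-1] and b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best_score:
                best_score, best_i, best_j = h[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best_score, i, best_i, j, best_j


if __name__ == "__main__":
    a = "to be or not to be that is the question".split()
    b = "he asked whether to be or not to be was the point".split()
    score, a0, a1, b0, b1 = local_align(a, b)
    print(score, " ".join(a[a0:a1]))
```

Here the shared six-word span is recovered with a score of 12. Local alignment, unlike global alignment, lets a borrowed passage be found inside otherwise unrelated texts. The quadratic table is fine for short passages but needs candidate filtering at collection scale.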

PhiloLogic: "primary full-text search, retrieval and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago"; support for TEI, DocBook, & plain text (Free, Mac/Linux)

SEASR: tools & frameworks for sharing data and research (including text analysis) in virtual work environments (Free; open source, Windows/Mac/Linux)

TagCrowd: "a web application for visualizing word frequencies in any user-supplied text by creating what is popularly known as a tag cloud or text cloud...Create your own tag cloud from any text to visualize word frequency." (Free, web-based)

TAMS Analyzer: "an open source qualitative package for the analysis of textual themes. It can be used for transcribing digital media and for conducting discourse analysis in the social and cultural sciences." (Free; open source, Mac/Linux)

TAPoR Tools: a searchable list of tools available through the Text Analysis Portal for Research that can be used online. TAPoR is "a gateway to tools for sophisticated analysis and retrieval, along with representative texts for experimentation...manage electronic texts, experiment with online text tools, [and] learn about digital textuality." The TAPoRware tools are also available separately. (Free, web-based)

TAToo (Text Analysis for me Too): "a Flash widget that you can embed in web pages to call basic text analysis tools from the TAPoR project." (Free, web-based)

Textpresso: "a text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc.)" (Open source, Linux)

TextSTAT - Simple Text Analysis Tool: "a simple programme for the analysis of texts. It reads ASCII/ANSI texts (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus." (Free; cross-platform, via New History Lab)

Token-X: "text visualization, analysis, and play tool" (Free, web-based)

Vivisimo/Clusty: web search and text clustering engine (see e.g. Shakespeare Searched) (Free, web-based)

Visual Text: "integrated development environment for building information extraction systems, natural language processing systems, and text analyzers" (Free for academic use)

Voyeur: text analysis suite (Creative Commons, web-based)

Whatizit: "a text processing system that allows you to do textmining tasks on text. The tasks come defined by the pipelines in the drop down list of the above window and the text can be pasted in the text area." Focused on biosciences. (Free, web-based)

Word Hoard: "applies to highly canonical literary texts the insights and techniques of corpus linguistics, that is to say, the empirical and computer-assisted study of large bodies of written texts or transcribed speech. In the WordHoard environment, such texts are annotated or tagged by morphological, lexical, prosodic, and narratological criteria" (Free; open source, cross-platform)

Wordle: a tool for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. (Free, web-based)
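
The principle Wordle describes, where more frequent words are drawn more prominently, can be sketched as a mapping from frequency to font size. Linear scaling between a minimum and maximum point size, the tiny stop-word list, and the function name are assumptions. The sketch covers only sizing, not Wordle's layout.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real tools use much larger ones).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def cloud_sizes(text: str, top: int = 20, min_pt: int = 10, max_pt: int = 48) -> dict[str, int]:
    """Map the `top` most frequent non-stop words to font sizes scaled linearly by count."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    common = Counter(words).most_common(top)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {w: round(min_pt + (c - lo) * (max_pt - min_pt) / span) for w, c in common}


if __name__ == "__main__":
    print(cloud_sizes("the data data data text text tool"))
```

For this input, "data" gets 48 pt, "text" 29 pt, and "tool" 10 pt, and "the" is dropped as a stop word. Many cloud generators use logarithmic or square-root scaling instead, so that a few very frequent words do not dwarf the rest.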

XAIRA: A text analysis and indexing system designed for large scale XML encoded texts including but not limited to TEI-conformant language corpora. (Open source, now has platform-independent PHP interface as well as Windows client)
