fargolo/TextGraphs.jl

TextGraphs

Introduction

TextGraphs.jl offers graph representations of text, along with natural language processing (NLP) functionality. Check the white paper, which includes vignettes with examples.

This package is inspired by SpeechGraphs. New features in TextGraphs.jl include pre-processing (e.g. lemmas), graph properties (e.g. centrality) and latent space embeddings (adding latent semantic information to graphs).

Julia uses multiple dispatch, which favors modular functions and high-performance computing.

No meio do caminho tinha uma pedra. Tinha uma pedra no meio do caminho. ("In the middle of the road there was a stone. There was a stone in the middle of the road.")


Quick introduction

Check the documentation and the white paper for further information.

See the poster presentation at JuliaCon22:

JuliaCon22 presentation

JuliaCon23 presentation

Install

Install with Pkg.

pkg> add TextGraphs

You also need R and the R package udpipe installed.

$ sudo apt install r-base
$ sudo Rscript -e 'install.packages("udpipe")'

Features

Graph types

You can build the following graphs from text (AbstractString):

Raw

  • Naive (naive_graph): uses the original sequence of words.
  • Phrases (phrases_graph): uses the original sequence of phrases.
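
To illustrate what "uses the original sequence of words" can mean, here is a minimal sketch built directly on Graphs.jl. It is not the package's implementation: the helper name `sketch_naive_graph`, the whitespace tokenization and the lowercasing are all assumptions made for this example.

```julia
using Graphs  # assumption: Graphs.jl is installed

# Hypothetical helper, not part of TextGraphs.jl.
# One vertex per unique word, one directed edge per pair of consecutive words.
function sketch_naive_graph(text::AbstractString)
    words = split(lowercase(text))          # naive whitespace tokenization (assumption)
    vocab = unique(words)                   # vertex i corresponds to vocab[i]
    index = Dict(w => i for (i, w) in enumerate(vocab))
    g = SimpleDiGraph(length(vocab))
    for (a, b) in zip(words, Iterators.drop(words, 1))
        add_edge!(g, index[a], index[b])    # a repeated pair does not add a second edge
    end
    return g, vocab
end

g, vocab = sketch_naive_graph("Sample for graph")
println((nv(g), ne(g)))  # (3, 2)
```

The three words give three vertices and two edges, which agrees with the `{3, 2}` shown for `naive_graph("Sample for graph")` in the Usage section. The real function returns a metagraph that also stores tokens and weights, which this sketch leaves out.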

POS, Stems and Lemmas

  • Stem (stem_graph): uses stemmed words.
  • Lemma (lemma_graph): uses lemmatized words.
  • Part of Speech (POS, pos_graph): uses syntactical functions.

Latent space embeddings

  • Latent space embedding (LSE, latent_space_graph) graphs.
  • Latent space embeddings to target (latent_space_graph)

Properties

You can obtain several properties of the graphs:

Direct measures
graph_props returns the density, the number of self loops, the number of strongly connected components (SCCs), the size of the largest SCC, and the mean centrality (betweenness, closeness and eigenvector methods).
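
For readers who want to see how such measures can be computed, the sketch below uses Graphs.jl and Statistics directly. It is only an approximation of the idea: `sketch_graph_props` is a hypothetical name, the eigenvector centrality is left out for brevity, and whether graph_props uses the same defaults is an assumption.

```julia
using Graphs, Statistics  # assumption: Graphs.jl is installed

# Hypothetical helper, not part of TextGraphs.jl.
# Collects some of the direct measures under keys borrowed from the Usage output.
function sketch_graph_props(g::AbstractGraph)
    sccs = strongly_connected_components(g)
    return Dict(
        "density" => density(g),
        "num_self_loops" => num_self_loops(g),
        "num_strong_connect_comp" => length(sccs),
        "size_largest_scc" => maximum(length, sccs),
        "mean_between_centr" => mean(betweenness_centrality(g)),  # normalized by default
        "mean_close_centr" => mean(closeness_centrality(g)),      # normalized by default
    )
end

# 1 -> 2 -> 3, the same shape as a three-word sentence without repeated words
props = sketch_graph_props(path_digraph(3))
```

For this three-vertex path the density is 2/6 and every vertex is its own strongly connected component, which agrees with the values printed for `graph_props` in the Usage section.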

Erdős–Rényi ratios
rand_erdos_props compares these values against random Erdős–Rényi graphs with the same number of vertices and edges, reporting either a z-score or the ratio to the random-graph average.
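
The comparison with random graphs can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: `sketch_erdos_compare` and `largest_scc` are hypothetical names, and the sample size and seeding scheme are arbitrary choices. Only `erdos_renyi`, `strongly_connected_components` and `cycle_digraph` come from Graphs.jl.

```julia
using Graphs, Statistics  # assumption: Graphs.jl is installed

# Property to compare: size of the largest strongly connected component.
largest_scc(g) = maximum(length, strongly_connected_components(g))

# Hypothetical helper, not part of TextGraphs.jl.
# Samples random digraphs with the same number of vertices and edges as g,
# then reports the observed value, its ratio to the random mean, and its z-score.
function sketch_erdos_compare(g::AbstractGraph, prop; n_samples=200, seed=1)
    observed = prop(g)
    samples = [prop(erdos_renyi(nv(g), ne(g); is_directed=true, seed=seed + i))
               for i in 1:n_samples]
    μ, σ = mean(samples), std(samples)
    # If every sample gives the same value, σ is zero and the z-score is not finite.
    return (observed=observed, ratio=observed / μ, zscore=(observed - μ) / σ)
end

result = sketch_erdos_compare(cycle_digraph(10), largest_scc)
```

A property such as density is fixed by the vertex and edge counts, so it has no spread across the random samples; this kind of comparison is only informative for properties that depend on how the edges are arranged.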

Usage

julia> using TextGraphs
julia> naive_graph("Sample for graph")
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia> stem_graph("Sample for graph"; snowball_language="english") # Optional keyword argument
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia> graph_props(naive_graph("Sample for graph"))
Dict{String, Real} with 7 entries:
  "mean_close_centr"        => 0.388889
  "size_largest_scc"        => 1
  "num_strong_connect_comp" => 3
  "density"                 => 0.333333
  "num_self_loops"          => 0
  "mean_between_centr"      => 0.166667
  "mean_eig_centr"          => 0.333335

Plot

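using TextGraphs
using Graphs, MetaGraphs   # nv and get_prop, in case TextGraphs does not re-export them
using GraphMakie, GLMakie
using NetworkLayout        # provides the Spectral layout

naive_g = naive_graph("Colorless green ideas sleep furiously")
stem_g = stem_graph("No meio do caminho tinha uma pedra tinha uma pedra no meio do caminho")

# Label each vertex with the token stored in its :token property
g_labels = map(x -> get_prop(naive_g, x, :token), collect(1:nv(naive_g)))
stem_g_labels = map(x -> get_prop(stem_g, x, :token), collect(1:nv(stem_g)))
graphplot(naive_g, nlabels=g_labels)
graphplot(stem_g, nlabels=stem_g_labels)

# Same graph drawn with a 3D spectral layout
spec3_layout = Spectral(dim=3)
graphplot(naive_g, node_size=30, nlabels=g_labels, layout=spec3_layout)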

Available options

Besides SpeechGraphs, there's a previous object-oriented Python implementation by github/facuzeta.