count mappings and ontology items
jcsahnwaldt committed Aug 4, 2012
1 parent e483cc7 commit 8f11bac
Showing 1 changed file with 20 additions and 1 deletion.
21 changes: 20 additions & 1 deletion scripts/src/main/bash/oneliners.txt
@@ -57,4 +57,23 @@ find -maxdepth 2 -mindepth 2 -type d | env LC_ALL=C sort | awk -F / '{sub(/wiki/
gzip -d < enwiki/20120601/enwiki-20120601-interlanguage-links-same-as.ttl.gz | grep -v -E 'resource/(Category|Template):' | wc -l

# count number of abstracts, i.e. non-redirect, non-disambig pages in Wikipedia article namespace, in all extracted languages. Result in 3.8: 20805392
grep 'short-abstracts\.ttl' lines-bytes-packed.txt | awk '{s+=$3} END {print s}'
grep 'short-abstracts\.ttl' lines-bytes-packed.txt | awk '{s+=$3} END {print s}'

# count mappings (per language and total)
grep -c '<page>' mappings/Mapping_* | sed 's/[_:.]/ /g' | sort -k 4 -n -r | awk '{s+=$4; print $2 " " $4} END {print "total " s}'
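
The mapping count above relies on field positions after sed replaces "_", ":" and "." with spaces. The sketch below runs the same sed/sort/awk stages on simulated `grep -c` output to show which field holds what. The file names and counts in `sample_counts` are made up for illustration and are not taken from the DBpedia mappings.

```shell
# Simulated `grep -c '<page>' mappings/Mapping_*` output.
# File names and counts are invented for this demo.
sample_counts='mappings/Mapping_de.xml:3
mappings/Mapping_en.xml:12
mappings/Mapping_fr.xml:5'

# sed turns "_", ":" and "." into spaces, so each line becomes
# "mappings/Mapping <lang> xml <count>": language in field 2, count in field 4.
# sort orders by count (descending); awk prints "<lang> <count>" and a total.
mapping_summary=$(printf '%s\n' "$sample_counts" \
  | sed 's/[_:.]/ /g' \
  | sort -k 4 -n -r \
  | awk '{s+=$4; print $2 " " $4} END {print "total " s}')

printf '%s\n' "$mapping_summary"
```

Any additional "_", ":" or "." in a path would shift these field positions, so the one-liner assumes file names of the form mappings/Mapping_<lang>.<ext>.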

# ontology: count classes
grep -c '<owl:Class' dbpedia_3.*

# ontology: count object properties
grep -c '<owl:ObjectProperty' dbpedia_3.*

# ontology: count datatype properties, not specialized
grep -c -E '<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/\w+">' dbpedia_3.*

# ontology: count specialized datatype properties
grep -c -E '<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/\w+/\w+">' dbpedia_3.*
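
The two datatype-property patterns differ only in the number of path segments after /ontology/: one segment for the first, two for the second. A minimal sketch of that distinction, assuming a grep that supports `\w` with `-E` (GNU grep does). The ontology fragment and property names below are hypothetical sample data, not taken from a real dbpedia_3.* file.

```shell
# Hypothetical ontology fragment; property names are invented for this demo.
sample_owl=$(mktemp)
cat > "$sample_owl" <<'EOF'
<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/height">
<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/birthYear">
<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/Person/height">
<owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/birthPlace">
EOF

# One path segment after /ontology/: datatype property, not specialized.
plain_count=$(grep -c -E '<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/\w+">' "$sample_owl")

# Two path segments after /ontology/: specialized datatype property.
specialized_count=$(grep -c -E '<owl:DatatypeProperty rdf:about="http://dbpedia.org/ontology/\w+/\w+">' "$sample_owl")

echo "plain: $plain_count, specialized: $specialized_count"
rm -f "$sample_owl"
```

Because `\w+` cannot match "/", the two patterns never count the same line twice. Note that `grep -c` counts matching lines, so both one-liners assume each property declaration starts on its own line.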
