These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Grasp
Grasp is a lightweight AI toolkit for Python, with tools for data mining, natural language processing (NLP), machine learning (ML), and network analysis. It has 300+ fast and essential algorithms, with ~25 lines of code per function, self-explanatory function names, and no dependencies, bundled into one well-documented file: grasp.py (200KB).
Grasp is developed by Textgain, a language tech company that uses AI for societal good.
Tools for Data Mining
Download stuff with download(url) (or dl), with built-in caching and logging:
src = dl('https://www.textgain.com', cached=True)
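To illustrate what download caching generally involves, here is a minimal sketch using only the standard library. It is not Grasp's implementation: the names cached_download, fetch and fake_fetch, and the choice of a SHA-256 file name per URL, are assumptions made for this example.

```python
import hashlib
import os
import tempfile
from urllib.request import urlopen

def fetch(url):
    """Download the raw bytes at url (performs a real network request)."""
    with urlopen(url) as response:
        return response.read()

def cached_download(url, folder, fetch=fetch):
    """Return the bytes for url, reusing a copy stored in folder if present."""
    # Assumption: one cache file per URL, named after the URL's SHA-256 hash.
    path = os.path.join(folder, hashlib.sha256(url.encode('utf-8')).hexdigest())
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return f.read()
    data = fetch(url)
    os.makedirs(folder, exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    return data

# Demo with a stand-in fetcher, so the example runs offline.
calls = []

def fake_fetch(url):
    calls.append(url)
    return b'<html>hello</html>'

with tempfile.TemporaryDirectory() as folder:
    first = cached_download('https://www.textgain.com', folder, fetch=fake_fetch)
    second = cached_download('https://www.textgain.com', folder, fetch=fake_fetch)

print(len(calls))  # the second call is served from the cache
```

Passing the fetcher as a parameter keeps the caching logic testable without network access.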
Parse HTML with dom(html) into an Element tree and search it with CSS Selectors:
for e in dom(src)('a[href^="http"]'): # external links
    print(e.href)
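For comparison, the same selection (anchors whose href starts with "http") can be sketched with Python's standard html.parser. This is only an illustration of what the selector matches, not how Grasp's dom() works; LinkCollector and external_links are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that start with 'http'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # Rough equivalent of the CSS selector a[href^="http"].
            if name == 'href' and value and value.startswith('http'):
                self.links.append(value)

def external_links(html):
    """Return all matching href values found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

sample = '<p><a href="https://www.textgain.com">home</a> <a href="/about">about</a></p>'
print(external_links(sample))
```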
Strip HTML with plain(Element) to get a plain text string:
for word, count in wc(plain(dom(src))).items():
    print(word, count)
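The idea of stripping markup and counting words can be sketched with the standard library as follows. This is a simplified stand-in (it also keeps the text inside script and style tags, for instance), and plain_text and word_count are hypothetical names, not Grasp functions.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Keeps only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def plain_text(html):
    """Return the text content of html with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(' '.join(parser.chunks).split())

def word_count(text):
    """Count lowercase word tokens (assumption: a word is a run of \\w characters)."""
    return Counter(re.findall(r'\w+', text.lower()))

text = plain_text('<h1>Cats</h1><p>Cats chase <b>mice</b>.</p>')
for word, count in word_count(text).items():
    print(word, count)
```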
Find articles with wikipedia(str), in HTML:
for e in dom(wikipedia('cat', language='en'))('p'):
    print(plain(e))
Find opinions with twitter.search(str):
for tweet in first(10, twitter.search('from:textgain')): # latest 10
    print(tweet.id, tweet.text, tweet.date)
Deploy APIs with App. Works with WSGI and Nginx:
app = App()
@app.route('/')
def index(*path, **query):
    return 'Hi! %s %s' % (path, query)
app.run('127.0.0.1', 8080, debug=True)
Once this app is up, go check http://127.0.0.1:8080/app?q=cat.
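To show what a WSGI-compatible handler looks like underneath, here is a minimal sketch with the standard library. It is not Grasp's App class: the wsgi_app and serve names are assumptions, and the handler simply echoes the path segments and query parameters in the same format as the example above.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

def wsgi_app(environ, start_response):
    """Minimal WSGI callable that echoes path segments and query parameters."""
    path = tuple(p for p in environ.get('PATH_INFO', '/').split('/') if p)
    # Assumption: keep only the first value of each query parameter.
    query = {k: v[0] for k, v in parse_qs(environ.get('QUERY_STRING', '')).items()}
    body = ('Hi! %s %s' % (path, query)).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]

def serve(host='127.0.0.1', port=8080):
    """Run the app with the reference server (blocks until interrupted)."""
    with make_server(host, port, wsgi_app) as server:
        server.serve_forever()
```

Calling serve() and visiting http://127.0.0.1:8080/app?q=cat should return the path and query as text. Any WSGI server can host the same callable.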
Tools for Natural Language Processing
Find language with lang(str) for 40+ languages and ~92.5% accuracy:
print(lang('The cat sat on the mat.')) # en
Find words & sentences with tok(str) (tokenize) at ~125K words/sec:
print(tok("Mr. etc. aren't sentence breaks! ;) This is:.", language='en'))
Find word polarity with pov(str) (point-of-view). Is it a positive or negative opinion?
print(pov(tok('Nice!', language='en'))) # +0.6
print(pov(tok('Dumb.', language='en'))) # -0.4
- For de, en, es, fr, nl, with ~75% accuracy.
- You'll need the language models in grasp/lm.
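One common way to score polarity is to average per-word scores from a lexicon. The sketch below illustrates that general idea only. Whether Grasp's pov() works this way is not stated here, and the LEXICON entries are made-up values chosen to mirror the two examples above.

```python
import re

# Hypothetical lexicon: word -> polarity between -1.0 and +1.0.
LEXICON = {'nice': 0.6, 'dumb': -0.4}

def polarity(text, lexicon=LEXICON):
    """Average the scores of known words; 0.0 if no word is in the lexicon."""
    words = re.findall(r'\w+', text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity('Nice!'))
print(polarity('Dumb.'))
```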
Find word types with tag(str) in 10+ languages using robust ML models from UD:
for word, pos in tag(tok('The cat sat on the mat.'), language='en'):
    print(word, pos)
- Parts-of-speech include NOUN, VERB, ADJ, ADV, DET, PRON, PREP, …
- For ar, da, de, en, es, fr, it, nl, no, pl, pt, ru, sv, tr, with ~95% accuracy.
- You'll need the language models in grasp/lm.
Tools for Machine Learning
Machine Learning (ML) algorithms learn by example. If you show them 10K spam and 10K real emails (i.e., train a model), they can predict whether other emails are also spam or not.
Each training example is a {feature: weight} dict with a label. For text, the features could be words, the weights could be word count, and the label might be real or spam.
Quantify text with vec(str) (vectorize) into a {feature: weight} dict:
v1 = vec('I love cats! 😀', features=['c3', 'w1'])
v2 = vec('I hate cats! 😡', features=['c3', 'w1'])
c1, c2, c3 count consecutive characters. For c2, cats → 1x ca, 1x at, 1x ts.
w1, w2, w3 count consecutive words.
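The counting described above can be sketched in a few lines. This only illustrates character and word n-grams. Grasp's vec() may normalize, pad or weight features differently, and char_ngrams and word_ngrams are hypothetical helper names.

```python
import re
from collections import Counter

def char_ngrams(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def word_ngrams(text, n):
    """Count every run of n consecutive words (assumption: lowercase \\w+ tokens)."""
    words = re.findall(r'\w+', text.lower())
    return Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))

print(char_ngrams('cats', 2))         # one count each for 'ca', 'at', 'ts'
print(word_ngrams('I love cats', 1))  # one count each for 'i', 'love', 'cats'
```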
Train models with fit(examples), save as JSON, predict labels:
m = fit([(v1, '+'), (v2, '-')], model=Perceptron) # DecisionTree, KNN, ...
m.save('opinion.json')
m = fit(open('opinion.json'))
print(m.predict(vec('She hates dogs.'))) # {'+': 0.4, '-': 0.6}
Once trained, Model.predict(vector) returns a dict with label probabilities (0.0–1.0).
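To make "learning by example" concrete, here is a minimal multi-class perceptron over {feature: weight} dicts. It is a textbook sketch, not Grasp's Perceptron: it returns a single label instead of probabilities, and the function names and toy examples are assumptions for illustration.

```python
from collections import defaultdict

def scores(weights, vector):
    """Dot product of the vector with each label's weight dict."""
    return {label: sum(w.get(f, 0.0) * v for f, v in vector.items())
            for label, w in weights.items()}

def predict_label(weights, vector):
    """Return the highest-scoring label (ties go to the first label in sorted order)."""
    s = scores(weights, vector)
    return max(sorted(s), key=s.get)

def train_perceptron(examples, epochs=10):
    """examples is a list of ({feature: weight}, label) pairs."""
    weights = {label: defaultdict(float) for _, label in examples}
    for _ in range(epochs):
        for vector, label in examples:
            guess = predict_label(weights, vector)
            if guess != label:
                # Reward the correct label and penalize the wrong guess.
                for f, v in vector.items():
                    weights[label][f] += v
                    weights[guess][f] -= v
    return weights

examples = [({'love': 1, 'cats': 1}, '+'), ({'hate': 1, 'cats': 1}, '-')]
model = train_perceptron(examples)
print(predict_label(model, {'hate': 1, 'dogs': 1}))
```

With these two toy examples, the feature shared by both classes (cats) ends up with zero weight, so only love and hate drive the decision.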
Tools for Network Analysis
Map networks with Graph, a {node1: {node2: weight}} dict subclass:
g = Graph(directed=True)
g.add('a', 'b') # a → b
g.add('b', 'c') # b → c
g.add('b', 'd') # b → d
g.add('c', 'd') # c → d
print(g.sp('a', 'd')) # shortest path: a → b → d
print(top(pagerank(g))) # strongest node: d, 0.8
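The {node1: {node2: weight}} layout makes graph algorithms easy to express on plain dicts. Below is a sketch of a shortest-path search (Dijkstra's algorithm) over that layout. It is not Grasp's Graph.sp(): the function name is hypothetical, and treating each edge weight as a cost of 1.0 is an assumption.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a {node1: {node2: weight}} dict of edge costs."""
    queue = [(0.0, start, [start])]  # (total cost, current node, path so far)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # goal is unreachable from start

# Same directed edges as the example above, each with an assumed cost of 1.0.
g = {'a': {'b': 1.0}, 'b': {'c': 1.0, 'd': 1.0}, 'c': {'d': 1.0}, 'd': {}}
print(shortest_path(g, 'a', 'd'))
```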
See networks with viz(graph):
with open('g.html', 'w') as f:
    f.write(viz(g, src='graph.js'))
You'll need to set src to the grasp/graph.js lib.
Tools for Comfort
Easy date handling with date(v), where v is an int, a str, or another date:
print(date('Mon Jan 31 10:00:00 +0000 2000', format='%Y-%m-%d'))
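For reference, accepting an int, a str or another date can be sketched with the standard datetime module. parse_date and its default input pattern are assumptions for this example; Grasp's date() may recognize more formats.

```python
from datetime import datetime, timezone

def parse_date(value, input_format='%a %b %d %H:%M:%S %z %Y'):
    """Return a datetime from a datetime, a Unix timestamp, or a formatted string."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, (int, float)):
        # Assumption: numbers are seconds since the Unix epoch, in UTC.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return datetime.strptime(value, input_format)

d = parse_date('Mon Jan 31 10:00:00 +0000 2000')
print(d.strftime('%Y-%m-%d'))  # 2000-01-31
```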
Easy path handling with cd(...), which always points to the script's folder:
print(cd('kb', 'en-loc.csv'))
Easy CSV handling with csv([path]), a list of lists of values:
for code, country, _, _, _, _, _ in csv(cd('kb', 'en-loc.csv')):
    print(code, country)
data = csv()
data.append(('cat', 'Lizzy'))
data.append(('cat', 'Polly'))
data.save(cd('cats.csv'))
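The "list of lists with a save method" idea can be sketched on top of Python's csv module. Table is a hypothetical stand-in, not Grasp's csv class, and the temporary folder is only there to keep the example self-contained.

```python
import csv
import os
import tempfile

class Table(list):
    """A list of rows that can be loaded from and saved to a CSV file."""

    def __init__(self, path=None):
        super().__init__()
        if path is not None:
            with open(path, newline='', encoding='utf-8') as f:
                self.extend(csv.reader(f))

    def save(self, path):
        with open(path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(self)

data = Table()
data.append(('cat', 'Lizzy'))
data.append(('cat', 'Polly'))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'cats.csv')
    data.save(path)
    reloaded = Table(path)  # rows come back as lists of strings

print(reloaded)
```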
Tools for Good
A big concern in AI is the bias introduced by human trainers. Remember the Model trained earlier? Grasp has tools to explain how & why it makes decisions:
print(explain(vec('She hates dogs.'), m)) # why so negative?
In the returned dict, the model's explanation is: "you wrote hat + ate (hate)".
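For a linear model, one simple form of explanation is to rank features by their contribution (feature value times learned weight). The sketch below illustrates that idea only and is not Grasp's explain(). The function name feature_contributions and the weights are made up for the example.

```python
def feature_contributions(vector, weights):
    """Rank features by the size of their contribution: value * weight."""
    contributions = {f: v * weights.get(f, 0.0) for f, v in vector.items()}
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical character trigram weights; negative values push toward '-'.
weights = {'hat': -0.5, 'ate': -0.4, 'dog': 0.1}
vector = {'hat': 1, 'ate': 1, 'dog': 1, 'she': 1}

for feature, contribution in feature_contributions(vector, weights):
    print(feature, contribution)
```

Here hat and ate come out on top, which matches the kind of answer described above: the negative score is driven by the character sequences in "hate".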
About the tool
GitHub stars: 56
GitHub forks: 15