Download spaCy models to kernel


#1

I’m trying out Datalore for some of my work, but I’m not sure how to use spaCy (a popular NLP framework) in it. I need to download model files, but I’m not sure how to execute shell commands.

In Jupyter I usually run:
!python -m spacy download en

Anyone know what I’m missing to get this to work in Datalore?


#2

You can use the following code:

import subprocess
print(subprocess.getoutput("python -m spacy download en_core_web_sm"))
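
A slightly more defensive variant, as a sketch only: it assumes the bare `python` on the PATH might not be the interpreter the kernel runs on, so it uses `sys.executable` instead, and it returns the exit code so a failed download is easy to spot. The helper names `build_download_command` and `download_model` are made up for illustration.

```python
import subprocess
import sys


def build_download_command(model_name):
    # sys.executable is the interpreter running this kernel, so the model
    # is installed into the same environment that imports spacy.
    return [sys.executable, "-m", "spacy", "download", model_name]


def download_model(model_name):
    # Run the download and return (exit code, combined stdout and stderr).
    result = subprocess.run(
        build_download_command(model_name),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


# Example call (needs network access and a writable environment):
# code, output = download_model("en_core_web_sm")
# print(code, output)
```

A non-zero exit code together with the captured output should show the same kind of error text that `subprocess.getoutput` prints.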

However, it is not currently possible to install these models this way, because the subprocess has no write permission on site-packages:

error: could not create '/opt/anaconda3/envs/datalore-user/lib/python3.6/site-packages/en_core_web_sm': Read-only file system
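
To check whether a kernel is in this situation before attempting a download, something like the following may help. It is a rough sketch using only the standard library; the helper names `site_packages_dirs` and `writable_report` are hypothetical, and the directories it reports depend on how the environment is set up.

```python
import os
import site
import sysconfig


def site_packages_dirs():
    # Collect candidate site-packages directories for the running interpreter.
    dirs = {sysconfig.get_paths()["purelib"]}
    if hasattr(site, "getsitepackages"):  # missing in some old virtualenvs
        dirs.update(site.getsitepackages())
    return sorted(dirs)


def writable_report():
    # Map each directory to True if the current process may write to it.
    return {path: os.access(path, os.W_OK) for path in site_packages_dirs()}


for path, writable in writable_report().items():
    print(path, "writable" if writable else "read-only or missing")
```

If every directory is reported as read-only, a model download into site-packages can be expected to fail with an error like the one above.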

I’ll ask the devs if there is a way to support this library.


#3

With the latest update (already deployed) it is possible to install these models. I checked the example from the spacy.io landing page and it works now:

import spacy
#%%
import subprocess
#%%
print(subprocess.getoutput("python -m spacy download en_core_web_sm"))
#%%
# Load English tokenizer, tagger, parser and NER (the small model ships without real word vectors)
nlp = spacy.load('en_core_web_sm')

# Process whole documents
text = (u"When Sebastian Thrun started working on self-driving cars at "
        u"Google in 2007, few people outside of the company took him "
        u"seriously. “I can tell you very senior CEOs of major American "
        u"car companies would shake my hand and turn away because I wasn’t "
        u"worth talking to,” said Thrun, now the co-founder and CEO of "
        u"online higher education startup Udacity, in an interview with "
        u"Recode earlier this week.")
doc = nlp(text)

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

# Determine semantic similarities (only a rough estimate with the small model, which has no word vectors)
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
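
To avoid re-running the download every time the notebook is executed, the load step could be guarded by a check that the model package is already importable. This is only a sketch: `is_installed` and `load_model` are hypothetical helpers, and `importlib.util.find_spec` is used here as a generic stand-in rather than anything spaCy-specific.

```python
import importlib
import importlib.util


def is_installed(package_name):
    # True if the package can be found on the current import path.
    importlib.invalidate_caches()  # pick up packages installed during this session
    return importlib.util.find_spec(package_name) is not None


def load_model(model_name="en_core_web_sm"):
    # Import spacy lazily so is_installed() works even without it.
    import spacy

    if not is_installed(model_name):
        from spacy.cli import download
        download(model_name)
    return spacy.load(model_name)


# Example call (needs spaCy, network access and a writable environment):
# nlp = load_model("en_core_web_sm")
```

Depending on the spaCy version, a model downloaded in the running process may only become loadable after the kernel is restarted, so treat the fallback branch as best effort.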


#4

@igro I confirmed that worked for me. Thanks!