Download data in source code


#1

Hello,

I’m testing Datalore and have been working with Colab until now. There I could simply run the following code:

# nltk
nltk.download('stopwords')
stopword_list = stopwords.words('english')

# spacy
spacy.cli.download("de_core_news_sm")
nlp = spacy.load('de_core_news_sm')

This does not seem to work directly here. Is there an alternative way to do this?


#2

Hello!

First of all, you have to install nltk and spacy using “Tools -> Library Manager”. Installing spacy may take a couple of minutes.

Unfortunately, spacy.cli.* commands currently don’t work because of a bug. We’ll deploy a fix in a few days. Thank you for bringing this to our attention!

In the meantime, you can download spacy models using subprocess. Here is the full code for your example:

import nltk
nltk.download('stopwords')
#%%
from nltk.corpus import stopwords
stopword_list = stopwords.words('english')
#%%
import spacy
import subprocess
print(subprocess.getoutput("python -m spacy download de_core_news_sm"))
#%%
nlp = spacy.load('de_core_news_sm')
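
If you rerun the notebook often, you may want to skip the download when the model is already installed. The following is only a minimal sketch of the same subprocess workaround, not an official Datalore or spacy recipe. The helper names build_download_command and ensure_spacy_model are made up for illustration. It assumes the model is installed as a regular Python package named after the model (which is how spacy distributes its models), and it uses sys.executable so the download runs with the same interpreter as the notebook.

```python
import importlib.util
import subprocess
import sys


def build_download_command(model_name):
    # Hypothetical helper: run spacy's downloader with the interpreter
    # that is executing this notebook, not whatever "python" is on PATH.
    return [sys.executable, "-m", "spacy", "download", model_name]


def ensure_spacy_model(model_name):
    # Hypothetical helper: returns True if a download was started,
    # False if a package with that name is already importable.
    if importlib.util.find_spec(model_name) is not None:
        return False
    # check=True raises CalledProcessError if the download fails
    subprocess.run(build_download_command(model_name), check=True)
    # Let the import system notice the freshly installed package
    importlib.invalidate_caches()
    return True
```

You would call ensure_spacy_model('de_core_news_sm') in its own cell before spacy.load('de_core_news_sm'). If the load still fails right after the first download, restarting the kernel is a reasonable thing to try.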