stata_translate — Translate text variables in Stata using Google Translate API
stata_translate varlist
stata_translate takes one or more string variables (varlist) from your dataset and uses the Google Translate API to:
- Automatically detect the language of each row.
- Translate each row's text to English.
- Save the results in two new variables for each input variable:
  - *_translated for the English translation
  - *_srclang for the detected source language
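The per-row steps above can be sketched in Python, the language the command runs through Stata's Python integration. This is a minimal, hypothetical sketch, not the command's source: translate_rows is an illustrative name, and the detect and translate functions are passed in so the loop runs offline. In the real command they would presumably wrap langdetect and deep_translator, but that wiring is an assumption.

```python
# Hypothetical sketch of the per-row logic; not stata_translate's actual code.
# `detect` and `translate` are injected (assumption: in practice they would
# wrap langdetect and deep_translator), so this runs without network access.
from typing import Callable, List, Optional, Tuple


def translate_rows(
    texts: List[Optional[str]],
    detect: Callable[[str], str],
    translate: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Return (translations, source_languages), one entry per input row.

    Empty or missing rows yield "", which is Stata's missing value for
    string variables.
    """
    translated: List[str] = []
    srclang: List[str] = []
    for text in texts:
        if text is None or not text.strip():
            # Nothing to detect or translate in this row.
            translated.append("")
            srclang.append("")
            continue
        srclang.append(detect(text))        # value for *_srclang
        translated.append(translate(text))  # value for *_translated
    return translated, srclang


# Usage with stand-in functions (illustration only, no network calls).
fake_lang = {"Berlin ist großartig": "de", "Le fromage est délicieux": "fr"}
fake_en = {
    "Berlin ist großartig": "Berlin is great",
    "Le fromage est délicieux": "The cheese is delicious",
}
out, langs = translate_rows(
    ["Berlin ist großartig", "", "Le fromage est délicieux"],
    detect=fake_lang.__getitem__,
    translate=fake_en.__getitem__,
)
print(out)    # ['Berlin is great', '', 'The cheese is delicious']
print(langs)  # ['de', '', 'fr']
```

Injecting the two functions keeps the row loop separate from any particular translation service, so it can be checked without calling Google Translate.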
The command uses Stata's Python integration and assumes that the translation service is the online Google Translate API, so an internet connection is required. Support for local/offline use with the llama3.2:latest model through ollama will be developed later.
clear
input str40 city_description str40 food_review
"París es una ciudad hermosa" "Le fromage est délicieux"
"Berlin ist großartig" "Ich liebe Bratwurst"
"東京はとても忙しいです" "寿司はとても新鮮です"
end
stata_translate city_description food_review
// list the original variables together with the new *_translated and *_srclang variables
list city_description* food_review*
For each variable in varlist, this creates two new variables (four in this example):
- *_translated: English translation
- *_srclang: source language code (e.g., 'de', 'fr', 'id')
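Because the new names are built by appending a suffix, long input names can run into Stata's 32-character limit on variable names: an input name longer than 21 characters would produce a *_translated name that is too long. The hypothetical Python helper below (output_names is an illustrative name, not part of stata_translate) shows the naming rule and that check; whether the command itself performs such a check is not stated in this help file.

```python
# Hypothetical helper illustrating the *_translated / *_srclang naming rule;
# not part of stata_translate. Flags names over Stata's 32-character limit.
from typing import Dict, List, Tuple

STATA_MAX_NAME = 32  # maximum length of a Stata variable name


def output_names(varlist: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each input variable to its (translated, srclang) output names."""
    names: Dict[str, Tuple[str, str]] = {}
    for var in varlist:
        pair = (f"{var}_translated", f"{var}_srclang")
        for name in pair:
            if len(name) > STATA_MAX_NAME:
                raise ValueError(
                    f"{name!r} has {len(name)} characters; "
                    f"Stata allows at most {STATA_MAX_NAME}"
                )
        names[var] = pair
    return names


for var, (tr, sl) in output_names(["city_description", "food_review"]).items():
    print(var, "->", tr, sl)
# city_description -> city_description_translated city_description_srclang
# food_review -> food_review_translated food_review_srclang
```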
To use this command, you must:
- Have Python configured in Stata (see python query). The easiest way is to install Anaconda or Miniconda on your system (see https://www.anaconda.com/download).
- Install the following Python libraries (this version of stata_translate installs them automatically, as long as a Python environment is available):
  - langdetect (pip install langdetect)
  - deep_translator (pip install deep_translator)
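The automatic installation mentioned above can be sketched as follows. This is a hypothetical illustration, not the command's code: ensure_packages, pip_install and REQUIRED are made-up names, and the sketch assumes sys.executable points at the Python interpreter Stata is configured to use (compare with the path reported by python query), which may not hold in every embedded setup.

```python
# Hypothetical sketch of installing missing dependencies; not the actual
# stata_translate source. Assumes sys.executable is the Python interpreter
# configured in Stata (see `python query`); verify this in your setup.
import importlib
import importlib.util
import subprocess
import sys
from typing import Callable, Dict, List

# import name -> pip package name (the two libraries listed above)
REQUIRED: Dict[str, str] = {
    "langdetect": "langdetect",
    "deep_translator": "deep_translator",
}


def pip_install(package: str) -> None:
    """Run `python -m pip install <package>` with the current interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_packages(
    required: Dict[str, str],
    installer: Callable[[str], None] = pip_install,
) -> List[str]:
    """Install every package whose module cannot be found; return those installed."""
    installed: List[str] = []
    for module, package in required.items():
        if importlib.util.find_spec(module) is None:
            installer(package)
            installed.append(package)
    if installed:
        # Let the import system notice modules that were just installed.
        importlib.invalidate_caches()
    return installed
```

Calling ensure_packages(REQUIRED) inside Stata's Python session would install only what is missing; the installer parameter exists so the check can be tested without touching pip.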
Written by @akirawisnu during his PhD depression break.
Feedback or improvements welcome!
- Manual: help python, help generate, help string functions
- Online: https://ollama.com, https://pypi.org/project/langdetect, https://www.anaconda.com/download