How to Learn a (Natural) Language

I recently started learning Portuguese, and I wrote a short script to help me practice my vocabulary. It randomly quizzes me in my terminal as I enter commands like cd and ls, allowing me to practice throughout my entire day, little by little!

Motivation

With a little bit of vim trickery on my .zsh_history file, I noticed that I was averaging over 100 commands a day in my terminal. A little more massaging showed that most of these commands were in the set $\{\text{ls},\text{cd},\text{git},\text{python},\text{make}\}$. Hmm, I thought. I could alias each CMD to something like alias CMD="./precmd_program; CMD" to run some code before it executes. If that code were a study program, I could spread my studying thinly over the course of many hours. Ideally, I wouldn’t even feel like I was studying.
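The tallying above was done in vim; as an alternative, here is a minimal Python sketch that counts the first word of each history entry. It assumes zsh's extended history format (`: <start>:<elapsed>;<command>`) for lines starting with `: `, treats any other line as a bare command, and does not handle multi-line commands. `top_commands` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_commands(history_lines: Iterable[str], k: int = 5) -> List[Tuple[str, int]]:
    """Return the k most common command names (first word of each entry)."""
    counts = Counter()
    for line in history_lines:
        # Assumed zsh EXTENDED_HISTORY format: ": <start>:<elapsed>;<command>"
        if line.startswith(": ") and ";" in line:
            line = line.split(";", 1)[1]
        words = line.split()
        if words:
            counts[words[0]] += 1
    return counts.most_common(k)
```

It could be pointed at a real history file with something like `with open(os.path.expanduser("~/.zsh_history"), errors="replace") as fh: print(top_commands(fh))`.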

Implementation

I wrote a short Node.js script that does the following: 80% of the time, it does nothing. The other 20% of the time (a configurable parameter, of course), it prints “How do you say __ in Portuguese?” and waits for you to type an answer. It keeps asking questions until you answer $n$ of them correctly, at which point the original command is allowed to run.
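As a rough illustration of that control flow (a Python sketch, not the author's Node.js code), the pre-command gate might look like this. `maybe_quiz`, the `cards` list of (question, answer) pairs, and the default `n = 3` are hypothetical; questions are drawn uniformly here for brevity, whereas the real script weights them by difficulty as described later in the post.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def maybe_quiz(
    cards: Sequence[Tuple[str, str]],
    read_answer: Callable[[str], str] = input,
    p: float = 0.2,
    n: int = 3,
    rng: Optional[random.Random] = None,
) -> int:
    """With probability p, ask questions until n are answered correctly.

    Returns how many questions were asked (0 when the quiz is skipped);
    afterwards the caller lets the original command run.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() >= p:
        return 0  # most of the time, do nothing
    asked = correct = 0
    while correct < n:
        question, answer = rng.choice(cards)  # uniform pick, for brevity
        asked += 1
        reply = read_answer(f"How do you say {question} in Portuguese? ")
        if reply.strip().lower() == answer.lower():
            correct += 1
    return asked
```

Because the alias joins the two commands with `;`, the original command runs as soon as `maybe_quiz` returns, whether or not a quiz happened.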

You wouldn’t want this pre-command script to quiz you on every command, because that would seriously interrupt your workflow. But if you instead ask several questions ($n$) at a time, less frequently, you study the same amount on average, and the interruptions are much less annoying.
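To put rough numbers on the amortization, let $C$ be the number of commands per day, $p$ the probability that a command triggers a quiz, and $n$ the number of correct answers required. Using the post's figures ($C \approx 100$, $p = 0.2$), the expected daily totals are:

$$
\mathbb{E}[\text{correct answers per day}] = C\,p\,n \approx 100 \cdot 0.2 \cdot n = 20n,
\qquad
\mathbb{E}[\text{interruptions per day}] = C\,p \approx 20.
$$

With an illustrative $n = 5$, that is roughly 100 correct answers spread over about 20 interruptions: the same amount of practice as one question on every command, with a fifth as many interruptions.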

This script reads from a CSV-esque file of the following form: question,answer,# correct answers,# wrong answers. For Portuguese vocabulary, this looks something like:

```
goodbye,tchau,0,0
yes,sim,0,0
your,seu,0,0
who,quem,0,0
good afternoon,boa tarde,1,0
```
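A minimal Python sketch of reading and writing this format follows. It assumes questions and answers never contain commas (the format is only CSV-esque, with no quoting), and `Card`, `parse_deck`, and `format_deck` are hypothetical names; missing count columns default to 0.

```python
from typing import List, NamedTuple


class Card(NamedTuple):
    question: str
    answer: str
    correct: int = 0  # times answered correctly
    wrong: int = 0    # times answered incorrectly


def parse_deck(text: str) -> List[Card]:
    """Parse 'question,answer[,correct,wrong]' lines; missing counts default to 0."""
    cards = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Assumes no commas inside questions or answers (no CSV quoting).
        fields = [field.strip() for field in line.split(",")]
        question, answer = fields[0], fields[1]
        counts = [int(value) for value in fields[2:4]]
        cards.append(Card(question, answer, *counts))
    return cards


def format_deck(cards: List[Card]) -> str:
    """Serialize cards back to the four-column form, one per line."""
    return "".join(f"{c.question},{c.answer},{c.correct},{c.wrong}\n" for c in cards)
```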

You start by writing down all your question/answer pairs, one pair per line, omitting the counts (they default to 0,0 automatically). Then, as the script asks you questions, it keeps track of how many times you get each one right and wrong. It quizzes you on questions with low correct/incorrect ratios more frequently than on questions with high ratios. In other words, it asks the questions you struggle with more often than the ones you find easy.
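The bookkeeping in this paragraph might look like the following sketch, which works on plain `(question, answer, correct, wrong)` tuples. Because a fresh card has zero wrong answers, a raw correct/incorrect ratio would divide by zero; the `+1` smoothing in `ratio` is an assumption made here, not necessarily what the original script does. `Row`, `ratio`, `sort_by_difficulty`, and `record_answer` are hypothetical names.

```python
from typing import List, Tuple

# (question, answer, correct count, wrong count), mirroring one line of the file
Row = Tuple[str, str, int, int]


def ratio(row: Row) -> float:
    """Correct/incorrect ratio with +1 smoothing (assumed) to avoid dividing by zero."""
    _, _, correct, wrong = row
    return (correct + 1) / (wrong + 1)


def sort_by_difficulty(rows: List[Row]) -> List[Row]:
    """Lowest ratio (the questions you struggle with most) first."""
    return sorted(rows, key=ratio)


def record_answer(row: Row, was_correct: bool) -> Row:
    """Return an updated row after one attempt."""
    question, answer, correct, wrong = row
    if was_correct:
        return (question, answer, correct + 1, wrong)
    return (question, answer, correct, wrong + 1)
```

Sorting in ascending order puts the hardest questions at the front of the list, which is the end that the sampling scheme in the next section favors.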

For the full source code of this script and some example aliases in a .zshrc/.bash_profile/etc, check out this gist.

Math

Yeah, so some math came up while I was implementing this (as math tends to do). Specifically, given an array of questions sorted by their correct/incorrect ratio, how can I randomly select a question such that the question with the highest ratio is selected $f$ times less often than the question with the lowest ratio?

For simplicity, I can map each question to a point in $[0,1)$ by dividing its index by the number of questions (with the lowest-ratio question at index 0), and I can arbitrarily decide that the PDF over this interval, $PDF(x)$, will be linear. If its $y$-intercept is at $(0,y_{max})$, then $PDF(1)=y_{max}/f$ by the definition of $f$.

Since this is a probability density function, we know that its integral from $0$ to $1$ must be $1$, which constrains the value of $y_{max}$. Using this, we can write a formula for our PDF and integrate it to find the CDF, which in this case is a quadratic function.
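Working through the algebra under the assumptions stated above (a linear PDF on $[0,1)$ with $PDF(0) = y_{max}$ and $PDF(1) = y_{max}/f$), one way to write it out is:

$$
PDF(x) = y_{max}\left(1 - \frac{f-1}{f}\,x\right)
$$

$$
\int_0^1 PDF(x)\,dx = y_{max}\left(1 - \frac{f-1}{2f}\right) = y_{max}\,\frac{f+1}{2f} = 1
\quad\Longrightarrow\quad y_{max} = \frac{2f}{f+1}
$$

$$
CDF(x) = \int_0^x PDF(t)\,dt = \frac{2f\,x - (f-1)\,x^2}{f+1}
$$

As a check, $CDF(1) = 1$, and $f = 1$ gives $y_{max} = 1$ and $CDF(x) = x$, the uniform distribution.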

Finally, we can invert the CDF and use inverse transform sampling: draw $u$ uniformly from $[0,1)$, and $x = CDF^{-1}(u)$ is then distributed according to $PDF(x)$. We convert $x$, which lies in $[0,1)$, back into a question index by multiplying by the number of questions and rounding down, and return the corresponding question.
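Setting the quadratic CDF, $\frac{2f\,x - (f-1)\,x^2}{f+1}$, equal to $u$ and taking the root that lies in $[0,1]$ gives $x = \frac{f - \sqrt{f^2 - (f^2-1)u}}{f-1}$, which can be rationalized to $x = \frac{(f+1)\,u}{f + \sqrt{f^2 - (f^2-1)u}}$ so that the uniform case $f = 1$ needs no special handling. The Python sketch below uses that form; `inverse_cdf`, `pick_question`, and the default `f = 5.0` are hypothetical, and `questions` is assumed to be sorted by ascending correct/incorrect ratio.

```python
import math
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def inverse_cdf(u: float, f: float) -> float:
    """Invert CDF(x) = (2 f x - (f - 1) x^2) / (f + 1) on [0, 1].

    Uses the rationalized root, which avoids dividing by f - 1 and so
    also covers the uniform case f == 1.
    """
    return (f + 1) * u / (f + math.sqrt(f * f - (f * f - 1) * u))


def pick_question(questions: Sequence[T], f: float = 5.0, rng: Optional[random.Random] = None) -> T:
    """Sample a question; the density at the start of the list is f times that at the end.

    Assumes questions are sorted by ascending correct/incorrect ratio.
    """
    if rng is None:
        rng = random.Random()
    x = inverse_cdf(rng.random(), f)  # inverse transform sampling
    # Map x in [0, 1) back to an index; min() guards against rounding up to 1.0.
    index = min(int(x * len(questions)), len(questions) - 1)
    return questions[index]
```

Because each index covers a bucket of width $1/N$, the first and last questions are picked in a ratio that approaches $f$ as the number of questions grows rather than exactly $f$; with ten questions and $f = 5$, for example, it is 4.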

Closing thoughts

This was a fun, quick project that will hopefully make learning Portuguese a lot easier. I think of this script as forcing me to have brief flash-card study sessions every few minutes. If you try it out, I’d love to hear what you like or dislike about it!

To give a sense of what else this could be used for: it doesn’t have to ask questions at all. Another idea I had was to print one sentence of a book with every terminal command you run. A lot of people say they “don’t have time to read”, but with something like this, there’s no excuse (effectiveness yet to be tested). Até a próxima vez! (Until next time!)