How I study languages with my bot

For a while I have been planning to write a series of blog posts about how developers can influence their own lifestyle and build positive habits by implementing bots. A chat bot removes the need to design an expensive UI, and for more backend-oriented developers that is a meaningful relief. Unlike most instant messaging platforms, such as WhatsApp or WeChat, which are afraid of chat bots flooding their user base, Telegram took a different approach and embraces chat bots in its system. The creators of Telegram made the bot UI as rich as possible, with custom keyboards, inline commands and many more features. Until I started my first Software Engineering job in Hong Kong back in 2015, I was not aware of the possibilities of such bots. That is when I was highly motivated to learn Cantonese and developed my first Telegram bot - Bruce. Below are screenshots I took between January 24th and 28th, 2016. Over five years before I started to write this article!

Screenshots of the old Telegram bot for flashcards study

My study of the Cantonese language was a failure. It relied too heavily on phrases from the 1991 Hong Kong edition of the 1988 book Colloquial Cantonese and Putonghua Equivalents by Zeng Zifan and S. K. Lai. Literally not a single Cantonese or Mandarin speaker could understand the examples from this book, ergo having Bruce teach them to me became pointless. I abandoned my goal of becoming a Cantonese speaker in the same way as I abandoned my life in Hong Kong. It remains nothing more than a nostalgic memory, recalled whenever I hear YouTube ads in Cantonese.

A couple of years later, after I moved to New York City, I regained my motivation for studying languages. I gave Duolingo a chance. The cute bird and the silly-looking (in a good way) gamified interface caught my attention. I began to study Russian, Mandarin and Klingon. I had a rough relationship with this overly sadistic bird. We did not get along, and my progress was not significant, even though I completed a year-long streak in Mandarin. On August 24th, 2020 I abandoned it and decided to make Duo sad.

This is also when I decided to reactivate the Bruce project. The initial version was poor in features. It just pulled random words out of its database. I wanted something more sophisticated, such as the SuperMemo98 algorithm. The SM-2 algorithm proved its usefulness to me when I first moved to France back in 2009 and had to learn French super fast. Back then, by means of Mnemosyne, I managed to memorise around 100 words per day, and 2-3 months later I was capable of basic communication in French. Another useful piece of software that came along was Anki, which I used to study for my (failed) Japanese Language Proficiency Test in 2010. In fact, I was one of those people who refused to get a smartphone back then, and AnkiDroid was what changed my mind, as it allowed me to review my cards anywhere I went without opening my ThinkPad T61.

While writing this article, I live in Shanghai, China. Basic communication skills in Mandarin are required to survive. After abandoning Duo I had to find a new tool, and since I find the pricing of AnkiMobile for iOS extremely unfair, I decided to give Bruce another chance. Studying flash cards via Telegram makes it truly possible to study anywhere you go, as Telegram has clients for many platforms. It also lets me practice Chinese by typing words rather than only looking at cards. With flash cards, responding by typing the answer is a huge benefit, because people tend to confuse recognition with recollection. There is an excellent lecture, Study Less Study Smart by Marty Lobdell, where he explains that at 35:00.

This blog is very theory-oriented, so even though I did not implement the SM-2 algorithm from scratch and used the supermemo2 Python module instead, I find it worthwhile to explain briefly how the algorithm operates. Every time you see a card with a question and decide to flip it and reveal the answer, you have to give yourself a grade $q \in \{0, 1, 2, 3, 4, 5\}$. The creator of SuperMemo, Piotr Woźniak, clearly comes from Poland, as this is the grading we use in Polish schools. There is a specific way of interpreting those grades, so let's go through them one by one. The grades $q < 3$ are failing grades and, from lowest to highest, should be interpreted as: total blackout (0); incorrect response, but upon seeing the correct answer it felt familiar (1); incorrect response, but upon seeing the correct answer it seemed easy to remember (2). The remaining grades are passing and should be interpreted as: correct response, but recalled with significant difficulty (3); correct response, after some hesitation (4); correct response with perfect recall (5).

Once you provide the grade, the SM-2 algorithm calculates three new parameters: $n$, the repetition number; $E$, the easiness factor; and $I$, the interval (in days), which is the most important to us as it indicates when we should review the card again. As long as you are progressing, the $I$ coefficient should increase. The coefficients are calculated as

$$\left\langle n^\prime, E^\prime, I^\prime \right\rangle = \begin{cases} \left\langle n+1, E^\prime, 1 \right\rangle & \text{if}\ q \geq 3 \land n=0 \\ \left\langle n+1, E^\prime, 6 \right\rangle & \text{else if}\ q \geq 3 \land n=1 \\ \left\langle n+1, E^\prime, I \times E \right\rangle & \text{else if}\ q \geq 3 \land n > 1 \\ \left\langle 0, E, 1 \right\rangle & \text{otherwise (incorrect response,}\ q < 3 \text{)} \end{cases}$$

and the updated easiness factor $E^\prime$ is

$$E^\prime = \begin{cases} f(E, q) & \text{if}\ f(E, q) \geq 1.3 \\ 1.3 & \text{otherwise} \end{cases}$$

dependent on

$$f(E, q) = E + (0.1 - (5 - q) \times (0.08 + (5 - q) \times 0.02))$$
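The three formulas above can be sketched in a few lines of Python. This is only an illustration and not the code Bruce runs (the bot relies on the supermemo2 module). The names `CardState`, `f` and `review` are made up for this sketch, the starting easiness of 2.5 is the conventional SM-2 default rather than something stated in this post, and the interval is left unrounded exactly as the formula gives it.

```python
from typing import NamedTuple


class CardState(NamedTuple):
    """Hypothetical container for the three SM-2 parameters of one card."""
    n: int     # repetition number
    e: float   # easiness factor E
    i: float   # interval I, in days


def f(e: float, q: int) -> float:
    """Raw easiness update f(E, q), before the 1.3 floor is applied."""
    return e + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))


def review(state: CardState, q: int) -> CardState:
    """Apply one review with grade q and return the updated state."""
    if q not in range(6):
        raise ValueError("grade q must be an integer from 0 to 5")
    if q < 3:
        # Incorrect response: restart repetitions, keep E, review tomorrow.
        return CardState(0, state.e, 1)
    new_e = max(1.3, f(state.e, q))  # E' never drops below 1.3
    if state.n == 0:
        interval = 1
    elif state.n == 1:
        interval = 6
    else:
        interval = state.i * state.e  # uses the previous E, as in the formula
    return CardState(state.n + 1, new_e, interval)


if __name__ == "__main__":
    state = CardState(n=0, e=2.5, i=0)  # assumed starting values for a new card
    for grade in (5, 4, 3, 1):
        state = review(state, grade)
        print(grade, state)
```

With the grades 5, 4, 3 the intervals grow from 1 day to 6 days to roughly 15.6 days, and the final failing grade 1 resets the card to a 1-day interval while keeping its easiness factor.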

It is not very theoretical. It looks like a bunch of formulas tuned over a large number of trials. But trust me, it works. It works very well. It saves a lot of revision time because you are not staring at the same words over and over again.

Implementing the new study bot with the SM-2 algorithm took me, as a very rough estimate, 121 hours of development time according to the git-hours plugin. The new interface is user friendly.

Screenshots of the new updated study bot

Another new feature is quizzes, available only for cards that have already been reviewed and are scheduled for repetition in the future. It has a lot to do with the difference between recognition and recollection. I do not believe a quiz is useful for learning a new concept, yet doing such a quiz on concepts you have already seen can stimulate the brain and make the concept feel like it is actually being used.

Screenshots showing the quiz feature of the study bot

Picture questions can also bring a lot of utility to the end user. For instance, as displayed below, they make it possible to learn to read music with spaced repetition. The cards are random, with increasing difficulty (measured by the size of the domain), and are pre-generated using abjad.

Screenshots showing the solfège feature of the study bot

Was it worth it? Definitely. I can study cards anywhere I go. It fills time that is normally wasted, such as standing on the subway or waiting in a queue somewhere. Responding to Bruce in Chinese is an extremely useful feature: within just a few weeks I became a lot more fluent at typing Chinese using pinyin.

This post has been a lot lighter than usual. I want to do more of this less formal writing on this blog, and I hope this post sets me in that direction. I am an advocate and enthusiast of autodidacticism and I would be willing to write more about it if there is interest. I have not released the source code of Bruce. I am too afraid that people would start using it and I would end up with a lot of maintenance work. I would consider sharing it if more people are interested. You are welcome to Tweet me and we can talk about it!
