We can probably all agree that outreach is an integral, albeit oft-neglected, part of our teaching and research lives: not only does it generate interest in the broader community around us, it also helps promote diversity in our research environments down the line. However, as I have come to realize, it is not exactly as easy as it seems, especially when you are talking to a young and less technical audience.

A while ago, I was invited to give an introduction to deep learning for natural language processing to high schoolers participating in a hackathon called LingHacks. This was my first time teaching technical material to such a young audience, and it turned into an interesting learning experience for me! I've recently come to realize that what I learned while preparing for this presentation could be of broader interest, so I am writing it down to share my experience.


LingHacks is a new hackathon focused on computational linguistics (a.k.a. natural language processing) for high school students, where they spend one day in training (which I was part of), and two days fleshing out their brilliant ideas for applying language technology to real-world problems.

Before I started preparing my slides, I asked the organizers about my audience’s mathematical background. Here is a brief list of things I got from them that would hopefully be useful for others preparing for similar events:

Items crossed out are ones I was told high schoolers are not expected to know by default. I didn't end up using most of these anyway; I'll explain why in a bit.

My Initial Plan and What (probably) Not To Do

Skip if you just want to know what I ended up including in my slides! Skipping this section wouldn’t affect your takeaways in any way.

As you might have inferred from the list of mathematical prerequisites I asked the organizers for, I was going for a pedantic introduction leading up from “shallow learning” (linear classifiers) to deep learning (neural networks). I was planning to follow some of the great tutorials out there, and tailor them to my audience’s technical level by introducing the following things (in this order):

This agenda made perfect sense to me at the time, especially since this is the path my past teaching experiences usually took, which worked reasonably well for classrooms filled with undergraduate and graduate students.¹ Fortunately, though, my labmates heard about my plan and stopped me about 20 slides down this path.

An image depicting the surface representing a two-dimensional function in three dimensions. A line traces out the gradient descent direction from one point to another, which illustrates how gradient descent follows the steepest direction of descent on the loss surface.
Figure 1: What gradient descent on a loss function looks like. (From Prof. Andrew Ng's machine learning course on Coursera)
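
For readers who prefer code to pictures, here is a minimal sketch of the idea in Figure 1. The bowl-shaped loss, the starting point, and the names `gradient_descent`, `loss`, and `loss_grad` are all made up for illustration; none of this comes from the course the figure is taken from, and it is not what I showed in my slides.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, recording every point visited."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Move a small distance in the direction of steepest descent.
        x -= learning_rate * gx
        y -= learning_rate * gy
        path.append((x, y))
    return path


def loss(x, y):
    # Hypothetical toy loss surface: a bowl with its lowest point at (0, 0).
    return x ** 2 + 2 * y ** 2


def loss_grad(x, y):
    # Partial derivatives of the toy loss with respect to x and y.
    return 2 * x, 4 * y


path = gradient_descent(loss_grad, start=(3.0, 2.0))
final_x, final_y = path[-1]
print(f"start loss: {loss(*path[0]):.3f}, final loss: {loss(final_x, final_y):.3e}")
```

Each entry in `path` is one point on the line traced across the surface in the figure. With a learning rate this small on this particular bowl, the loss shrinks at every step; a larger rate can overshoot the bottom instead.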

What Was Actually Covered in My Slides, and Why

After I shared my plan with my labmates, they quickly pointed out (and I paraphrase, as they were too kind to say this) that I had made a big mistake in assuming that what seemed to me a natural progression of topics would also be intuitive to my audience. Specifically, what I planned was only natural to me because it is the order in which I learned these topics, and because I have spent many more years with linear algebra and multivariate calculus, which makes things more intuitive for me. In the eyes of my intended audience, I might as well have been teaching them college maths in the first 5-10 minutes of my talk.

This revelation helped me take a step back and really think from the perspective of my audience, and I hope I did so in a reasonable manner in the end. For quick reference, I have summarized below a few dos and don'ts that reflect the rationale behind what I decided to put in my final slides.

A figure displaying Gru and the Minions from the animation film Despicable Me. Here, Gru is likened to a CPU, who is versatile but limited in number; the Minions are likened to a GPU, who are not terribly powerful individually, but can achieve a lot in parallel.
Figure 2: My attempt at explaining the differences between CPUs and GPUs with *Despicable Me* characters. (One is cunning but really only has so much bandwidth; the other is an army of less sophisticated agents that can achieve a lot in parallel.)

I sincerely hope that these notes can help make your next presentation to a less technical audience an engaging and successful one!

My Slides

My slides are available on Google Drive under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).


I wouldn’t have been part of this experience in the first place without the invitation from LingHacks organizers, so thank you for inviting me!

The final slides wouldn’t have been the way they are right now without the great suggestions from my labmates, Siva Reddy and Urvashi Khandelwal, to whom I’m immensely grateful!

Also special thanks to Yuhao Zhang, Siva, and Urvashi for providing comments and feedback on earlier drafts of this post.


  1. I taught a few review sessions before exams, which usually involved going over topics in the order they unfolded in class (each being, in most cases, the prerequisite for understanding the next), highlighting a few points with examples illustrating all the technical details, and a lot of interaction with the audience [1,2,3]. Some of my internal tutorials/talks were also structured similarly, as you would expect a technical talk to be.

  2. I have fond recollections of a bunch of Stanford NLPers surrounding a fire pit by the sea talking about “adding layers (of firewood)”, “feeding in more data (newspapers)”, “adding more compute and more supervision (fanning in more air)”, and “doing architectural search (of firewood)”. These references probably make no sense to you if you don’t work with neural networks/deep learning, and that’s okay. 

  3. I have only tried this once and it seemed to work okay, thus the “n=1” note about sample size.