Research
“The frog in the well knows nothing of the great ocean.”
This was the Japanese saying that inspired me to spend two months in Kyoto, Japan as an AI Research Intern. I worked with Cross Labs, a lab dedicated to uncovering the mathematical bases of all intelligent processes. The problem was, I barely knew anything about neural networks (NNs).
I was just a frog that was proficient in Python, Git, and C++. I knew nothing of the great ocean that was NNs, and I had no hands-on experience coding my own models, so I had to start from the ground up.
The first thing I did to understand the fundamentals of neural networks was to read textbooks and research papers while taking notes. My mentor, Federico Da Rold, PhD, assigned me pages from Artificial Intelligence: A Modern Approach by Stuart J. Russell and Peter Norvig. If I was going to learn all about NNs, it wasn't going to be with any shortcuts. I began to study the history, calculus, and "under the hood" methodology of modern AI. With every section read came a new page of notes, and I was determined to achieve a deep understanding of NNs so I could create my own models in the future.
One interesting thing I read about was the Father of Logic, Aristotle, and his most famous syllogism: "All men are mortal. Socrates is a man. Therefore Socrates is mortal." This was one of the earliest formal systems of logic, and it later became a foundational principle of AI. Since the early stages of AI relied heavily on logic like this in pursuit of the Turing Test, Aristotle's work played a pivotal role in shaping the field. Of all the history I read, this was the topic that surprised me most; I didn't expect to see Aristotle in an AI textbook.
In addition to Aristotle, I took notes on other important AI figures, along with different forms of machine learning and their functions (induction, deduction, classification, regression, supervised and unsupervised learning). At this point, I also understood the basics of backpropagation and multilayer perceptrons (MLPs), and I watched YouTube videos to solidify my findings. In the final chapters I was assigned, I studied evolutionary algorithms and their respective strengths and weaknesses.
After building a rough understanding of the AI field and backpropagation, I set a new goal of coding my first basic NN using this tutorial. It is a simple NN coded with only Python, NumPy, and Matplotlib. It has no hidden layers, just an input layer of two neurons and an output layer of one neuron. Thanks to this tutorial, I gained a firm grasp of the activation functions, the equations used to calculate predictions, and the derivatives used to calculate gradients. I now understood the basic structure of an MLP while getting my first hands-on experience coding a NN. Throughout the internship, I enjoyed writing my code out on whiteboards for memorization purposes.
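To make that concrete, here is a minimal sketch in the spirit of that network: two inputs, one output, no hidden layers. The sigmoid activation, mean-squared-error gradient, and toy OR dataset are my choices for illustration and may differ from the tutorial's exact details.

```python
import numpy as np

# A minimal two-input, one-output network with no hidden layers.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Toy dataset: learn the OR function on two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

rng = np.random.default_rng(0)
weights = rng.normal(size=2)
bias = 0.0
lr = 0.5

for epoch in range(5000):
    z = X @ weights + bias                   # weighted sum for every sample
    pred = sigmoid(z)                        # the network's predictions
    grad_z = (pred - y) * sigmoid_deriv(z)   # chain rule for MSE loss
    weights -= lr * (X.T @ grad_z) / len(X)  # gradient step on weights
    bias -= lr * grad_z.mean()               # gradient step on the bias

print(np.round(sigmoid(X @ weights + bias), 2))  # approaches [0, 1, 1, 1]
```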
Before coding another NN, I prepared for future projects by downloading powerful libraries and frameworks such as scikit-learn, PyTorch, and pandas. Here, I gained proficiency with package management systems such as pip and npm, and I learned how to troubleshoot dependency issues. These skills aren't strictly about NNs, but fixing installation problems and learning to read documentation is essential for beginning any project. I also became more comfortable using virtual environments (venvs) and documenting all of my code on GitHub.
Now that I had built a solid foundation for understanding NN structures, I moved on to the next project Federico assigned me: building a NN (without machine learning frameworks) for the Iris Flower dataset. This multivariate dataset contains 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor) with five attributes: petal length, petal width, sepal length, sepal width, and species.
As the saying often attributed to Einstein goes, "If you can't explain it simply, you don't understand it well enough."
That's why, along with hand-coding the model for this dataset, I created a Jupyter tutorial with methodologies and step-by-step explanations for the entire process. Many coders jump straight into using machine learning frameworks (PyTorch, TensorFlow, Keras, Theano) without understanding what's happening "under the hood". Nowadays, it's hard to even find the famous Iris classification problem solved without such tools. It's important that we take the time to understand neural network fundamentals.
That's why, in this project, I hand-coded a deep learning model to identify each species of Iris. The network is an MLP with one hidden layer of two neurons, written in Python, and it reaches 100% accuracy. In the tutorial, I walk through the basic steps: importing libraries, preprocessing, data visualization, building the model, and discussing the results. I coded the model's backpropagation from scratch using NumPy, and it runs almost instantaneously because the dataset is so small.
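The full tutorial goes step by step, but a condensed sketch of the idea looks something like this. The 4-2-3 layer sizes follow the network described above; the sigmoid activations, one-hot targets, MSE loss, and learning rate are my assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Load and standardize the four features; one-hot encode the species.
X, labels = load_iris(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = np.eye(3)[labels]

# 4 inputs -> 2 hidden neurons -> 3 outputs, as described above.
rng = np.random.default_rng(42)
W1 = rng.normal(scale=0.5, size=(4, 2)); b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 3)); b2 = np.zeros(3)

lr = 0.1
for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                   # hidden activations
    out = sigmoid(h @ W2 + b2)                 # output activations
    # Backward pass: chain rule applied layer by layer (MSE loss)
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out) / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X);   b1 -= lr * d_h.mean(axis=0)

print(f"training accuracy: {(out.argmax(axis=1) == labels).mean():.2%}")
```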
After understanding the fundamentals of deep neural networks (DNNs) "under the hood", I began to code another Iris Flower classification model, this time using PyTorch. I knew that for the more complex datasets ahead, PyTorch would be an invaluable tool to learn.
By redoing the same MLP with PyTorch, I was able to understand exactly what each Torch function did. I learned how to use PyTorch for tasks like data loading, building the model, and backpropagation. Compared to implementing backpropagation manually with NumPy, PyTorch cut the amount of code by a factor of five or more. At the end of this short project, I had an intermediate understanding of PyTorch, along with a newfound appreciation for how efficient it is.
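For comparison, here is roughly what the same network looks like in PyTorch; autograd replaces the entire hand-written backward pass. The optimizer and loss function below are my assumptions, not necessarily the ones I used.

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris

X, labels = load_iris(return_X_y=True)
X = torch.tensor((X - X.mean(axis=0)) / X.std(axis=0), dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.long)

# Same 4-2-3 architecture; autograd now handles the backward pass.
model = nn.Sequential(nn.Linear(4, 2), nn.Sigmoid(), nn.Linear(2, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass
    loss.backward()               # replaces the manual chain rule
    optimizer.step()              # updates every weight and bias

accuracy = (model(X).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {accuracy:.2%}")
```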
The next model I coded was for the MNIST dataset, a large database of handwritten digits that is commonly used for training various image processing systems. It contains 60,000 training images and 10,000 testing images. For this project, I coded another DNN, again an MLP, with PyTorch. The model has two hidden layers, the first with 256 neurons and the second with 128. I used ReLU as my activation function, along with various Python libraries. The model reached 99% training accuracy, 98% testing accuracy, and 97% validation accuracy.
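The architecture itself is only a few lines in PyTorch. This sketch matches the layer sizes described above; MNIST's 28x28 images flatten to 784 input features.

```python
import torch.nn as nn

# Two hidden layers of 256 and 128 ReLU neurons, ten output logits
# (one per digit).
model = nn.Sequential(
    nn.Flatten(),          # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),    # raw logits; CrossEntropyLoss applies softmax
)
```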
Coding a DNN for the MNIST dataset was immensely valuable because it bolstered all my pre-existing NN knowledge while also teaching me new model-building shortcuts. This project increased my proficiency with PyTorch, and I learned how to scale up my previous MLP from the Iris Flower dataset. I could also now use Torchvision, so I no longer had to manually convert data imported from pandas into tensors. Furthermore, I implemented a validation set: a subset of the training data used to tune the model and evaluate its performance during training, which helps prevent overfitting and improve generalization.
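Here is a sketch of that data pipeline: Torchvision handles the download and tensor conversion, and a validation set is carved out of the training data. The 54,000/6,000 split and batch size are assumed values for illustration.

```python
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Torchvision downloads MNIST and converts each image straight to a
# tensor, so no manual pandas-to-tensor conversion is needed.
transform = transforms.ToTensor()
full_train = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

# Carve a validation set out of the 60,000 training images.
train_set, val_set = random_split(full_train, [54000, 6000])

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)
test_loader = DataLoader(test_set, batch_size=64)
```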
Overall, this project was fascinating to me because it was my first hands-on experience with computer vision. Additionally, the MNIST model is a standard project for many AI/ML classes in universities, so it’s great I can go into those classes already knowing about the inner workings of NNs. Talk about killing two birds with one stone!
Now that I understood neural networks "under the hood" and was comfortable with DNNs, I was ready to code a convolutional neural network (CNN) for the MNIST dataset. This was the second-longest project I undertook during my time at Cross Labs, right behind the Iris Flower tutorial. The model achieves 99% training and testing accuracy.
For the first step, it was back to reading. My mentor, Federico, provided me with this wonderful article introducing me to CNNs, and after routine meetings I understood the basic mechanisms of convolutions, kernels, padding, and so on. Just like with the other projects, he provided me with example code that greatly helped me translate my ideas into working solutions. Watching YouTube videos and writing code on whiteboards also helped me visualize and memorize the CNN architecture.
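A few lines of PyTorch make those mechanisms concrete. This sketch just prints shapes to show what kernels, padding, and pooling do to an MNIST-sized input; the channel counts are arbitrary examples.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)              # (batch, channels, height, width)

conv = nn.Conv2d(1, 32, kernel_size=3)     # 3x3 kernels, no padding
print(conv(x).shape)                       # torch.Size([1, 32, 26, 26])

conv_pad = nn.Conv2d(1, 32, kernel_size=3, padding=1)   # pad to keep size
print(conv_pad(x).shape)                   # torch.Size([1, 32, 28, 28])

pool = nn.MaxPool2d(2)                     # halve the spatial dimensions
print(pool(conv_pad(x)).shape)             # torch.Size([1, 32, 14, 14])
```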
After studying CNNs and adding convolutions to the DNN from my previous MNIST model, I came upon another problem: my computer was going to explode. Every time I started training and testing the CNN, my CPU would sit constantly at 100%, and my laptop's fan was sent into overdrive. My saving grace was having a CUDA-compatible GPU in my laptop (an HP Envy with an NVIDIA GeForce RTX 3050). CUDA is NVIDIA's parallel computing platform, which lets my laptop run models on its GPU via parallel processing instead of on its CPU, which can be up to 40 times faster. The training speed is literally night and day, and my laptop was no longer a threat to the office. CUDA was almost quite literally a lifesaver, and I now understood why it's such an important skill to have in the AI/ML field.
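The PyTorch pattern for this is short: pick a device, then move the model and each batch onto it. The stand-in linear layer below replaces the real CNN just to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)         # stand-in for the real CNN
batch = torch.randn(64, 784, device=device)   # batch created on the device

logits = model(batch)                          # runs on the GPU if available
print(device, logits.shape)
```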
If I hadn't learned CUDA during the previous MNIST CNN project, my laptop definitely would have exploded trying to train the model I created for the CIFAR-10 dataset. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images per class, and these pictures are far more complex than MNIST's handwritten digits. For example, MNIST's images are grayscale, meaning they only contain shades of gray, so only one input channel is needed for the CNN. CIFAR-10's color images, on the other hand, use RGB (red-green-blue), so three input channels are necessary. As you can imagine, tripling the number of input channels increases the complexity of the CNN by quite a bit.
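You can see that growth directly in the kernel shapes: each of a layer's kernels gains one slice per input channel, so going from grayscale to RGB triples that first layer's weights.

```python
import torch.nn as nn

# The same 3x3 convolution with 32 output channels, grayscale vs. RGB.
gray = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
rgb = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

print(gray.weight.shape)   # torch.Size([32, 1, 3, 3]) -> 288 weights
print(rgb.weight.shape)    # torch.Size([32, 3, 3, 3]) -> 864 weights
```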
For reference, when I reused the same CNN I had coded previously on the CIFAR-10 dataset, my overall accuracy dropped from 99% to 65%. Despite adding dropout layers, normalization layers, and more convolutions, and training the model for 100 epochs, I was still only able to achieve 87.5% training accuracy and 83.17% testing accuracy. Compared to the 99% accuracy I was used to, this dip was a little demotivating. However, I learned that even state-of-the-art models such as VGG and ResNet sit at around 90-96% accuracy on CIFAR-10.
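My exact architecture isn't reproduced here, but a sketch of a CIFAR-10 CNN with those kinds of additions (stacked convolutions, batch normalization, dropout) might look like this:

```python
import torch.nn as nn

# Illustrative CIFAR-10 CNN, not the exact architecture from the project.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.MaxPool2d(2),                  # 32x32 -> 16x16
    nn.Dropout(0.25),
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2),                  # 16x16 -> 8x8
    nn.Dropout(0.25),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 512), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 10),
)
```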
Above is a confusion matrix I generated to visualize the results of my CIFAR-10 model after 100 epochs. The y-axis represents the true classes, while the x-axis shows the predicted classes. Each box represents how often one class was labeled as another, so a model with 100% accuracy would have all 1s along the diagonal and 0s in every other box. As we can observe from my confusion matrix, my model had an especially difficult time identifying cats, with a disappointing 60% accuracy for that class. Moreover, two boxes to the right of that 0.6, we can see that my model classified 16% of cats as dogs.
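For anyone who wants to produce a similar plot, here is a sketch using scikit-learn's ConfusionMatrixDisplay with row normalization; the random labels and predictions are placeholders standing in for a real model's test output.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

classes = ["airplane", "car", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

# Placeholder labels and predictions standing in for real test output.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=1000)
y_pred = y_true.copy()
flip = rng.random(1000) < 0.17                 # inject some mistakes
y_pred[flip] = rng.integers(0, 10, size=flip.sum())

# Row-normalized so each cell shows the fraction of a true class
# predicted as each label, matching the matrix described above.
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=classes, normalize="true",
    values_format=".2f", xticks_rotation=45)
plt.show()
```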
After staying in Kyoto for two months, I traveled back to Texas and decided to continue working remotely with my mentor, Federico, until my next semester at UT Austin begins. Currently, I'm studying the architecture of more advanced NNs, such as VGG and ResNet. After that, I aim to study genetic algorithms using OpenAI Gymnasium. In the future, I hope to research neuroevolution and evolutionary algorithms and to join an AI research lab at UT Austin, now that I'm equipped with all my NN knowledge. Maybe one day, I'll be able to apply NNs to spatial computing. Thanks for reading!