MindSync
Overview
MindSync started with a pretty simple question: can you actually get anything meaningful about emotional state from EEG, especially in messy, real-world conditions? Not lab-perfect signals, but the kind you'd get in a student setup: short recordings, inconsistent electrode contact, and labels that aren't always reliable.
I quickly realized this wasn’t going to be about finding the “best model.” It was more about building a pipeline that could handle everything from raw signals to something I could actually evaluate and trust.
What made it difficult
EEG is just messy. There’s noise everywhere, movement artifacts, drift, and inconsistencies between sessions and people. Early on, I saw models perform well, but it didn’t feel real. Once I looked closer, I realized the splits were leaking information across sessions, so the model was basically memorizing patterns tied to specific recordings.
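The fix for that kind of leakage is to split by session, so no single recording contributes windows to both train and test. A minimal sketch in NumPy — the session IDs and the helper name are illustrative, not from the project:

```python
import numpy as np

def session_holdout_split(session_ids, test_sessions):
    """Boolean masks that keep every window from a given session on one side."""
    session_ids = np.asarray(session_ids)
    test_mask = np.isin(session_ids, test_sessions)
    return ~test_mask, test_mask

# hypothetical example: 6 windows drawn from 3 recording sessions
sessions = ["s1", "s1", "s2", "s2", "s3", "s3"]
train_mask, test_mask = session_holdout_split(sessions, ["s3"])
# no session appears on both sides, so the model can't memorize recordings
```

The same idea is available off the shelf as scikit-learn's `GroupKFold`/`GroupShuffleSplit` with the session ID as the group key.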
On top of that, the labels themselves weren’t great. Emotional state is subjective, and the data was imbalanced, so accuracy alone didn’t really mean anything.
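With imbalanced labels, plain accuracy rewards guessing the majority class. Balanced accuracy, the mean of per-class recalls, is one simple alternative; this sketch with made-up labels shows the gap:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; a majority-class guesser no longer scores high."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(recalls))

# hypothetical 9-to-1 imbalance, model always predicts the majority class
y_true = [0] * 9 + [1]
y_pred = [0] * 10
plain = float((np.array(y_pred) == np.array(y_true)).mean())  # 0.9, looks great
balanced = balanced_accuracy(y_true, y_pred)                  # 0.5, reveals the problem
```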
Approach
I treated preprocessing as part of the core system, not just a setup step. Filtering, windowing, normalization: all of it was something I experimented with and documented.
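As a rough illustration of that kind of preprocessing stage — the cutoff frequencies, window length, and sampling rate here are placeholder values, not the project's actual settings:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs=128, win_sec=2.0):
    """Band-pass, window, and per-window z-score a (channels, samples) array."""
    # zero-phase band-pass to remove drift and high-frequency noise
    b, a = butter(4, [1.0, 40.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=-1)
    # cut into fixed-length, non-overlapping windows: (n_windows, channels, samples)
    win = int(win_sec * fs)
    n = filtered.shape[-1] // win
    windows = filtered[:, : n * win].reshape(eeg.shape[0], n, win).transpose(1, 0, 2)
    # per-window, per-channel normalization
    mu = windows.mean(axis=-1, keepdims=True)
    sd = windows.std(axis=-1, keepdims=True) + 1e-8
    return (windows - mu) / sd

rng = np.random.default_rng(0)
wins = preprocess(rng.standard_normal((4, 1280)))  # 4 channels, 10 s at 128 Hz
```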
The modeling side was built in PyTorch, keeping things flexible so I could iterate quickly. I wanted to make sure that whenever something improved or broke, I could actually trace it back to either the data, the model, or the evaluation.
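Keeping iteration fast mostly comes down to keeping the model small and the interfaces stable. A toy PyTorch classifier over preprocessed windows might look like this — the architecture and sizes are hypothetical, not the project's actual model:

```python
import torch
import torch.nn as nn

class SmallEEGClassifier(nn.Module):
    """Minimal 1-D conv classifier over (batch, channels, samples) windows."""

    def __init__(self, n_channels=4, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        return self.head(self.features(x).squeeze(-1))

model = SmallEEGClassifier()
logits = model(torch.randn(8, 4, 256))  # 8 windows, 4 channels, 256 samples
```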
Iteration and debugging
At first, I was chasing metrics. Then I realized that wasn’t enough. The model was making confident but completely wrong predictions on noisy segments, which forced me to rethink how I was evaluating things.
I started plotting errors, comparing performance across sessions, and tightening how the data was split. Some augmentations that seemed like a good idea actually made things worse, especially since label noise was already a limiting factor.
Evaluation
Instead of focusing on one number, I looked at where the model failed. Performance by session, by class, and by signal quality told a much clearer story. It became obvious that some parts of the data were just harder, and pretending otherwise didn’t help.
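Breaking a single score into per-group scores is mechanical but revealing. A sketch of that breakdown — the session IDs and predictions here are made up:

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately per group (session, class, quality bucket...)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: float((y_pred[groups == g] == y_true[groups == g]).mean())
            for g in np.unique(groups)}

# hypothetical predictions from two sessions
y_true  = [0, 1, 1, 0, 1, 1]
y_pred  = [0, 1, 0, 1, 0, 1]
session = ["s1", "s1", "s1", "s2", "s2", "s2"]
per_session = accuracy_by_group(y_true, y_pred, session)
# one session doing fine while another drags the average down is exactly
# the kind of pattern a single aggregate number hides
```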
Takeaways
The biggest thing I learned is that in this kind of ML, the pipeline matters more than the model. You can swap architectures easily, but if your data handling and evaluation aren’t solid, the results don’t mean much.
This project really changed how I think about ML, especially when working with noisy, real-world data.