Detecting Sign Language in Real-Time: A Deep Learning Project

Sign language is a crucial means of communication for millions of people around the world. Building technologies that can interpret sign language can greatly enhance accessibility and inclusion. In this project, we developed a real-time sign language detection system using computer vision and deep learning techniques. We focused on three common signs: "Thank You," "I Love You," and "Hello."

Link to the GitHub repository

Data Collection and Preprocessing

To train our model, we captured body landmarks in real time using OpenCV and the MediaPipe library. For each sign we recorded 30 samples, each consisting of 30 consecutive frames, giving 90 samples in total. The landmark coordinates for each frame were stored as a separate NumPy array with an .npy extension, organized into class-specific folders.
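To make this concrete, here is a minimal sketch of the extraction step. It assumes MediaPipe Holistic, whose pose, face, and hand landmarks add up to the 1662 values per frame; the helper name and the zero-padding for missed detections are illustrative rather than taken verbatim from the repository.

import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    # Pose: 33 landmarks x (x, y, z, visibility) = 132 values
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    # Face mesh: 468 landmarks x (x, y, z) = 1404 values
    face = (np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    # Each hand: 21 landmarks x (x, y, z) = 63 values
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    # 132 + 1404 + 63 + 63 = 1662 keypoint values per frame
    return np.concatenate([pose, face, lh, rh])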

The data was structured as follows:

  • 90 samples (30 for each sign)

  • Each sample had 30 frames

  • Each frame had 1662 keypoints

We combined all the data into a single array X with a shape of (90, 30, 1662). The corresponding class labels ("Hello," "Thank You," "I Love You") were stored as y. We then split the data into training and testing sets for model evaluation.
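The assembly and split can be expressed roughly as follows. The folder layout, the class order, and the test-set fraction are assumptions for illustration and may differ from the actual scripts.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

actions = np.array(["Hello", "Thank You", "I Love You"])   # class order is illustrative

sequences, labels = [], []
for label_idx, action in enumerate(actions):
    for sample in range(30):                                # 30 samples per sign
        window = [np.load(f"MP_Data/{action}/{sample}/{frame}.npy")   # hypothetical folder layout
                  for frame in range(30)]                   # 30 frames per sample
        sequences.append(window)
        labels.append(label_idx)

X = np.array(sequences)                                     # shape: (90, 30, 1662)
y = to_categorical(labels).astype(int)                      # one-hot labels, shape: (90, 3)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)   # split size assumed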

Model Architecture

Our deep learning model was designed to process sequences of landmarks effectively. Here's an overview of the model architecture:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

LAYERS = [
    # Three stacked LSTM layers process the 30-frame sequence of 1662 keypoints
    LSTM(64, return_sequences=True, activation="relu", input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    # Dense layers classify the final sequence representation
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    # One output per sign ("Hello," "Thank You," "I Love You"), softmax probabilities
    Dense(actions.shape[0], activation="softmax")
]

model = Sequential(LAYERS)

The model consists of multiple Long Short-Term Memory (LSTM) layers, which are well-suited for processing sequential data. We also included fully connected (Dense) layers for classification. We used the Adam optimizer, categorical cross-entropy loss function, and categorical accuracy as the evaluation metric.
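In code, that compile configuration is a single call (written here as a sketch; the exact arguments in the script may differ slightly):

model.compile(optimizer="Adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])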

Training the Model

We trained the deep learning model for 205 epochs so that it could learn the sign language gestures effectively, monitoring accuracy and loss on the held-out data throughout training.
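A minimal training call consistent with this setup is shown below; using the test split for monitoring and the saved filename are assumptions.

history = model.fit(X_train, y_train,
                    epochs=205,
                    validation_data=(X_test, y_test))   # monitored accuracy and loss
model.save("sign_language_model.h5")                    # hypothetical filename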

Real-Time Detection

After training and saving the model, we implemented real-time sign language detection using OpenCV in a Python script. This script loads the pre-trained model and captures video frames from a camera feed. It then processes the landmarks in each frame and predicts the corresponding sign.

The model's predictions were displayed in real-time, allowing for effective sign language interpretation.
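Putting the pieces together, the real-time loop looks roughly like the sketch below. It reuses extract_keypoints, mp_holistic, and actions from the earlier snippets; the model filename, confidence threshold, and on-screen text placement are illustrative.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("sign_language_model.h5")    # hypothetical filename
sequence = []                                   # rolling window of the last 30 frames
threshold = 0.7                                 # illustrative confidence cutoff

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # Run MediaPipe on the RGB frame and flatten the landmarks
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence = (sequence + [extract_keypoints(results)])[-30:]
        if len(sequence) == 30:
            probs = model.predict(np.expand_dims(sequence, axis=0))[0]
            if probs.max() > threshold:
                cv2.putText(frame, actions[np.argmax(probs)], (10, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(10) & 0xFF == ord("q"):  # press q to quit
            break
cap.release()
cv2.destroyAllWindows()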

Conclusion

Our Sign Language Detection project demonstrates the power of computer vision and deep learning in improving accessibility and communication for individuals who use sign language. By capturing and interpreting sign language gestures in real-time, we've taken a significant step towards making technology more inclusive and accessible to everyone.

This project also highlights the importance of using cutting-edge technologies like deep learning to solve real-world problems and enhance the lives of individuals in our community.

Feel free to reach out if you have any questions or would like to learn more about this project!