Software Engineer - REVE Systems

February 2023 - Present
Dhaka, Bangladesh

  • Deployed a speech-based gender and age prediction model, achieving 93% and 74% accuracy, respectively.
  • Optimized a VITS text-to-speech model, custom-trained on Bengali datasets, for ONNX, doubling inference speed for rapid cross-platform deployment.
  • Developed a YourTTS-based multi-speaker model, enabling diverse voice synthesis with a single model.
  • Enhanced ASR by training NVIDIA NeMo QuartzNet on Bengali datasets and decoding with beam search, improving prediction accuracy.
  • Built a speaker diarization system using NVIDIA TitaNet-L, identifying speakers in multi-party conversations with over 80% accuracy.
  • Implemented a 3x faster FastSpeech 2 pipeline integrating TorchServe, WebSocket, and Redis for efficient batch processing.
  • Conducted unsupervised clustering of noises using K-means, t-SNE, and silhouette scoring for future noise augmentation strategies.
  • Developed a three-layer biometric authentication system with speaker, facial, and fingerprint verification, using ROC and AUC analysis to set decision thresholds.
  • Created a NISQA-based speech quality evaluation tool for quantitative assessment of speech signal integrity and clarity.
  • Developed a Keycloak SPI for integrating face and fingerprint authentication, enhancing security with biometric verification.

Technologies: Python, FastAPI, scikit-learn, TensorFlow, PyTorch, TensorBoard, MLflow, Redis, TorchServe, WebSocket, MongoDB, Linux, Lambda Labs, Java, Prompt Engineering

Junior Software Engineer - REVE Systems

September 2022 - January 2023
Dhaka, Bangladesh

  • Implemented audio analysis techniques for pitch detection, volume normalization, and energy calculation.
  • Doubled voice activity detection accuracy across use cases by switching from WebRTC VAD to Silero VAD.
  • Enhanced audio quality using DeepFilterNet and Meta's pretrained speech enhancement models for noise reduction.
  • Sped up audio data processing by a factor of 5 using Python's multiprocessing module.
  • Configured Azure and Google Cloud STT/TTS APIs for comprehensive speech-to-text and text-to-speech evaluations.
  • Applied multi-speaker speech separation using pretrained Conv-TasNet and DPRNN models.

Technologies: Python, FastAPI, Matplotlib, Seaborn, NumPy, Pandas, GCP

Trainee Software Engineer - REVE Systems

April 2022 - August 2022
Dhaka, Bangladesh

  • Engineered a Bengali Text Normalizer and trained a FastSpeech 2 TTS model to support Bengali language processing.
  • Gained proficiency in containerization with Docker and API development with Flask.
  • Performed audio data analysis and signal processing, focusing on data visualization for insights.

Technologies: JavaScript, Python, Docker, Flask, Postman, GitHub

Intern - SELISE rockin' software

May 2019 - June 2019
Dhaka, Bangladesh

  • Built an online quiz platform; front-end in HTML, CSS, JavaScript, Bootstrap; back-end in PHP, Ajax, MySQL.
  • Learned Git, Scrum, and SDLC practices; facilitated admin controls for question curation.
  • Implemented registration for timed student tests and performance comparison.

Technologies: HTML, CSS, JavaScript, Bootstrap, PHP, Ajax, MySQL