Research
FinTech - LLM - NLP - Question Answering - Information Retrieval
Can pruning make Large Language Models more efficient?
Sia Gholami, Marwan Omar
Abstract: Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning, a strategic reduction of model parameters based on their significance, as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled with post-pruning fine-tuning strategies, some pruned models even exhibit enhanced generalization capabilities. This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.
Read Now on ArXiv
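A minimal sketch of the weight-pruning idea summarized above, using PyTorch's built-in magnitude pruning on a single linear layer. The 30% sparsity level and the toy layer size are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)  # stand-in for one attention/FFN projection in a Transformer
prune.l1_unstructured(layer, name="weight", amount=0.3)  # zero the 30% smallest-magnitude weights

sparsity = (layer.weight == 0).float().mean().item()
print(f"fraction of zeroed weights: {sparsity:.2f}")

# Make the pruning permanent; in practice the pruned model would then be
# fine-tuned to recover (or sometimes improve) accuracy.
prune.remove(layer, "weight")
```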

Can a Student Large Language Model Perform as Well as Its Teacher?
Sia Gholami, Marwan Omar
Abstract: The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments. Knowledge distillation, a technique aiming to transfer knowledge from a high-capacity "teacher" model to a streamlined "student" model, emerges as a promising solution to this dilemma. This paper provides a comprehensive overview of the knowledge distillation paradigm, emphasizing its foundational principles such as the utility of soft labels and the significance of temperature scaling. Through meticulous examination, we elucidate the critical determinants of successful distillation, including the architecture of the student model, the caliber of the teacher, and the delicate balance of hyperparameters. While acknowledging its profound advantages, we also delve into the complexities and challenges inherent in the process. Our exploration underscores knowledge distillation's potential as a pivotal technique in optimizing the trade-off between model performance and deployment efficiency.
Read Now on ArXiv
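A minimal sketch of the soft-label and temperature-scaling mechanics described above: the teacher's logits are softened with a temperature T, and the student is trained on a weighted mix of KL divergence (soft labels) and cross-entropy (hard labels). The temperature, mixing weight, and toy logits are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # Gradients scale with 1/T^2, so multiply back by T^2 (standard distillation practice).
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy example with random logits for a 4-class problem.
s = torch.randn(8, 4)
t = torch.randn(8, 4)
y = torch.randint(0, 4, (8,))
print(distillation_loss(s, t, y).item())
```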

Does Synthetic Data Make Large Language Models More Efficient?
Sia Gholami, Marwan Omar
Abstract: Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.
Read Now on ArXiv
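A minimal sketch of template-based question generation as described above: a few question templates are filled from structured fields to produce synthetic QA pairs. The templates and the toy record are illustrative assumptions, not the templates used in the paper.

```python
import random

TEMPLATES = [
    ("What is the {attribute} of {entity}?", "{value}"),
    ("Which {attribute} does {entity} have?", "{value}"),
    ("Tell me the {attribute} for {entity}.", "{value}"),
]

def generate_qa(entity, attribute, value, n=2):
    """Fill n randomly chosen templates to create synthetic question-answer pairs."""
    pairs = []
    for q_tpl, a_tpl in random.sample(TEMPLATES, k=n):
        pairs.append({
            "question": q_tpl.format(attribute=attribute, entity=entity, value=value),
            "answer": a_tpl.format(value=value),
        })
    return pairs

# Hypothetical record purely for illustration.
for qa in generate_qa("Amazon S3", "default storage class", "S3 Standard"):
    print(qa)
```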

Do Generative large language models need billions of parameters?
Sia Gholami, Marwan Omar
Abstract: This paper presents novel systems and methodologies for the development of efficient large language models (LLMs). It explores the trade-offs between model size, performance, and computational resources, with the aim of maximizing the efficiency of these AI systems. The research investigates methods that allow different parts of the model to share parameters, reducing the total number of unique parameters required. This approach ensures that the model remains compact without sacrificing its ability to learn and represent complex language structures. This study provides valuable insights and tools for creating more efficient and effective LLMs, contributing to a more sustainable and accessible future for AI language modeling.
Read Now on ArXiv
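A minimal sketch of one way different parts of a model can share parameters, cross-layer weight sharing: a single Transformer block is applied repeatedly instead of stacking independently parameterized layers, dividing the unique parameter count by the number of passes. The dimensions and number of passes are illustrative assumptions, and this is only one of several possible sharing schemes.

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_passes=6):
        super().__init__()
        # One set of weights, reused on every pass.
        self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_passes = num_passes

    def forward(self, x):
        for _ in range(self.num_passes):
            x = self.block(x)
        return x

model = SharedLayerEncoder()
unique_params = sum(p.numel() for p in model.parameters())
x = torch.randn(2, 10, 256)  # (batch, sequence, hidden)
print(model(x).shape, f"unique parameters shared across 6 passes: {unique_params:,}")
```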

Zero-shot virtual product placement in videos
Divya Bhargavi, Karan Sindwani, Sia Gholami
Abstract: Virtual Product Placement (VPP) is an advertising technique that digitally places branded objects into movie or TV show scenes. Despite being a billion-dollar industry, current ad rendering techniques are time-consuming, costly, and executed manually with the help of visual effects (VFX) artists. In this paper, we present a fully automated and generalized framework for placing 2D ads in any linear TV cooking show captured using a single-view camera with minimal camera movements. The framework detects empty spaces, understands the kitchen scene, handles occlusion, renders ambient lighting, and tracks ads. Our framework, without requiring access to the full video or the production camera configuration, reduces the time and cost associated with manual post-production ad rendering, enabling brands to reach consumers seamlessly while preserving the continuity of the viewing experience.
Read Now on ACM
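A minimal sketch of one rendering sub-step of such a pipeline: alpha-compositing a 2D ad into a rectangular region of a frame. Real placement, occlusion handling, and ambient lighting are far more involved; the frame, ad image, and placement coordinates here are illustrative assumptions.

```python
import numpy as np

def composite_ad(frame, ad_rgba, top_left):
    """Blend an RGBA ad image into an RGB frame at the given (row, col) offset."""
    h, w = ad_rgba.shape[:2]
    r, c = top_left
    alpha = ad_rgba[..., 3:4] / 255.0
    region = frame[r:r + h, c:c + w].astype(np.float32)
    blended = alpha * ad_rgba[..., :3] + (1.0 - alpha) * region
    frame[r:r + h, c:c + w] = blended.astype(np.uint8)
    return frame

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # stand-in video frame
ad = np.full((100, 200, 4), 255, dtype=np.uint8)   # opaque white placeholder ad
print(composite_ad(frame, ad, (300, 500)).shape)
```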

Jersey number detection using synthetic data in a low-data regime
Divya Bhargavi, Sia Gholami, Erika Pelaez Coyotl
Abstract: Player identification is an essential and complex task in sports video analysis. Different strategies have been devised over the years and identification based on jersey numbers is one of the most common approaches given its versatility and relative simplicity. However, automatic detection of jersey numbers is challenging due to changing camera angles, low video resolution, small object size in wide-range shots, and transient changes in the player's posture and movement. In this paper, we present a novel approach for jersey number identification in a small, highly imbalanced dataset from the Seattle Seahawks practice videos. We generate novel synthetic datasets of different complexities to mitigate the data imbalance and scarcity in the samples. To show the effectiveness of our synthetic data generation, we use a multi-step strategy that enforces attention to a particular region of interest (the player's torso) to identify jersey numbers. The solution first identifies and crops players in a frame using a person detection model, then utilizes a human pose estimation model to localize jersey numbers in the detected players, obviating the need for annotating bounding boxes for number detection. We experimented with two sets of Convolutional Neural Networks (CNNs) with different learning objectives: multi-class for two-digit number identification and multi-label for digit-wise detection to compare performance. Our experiments indicate that our novel synthetic data generation method improves the accuracy of various CNN models by 9% overall and by 18% on low-frequency numbers.
Read Now on Frontiers
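A minimal sketch contrasting the two learning objectives compared above, applied to a cropped torso image: a multi-class head over the 100 possible two-digit jersey numbers versus a multi-label head that predicts each digit's presence. The tiny backbone and tensor sizes are illustrative placeholders, not the models used in the paper.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                     # tiny stand-in for a CNN backbone
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

multi_class_head = nn.Linear(16, 100)         # one class per jersey number 0..99
multi_label_head = nn.Linear(16, 10)          # one sigmoid output per digit 0..9

torso_crop = torch.randn(4, 3, 64, 64)        # batch of cropped torso images
feats = backbone(torso_crop)

number_logits = multi_class_head(feats)                 # trained with softmax + cross-entropy
digit_probs = torch.sigmoid(multi_label_head(feats))    # trained with per-digit binary cross-entropy
print(number_logits.shape, digit_probs.shape)
```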

Alexa, Predict My Flight Delay
Sia Gholami, Saba Khashe
Abstract: Airlines are critical today for carrying people and commodities on time. Any delay in the schedule of these planes can potentially disrupt the business and trade of thousands of employees at any given time. Therefore, precise flight delay prediction is beneficial for the aviation industry and passenger travel. Recent research has focused on using artificial intelligence algorithms to predict the possibility of flight delays. Earlier prediction algorithms were designed for a specific air route or airfield. Many existing flight delay prediction algorithms rely on small samples and are difficult to interpret, leaving little room for machine learning implementation. This research study develops a flight delay prediction system by analyzing data from domestic flights inside the United States of America. The proposed models learn about the factors that cause flight delays and cancellations and the link between departure and arrival delays.
Read Now on ArXiv
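A minimal sketch of a flight-delay classifier over typical schedule features (carrier, origin, departure hour, distance). The handful of synthetic rows only illustrates the input shape, and the model choice is an assumption for illustration rather than the approach evaluated in the paper.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy rows; real training would use historical US domestic flight records.
toy = pd.DataFrame({
    "carrier":  ["AA", "DL", "UA", "AA", "DL", "UA"],
    "origin":   ["JFK", "ATL", "ORD", "LAX", "ATL", "ORD"],
    "dep_hour": [7, 18, 21, 6, 17, 22],
    "distance": [2475, 760, 1745, 2475, 760, 1745],
    "delayed":  [0, 1, 1, 0, 0, 1],   # 1 = arrival delay over a chosen threshold
})

X = pd.get_dummies(toy[["carrier", "origin", "dep_hour", "distance"]])
y = toy["delayed"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict(X[:2]))
```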

You don’t need labeled data for open-book question answering
Sia Gholami, Mehdi Noori
Abstract: Open-book question answering is a subset of question answering (QA) tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions have a yes–no–none answer and a text answer which can be short (a few words) or long (a few sentences). We present a two-step, retriever–extractor architecture in which a retriever finds the right documents and an extractor finds the answers in the retrieved documents. To test our solution, we introduce a new dataset for open-book QA based on real customer questions on AWS technical documentation. In this paper, we conducted experiments on several information retrieval systems and extractive language models, attempting to find the yes–no–none answers and text answers in the same pass. Our custom-built extractor model is created from a pretrained language model and fine-tuned on the Stanford Question Answering Dataset (SQuAD) and Natural Questions datasets. We were able to achieve 42% F1 and 39% exact match (EM) scores end-to-end with no domain-specific training.
Read Now on MDPI
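A minimal sketch of the two-step retriever–extractor design described above: a TF-IDF retriever selects the most relevant document, and an off-the-shelf extractive QA model pulls an answer span from it. The toy documents and the public SQuAD-tuned checkpoint are illustrative assumptions, not the AWS corpus or the custom extractor used in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

docs = [
    "Amazon S3 stores objects in buckets and offers several storage classes.",
    "AWS Lambda runs code without provisioning servers and scales automatically.",
]
question = "Where does Amazon S3 store objects?"

# Step 1: retriever finds the most similar document to the question.
vectorizer = TfidfVectorizer().fit(docs + [question])
scores = cosine_similarity(vectorizer.transform([question]), vectorizer.transform(docs))
best_doc = docs[scores.argmax()]

# Step 2: extractor finds the answer span in the retrieved document.
extractor = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(extractor(question=question, context=best_doc))
```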