Publications | Raian Rahman

2024

Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs

Mohammed Saidul Islam, Raian Rahman, Ahmed Masry, Md Tahmid Rahman Laskar, Mir Tafseer Nayeem, and Enamul Hoque

arXiv preprint arXiv:2406.00257, 2024

Accepted in the Proceedings of EMNLP 2024 (Findings)

Natural language is a powerful complementary modality of communication for data visualizations, such as bar and line charts. To facilitate chart-based reasoning using natural language, various downstream tasks have been introduced recently such as chart question answering, chart summarization, and fact-checking with charts. These tasks pose a unique challenge, demanding both vision-language reasoning and a nuanced understanding of chart data tables, visual encodings, and natural language prompts. Despite the recent success of Large Language Models (LLMs) across diverse NLP tasks, their abilities and limitations in the realm of data visualization remain under-explored, possibly due to their lack of multi-modal capabilities. To bridge the gap, this paper presents the first comprehensive evaluation of the recently developed large vision language models (LVLMs) for chart understanding and reasoning tasks. Our evaluation includes a comprehensive assessment of LVLMs, including GPT-4V and Gemini, across four major chart reasoning tasks. Furthermore, we perform a qualitative evaluation of LVLMs’ performance on a diverse range of charts, aiming to provide a thorough analysis of their strengths and weaknesses. Our findings reveal that LVLMs demonstrate impressive abilities in generating fluent texts covering high-level data insights while also encountering common problems like hallucinations, factual errors, and data bias. We highlight the key strengths and limitations of chart comprehension tasks, offering insights for future research.
@article{islam2024large,
  title = {Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs},
  author = {Islam, Mohammed Saidul and Rahman, Raian and Masry, Ahmed and Laskar, Md Tahmid Rahman and Nayeem, Mir Tafseer and Hoque, Enamul},
  journal = {arXiv preprint arXiv:2406.00257},
  year = {2024},
  note = {Accepted in the Proceedings of EMNLP 2024 (Findings)},
}

2023

ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries

Raian Rahman, Rizvi Hasan, Abdullah Al Farhad, Md. Tahmid Rahman Laskar, Md. Hamjajul Ashmafee, and Abu Raihan Mostofa Kamal

Proceedings of the Canadian Conference on Artificial Intelligence, Jun 2023

https://caiac.pubpub.org/pub/ujhjycsw

Automatic chart-to-text summarization is an effective tool for visually impaired people and also provides the user with precise insights into tabular data in natural language. A large and well-structured dataset is always a key part of data-driven models. In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts along with their metadata and descriptions, covering a wide range of topics and chart types, to generate short and long summaries. Extensive experiments with strong baseline models show that even though these models generate fluent and informative summaries and achieve decent scores on various automatic evaluation metrics, they often suffer from hallucination, miss important data points, and explain complex trends in the charts incorrectly. We also investigated the potential of expanding ChartSumm to other languages using automated translation tools. These make our dataset a challenging benchmark for future research.
@article{Rahman2023ChartSumm,
  author = {Rahman, Raian and Hasan, Rizvi and Farhad, Abdullah Al and Laskar, Md. Tahmid Rahman and Ashmafee, Md. Hamjajul and Kamal, Abu Raihan Mostofa},
  journal = {Proceedings of the Canadian Conference on Artificial Intelligence},
  year = {2023},
  month = jun,
  note = {https://caiac.pubpub.org/pub/ujhjycsw},
  publisher = {Canadian Artificial Intelligence Association (CAIAC)},
  title = {ChartSumm: A {Comprehensive} {Benchmark} for {Automatic} {Chart} {Summarization} of {Long} and {Short} {Summaries}},
}

2022

Densely-Populated Traffic Detection Using YOLOv5 and Non-maximum Suppression Ensembling

Raian Rahman, Zadid Bin Azad, and Md. Bakhtiar Hasan

In Proceedings of the International Conference on Big Data, IoT, and Machine Learning, Jun 2022

Vehicular object detection is the heart of any intelligent traffic system. It is essential for urban traffic management. Recent state-of-the-art methods apply R-CNN, Fast R-CNN, Faster R-CNN, and YOLO for this task. However, region-based CNN methods suffer from higher inference times, which makes them impractical for real-time use. YOLO, on the other hand, struggles to detect small objects that appear in groups. In this paper, we propose a method that can locate and classify vehicular objects in a given densely crowded image using YOLOv5. We apply non-maximum suppression ensembling of 4 different YOLOv5 models trained on different setups. The performance of our proposed model was measured on the Dhaka AI dataset, which contains densely crowded vehicular images taken from both top and side views of the street in both day and night settings. Our experiments show that our model achieved an mAP@0.5 of 0.458 with an inference time of 0.75s, outperforming other state-of-the-art models on performance. Hence, the model can be deployed on the street for real-world traffic detection, which can be used for traffic control and data collection.
@inproceedings{10.1007/978-981-16-6636-0_43,
  author = {Rahman, Raian and Bin Azad, Zadid and Bakhtiar Hasan, Md.},
  editor = {Arefin, Mohammad Shamsul and Kaiser, M. Shamim and Bandyopadhyay, Anirban and Ahad, Md. Atiqur Rahman and Ray, Kanad},
  title = {Densely-Populated Traffic Detection Using YOLOv5 and Non-maximum Suppression Ensembling},
  booktitle = {Proceedings of the International Conference on Big Data, IoT, and Machine Learning},
  year = {2022},
  publisher = {Springer Singapore},
  address = {Singapore},
  pages = {567--578},
  isbn = {978-981-16-6636-0},
}