Figure: A high-level architecture of the proposed hybrid fraud detection framework for invoicing platforms.
Wahid, D. F., and Hassini, E. (2024). An augmented AI-based hybrid fraud detection framework for invoicing platforms. Applied Intelligence. 54(2). 1297-1310
In today’s e-commerce landscape, many companies are adopting subscription-based invoicing platforms to manage their electronic invoices. However, fraudsters are increasingly exploiting these platforms for various malicious activities. Identifying these fraudsters poses a significant challenge for many companies due to constraints on time and resources. While a fully automated fraud detection model can be helpful, it also carries the risk of falsely flagging legitimate transactions.
This project proposes a hybrid fraud detection framework designed for scenarios where only a small set of labelled (fraud/non-fraud) data is available, and human input is required in the final decision-making stage. The framework utilizes a combination of unsupervised and supervised machine learning, red-flag prioritization, and an augmented AI approach involving human input. It also introduces a weighted center based on the feature importance scores for the fraud risk cluster, used in the red-flag prioritization-based Human-in-the-Loop (HITL) process. Finally, the approach is demonstrated through a case study to identify fraudulent users in an invoicing platform.
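One possible reading of the importance-weighted center can be sketched as follows. This is a minimal illustration, not the formulation from the paper: it assumes the center is the plain mean of the fraud-risk cluster, that feature importance scores enter as weights in a Euclidean distance, and that users closer to the center are reviewed first. The function names, feature values, and importance scores are all hypothetical.

```python
import math

def cluster_center(points):
    """Mean of each feature over the points in the fraud-risk cluster."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def weighted_distance(point, center, importances):
    """Euclidean distance where each feature is scaled by its importance (assumed weighting)."""
    return math.sqrt(
        sum(w * (x - c) ** 2 for x, c, w in zip(point, center, importances))
    )

def prioritize(candidates, center, importances):
    """Order user ids so that those closest to the cluster center are reviewed first."""
    return sorted(
        candidates,
        key=lambda uid: weighted_distance(candidates[uid], center, importances),
    )

# Hypothetical data: two features, the first far more important than the second.
fraud_cluster = [[0.9, 0.2], [1.1, 0.4], [1.0, 0.3]]
importances = [0.8, 0.2]
center = cluster_center(fraud_cluster)

candidates = {"user_a": [1.0, 0.9], "user_b": [0.2, 0.3]}
review_queue = prioritize(candidates, center, importances)
```

In this toy setup `user_a` is ranked ahead of `user_b`, because it differs from the center only on the low-importance feature. A human reviewer would then work through `review_queue` in order.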
This research project was done in collaboration with an industry partner under the Mitacs Accelerate Fellowship program. On our partner's platform, the hybrid framework has shown promising results in identifying fraudulent users and in supporting human decision-making where human input is integral to the final decision. We delivered the framework and the trained model to our industry partner for production deployment.
Figure: A high-level architecture of the user-generated short-text classification framework.
Wahid, D. F., and Hassini, E. (2023). User-generated short-text classification using cograph editing-based network clustering with an application in invoice categorization. Data & Knowledge Engineering. 148. 102238.
The rapid expansion of online business platforms across all industries generates a large amount of user-generated text data. This data includes product or service descriptions, reviews, marketing content, and financial information such as invoicing and bookkeeping. However, this data is often short, contains errors like misspellings and abbreviations, and needs more accurate categorization. It is essential for these platforms to accurately categorize this user-generated short-text data to understand their users’ needs.
This project presents a framework for classifying user-generated short-text data into appropriate categories. In the first phase, we used a method called cograph editing (CoE)-based clustering on a network of keywords extracted from the user-generated short-texts. We also developed integer linear programming (ILP) formulations for CoE on weighted networks and created a heuristic algorithm to identify clusters in large-scale networks.
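For context, a cograph is a graph that contains no induced path on four vertices (a P4), and cograph editing asks for the fewest edge insertions and deletions that turn a graph into a cograph. The sketch below only checks the cograph property by brute force on a small unweighted keyword network. It does not reproduce the ILP formulations or the heuristic from the paper, and the keywords and function names are hypothetical.

```python
from itertools import combinations, permutations

def has_induced_p4(nodes, edges):
    """Return True if some four nodes induce a path a-b-c-d with no other edges among them."""
    edge_set = {frozenset(e) for e in edges}

    def linked(u, v):
        return frozenset((u, v)) in edge_set

    for quad in combinations(nodes, 4):
        for a, b, c, d in permutations(quad):
            path_present = linked(a, b) and linked(b, c) and linked(c, d)
            chords_absent = not (linked(a, c) or linked(a, d) or linked(b, d))
            if path_present and chords_absent:
                return True
    return False

def is_cograph(nodes, edges):
    """A graph is a cograph exactly when it has no induced P4."""
    return not has_induced_p4(nodes, edges)

# Hypothetical keyword network: a simple path on four keywords is not a cograph.
keywords = ["design", "logo", "print", "flyer"]
path_edges = [("design", "logo"), ("logo", "print"), ("print", "flyer")]

# Adding one edge closes the path into a 4-cycle, which is a cograph.
edited_edges = path_edges + [("flyer", "design")]
```

Here a single edge insertion is enough to remove the induced P4. The brute-force check is only practical for small networks, which is why large-scale instances call for a heuristic.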
We applied this framework to categorize invoices in a real-world setting, and our results showed promise in accurately identifying invoice line-item categories for large-scale data. Our final model was delivered for a production deployment.
Figure: Identified research communities in the common-knowledge network generated from knowledge commonality among the authors from McMaster University.
Wahid, D. F., Ezzeldin, M., Hassini, E., & El-Dakhakhni, W. W. (2022). Common-knowledge networks for university strategic research planning. Decision Analytics Journal, 2, p. 100027.
We defined the concept of a common-knowledge network of authors in a research institution and used it to identify communities of authors with a new heuristic algorithm for cluster editing problems on weighted similarity networks. For each identified research community, we analyzed the dominant research topics based on the most frequent keywords, publication counts, and collaboration counts. Our methodology can be used to create multidisciplinary research clusters in universities and to support senior management in setting investment strategies that foster large-scale, innovative, collaborative initiatives across disciplines.
[talk] [dissertation] [code]
Wahid, D. F. (2017). Random Models and Heuristic Algorithms for Correlation Clustering Problems on Signed Social Networks. M.Sc. Thesis Dissertation. Department of Computer Science. The University of British Columbia.
The rise of online social networks has led to an increase in web-based signed networks, where interactions are characterized by relations such as like/dislike or trust/distrust. To study problems on complex real-world networks, it is crucial to be able to test new ideas on artificial networks with controllable structural properties. This approach allows researchers to simulate real-world network phenomena under known conditions.
The evolution of web-based signed networks has created a need for random models that can capture different aspects of these networks. In this project, we have examined signed-directed degree distributions in three real-world signed-directed social networks and proposed three random models for signed-directed social networks to capture...
Figure: Status balanced and imbalanced triangles. Triangles 1, 2, 3, 4, 8, 9, 10, and 12 are positive (i.e., status-balanced triangles), and triangles 5, 6, 7, and 11 are negative (i.e., status-imbalanced triangles).
The mutual attitude between members of a social group can be represented as a mixture of positive and negative interactions. Researchers have described this using signed, directed networks and defined balanced social group states based on the principles that “friend of my friend is my friend,” “enemy of my friend is my enemy,” and “enemy of my enemy is my friend.”
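The balance principle above is commonly stated as a sign rule on triangles: a triangle is balanced when the product of its three edge signs is positive. The sketch below assumes that standard rule and ignores edge direction. The function name is hypothetical.

```python
def is_structurally_balanced(sign_ab, sign_bc, sign_ca):
    """Signs are +1 (friend) or -1 (enemy); balanced when their product is positive."""
    return sign_ab * sign_bc * sign_ca > 0

# "Friend of my friend is my friend": three positive edges.
all_friends = is_structurally_balanced(+1, +1, +1)

# "Enemy of my enemy is my friend": two negative edges and one positive edge.
common_enemy = is_structurally_balanced(-1, -1, +1)

# Two friends who disagree about a third person: one negative edge.
divided_loyalty = is_structurally_balanced(+1, +1, -1)
```

Under this rule, triangles with zero or two negative edges are balanced, and triangles with one or three negative edges are not.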
Another approach considers social networks as status networks, where the mutual attitude between members is represented as a mutual status relation. This idea has been used to predict the evolution of social network links over time. In a static directed signed network, the status theory is used to define status balance and formulate a problem called status-correlation-clustering (SCC). An integer programming formulation for SCC is also presented.
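A status-balance check for a single triangle can be sketched as follows. It assumes the usual reading of status theory: a positive edge u -> v means v has higher status than u, and a negative edge u -> v means v has lower status. The triangle is treated as status-balanced when the implied status ordering has no cycle. This is an illustration under those assumptions, not the SCC integer programming formulation, and the function name is hypothetical.

```python
def is_status_balanced(edges):
    """edges: three (source, target, sign) tuples forming a triangle, sign in {+1, -1}."""
    # Rewrite every edge as (lower, higher): a negative edge reverses the implied order.
    ordered = [(u, v) if sign > 0 else (v, u) for u, v, sign in edges]

    # Count how many nodes each node is ranked below.
    nodes = {n for pair in ordered for n in pair}
    ranked_below = {n: 0 for n in nodes}
    for lower, _higher in ordered:
        ranked_below[lower] += 1

    # A consistent three-node ordering has counts 0, 1, 2; a cycle has counts 1, 1, 1.
    return sorted(ranked_below.values()) == [0, 1, 2]

# a -> b -> c with a -> c, all positive: c is highest, a is lowest.
transitive_positive = [("a", "b", +1), ("b", "c", +1), ("a", "c", +1)]

# a -> b -> c -> a, all positive: no consistent ordering exists.
cyclic_positive = [("a", "b", +1), ("b", "c", +1), ("c", "a", +1)]

# The same cycle with one negative edge becomes consistent.
cyclic_one_negative = [("a", "b", +1), ("b", "c", +1), ("c", "a", -1)]
```

Under this reading, enumerating signed directed triangles up to relabeling gives 12 types, of which 8 are status-balanced and 4 are not, which agrees with the split in the figure caption.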