Automating Dark Web CTI Reports ​ with RAG Insight for MISP Sharing

In the current digital landscape, organizations often do not become aware immediately when their data is compromised and sold online. Our objective is to minimize the duration between the exposure of data on the internet and its detection by the public. The dark web serves as a primary marketplace for the trade of personal information, accessible safely only through the use of the Tor browser. This paper focuses on monitoring significant trading forums on the dark web and demonstrates the method of web scraping specifically designed for dark web sites. Utilizing data harvested from these sites, we have trained a BERT classification model to categorize transaction posts into five distinct types of data leaks, enabling rapid identification of the leak type associated with each post.

Further, we employ the Retrieval-Augmented Generation (RAG) technique to vectorize dark web data, maintaining privacy while leveraging mainstream large language models to address concerns pertinent to cybersecurity analysts. This approach allows researchers to analyze dark web data effectively. Ultimately, the data collected from the dark web is formatted into STIX (Structured Threat Information Expression) and integrated into the MISP (Malware Information Sharing Platform) system to automate the generation of Cyber Threat Intelligence (CTI) reports. This methodology not only enhances the timeliness and accuracy of threat detection but also contributes to more efficient and proactive cybersecurity management.

會議:hack.lu 2024

講者:Yuki

時間:2024/10/22 (二) 14:30–15:00 (GMT+2)

連結:https://archive.hack.lu/hack-lu-2024/talk/AEV77X

You may also like