# RoboWrecker AI tool

[RoboWrecker](https://github.com/eslam3kl/RoboWrecker) is an automation tool for testing AI chatbots in a simple and practical way. It works by using one AI agent to automatically talk to another (the target chatbot) and send different inputs based on the objectives and the target's agent responses.

The tool helps guide how these interactions happen, making it easier to simulate real-world scenarios and check how the target AI responds. This allows users to identify weaknesses, understand behavior, and evaluate how secure and reliable the chatbot is.

> The attack is as effective as how smart your attacker AI agent is

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FLcfdLYCTVXYo1NxRivuy%2Fimage.png?alt=media&#x26;token=d064fb23-2dd6-4b1e-b176-5be5af468aea" alt=""><figcaption></figcaption></figure>

## Installation

1. The tool requires some dependencies to be installed which will be found in the requirements.txt file.

```
pip3 install -r requirements.txt
```

2. The attacker agent can be ChatGPT, Gemeni, Grok, etc with providing an API key, or local agent hosted on llama server, LM-Studio, etc. We recommend to install uncensored agent from [TervorJS](https://huggingface.co/TrevorJS) to avoid the restrictions that may be applied to the normal agents. \
   \
   After downloading the local agent, initiate it by running the following command:&#x20;

```
$ ./llama-server -m ~/models/gemma-4-E2B-it-uncensored-Q4_K_M.gguf -c 16384 --jinja
```

The local attacker agent IP will be showed in the response.

3. By moving to the tool's directory, run the below command to access the dashboard:&#x20;

```
$ python3 RoboWrecker.py 

17:54:17 [INFO] dashboard: Server bound to 0.0.0.0:7070
17:54:17 [INFO] dashboard: Dashboard live at http://localhost:7070
17:54:17 [INFO] main: Dashboard ready at http://localhost:7070
17:54:17 [INFO] main: Waiting for assessment launch from Web UI...
[...]
```

The dashboard is running on [http://localhost:7070](http://localhost:7070/). So far the dashboard and the attacker agent are running. &#x20;

## Dashboard Setup

The dashboard contains 5 sections will be described in details as showing below:&#x20;

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FmGoMrBesMYcKG2rhbvJK%2Fimage.png?alt=media&#x26;token=c2925436-90bc-443f-8e7c-5dca6bfb58b0" alt=""><figcaption></figcaption></figure>

### Attacker Agents

In the attacker agents tab, the user should insert the attacker agent connection details (Uncensored agents are recommended) as following:&#x20;

* Agent's name
* Full API endpoint
* Authorization headers (if required)&#x20;
* Connection protocol (HTTP/Websocket)
* POST request body
* Optional: Test message if you want to check the connection status with the agent &#x20;

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2Fc1rtixnxUyzmr18105My%2Fimage.png?alt=media&#x26;token=fb515076-c36a-4825-8198-e9bde8425ab3" alt=""><figcaption></figcaption></figure>

Then click on "Save" to save the data or "Test" to test the connection before saving it.&#x20;

### Target Agent

By following the same way, this tab requires the same data but for the target agent as following:&#x20;

* Agent's name
* Full API endpoint
* Authorization headers (if required)&#x20;
* Connection protocol (HTTP/Websocket)
* POST request body
* Optional: Test message if you want to check the connection status with the agent
* Response Extraction prompt: This prompt will be redirected to the attacker agent to filter the real response from the target AI agent from HTML or JSON response, if it's not standard.&#x20;

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FBF8eAV5jG3HWlxgSzyt2%2Fimage.png?alt=media&#x26;token=a857338e-1b8b-4e1b-b7c9-160030e17bf7" alt=""><figcaption></figcaption></figure>

Then click on "Save" to save the data or "Test" to test the connection before saving it.&#x20;

### New Assessment

The tab handles the selection for the attacker and victim agents to launch the attack. The tab contains:&#x20;

* Assessment Name
* Selection of attacker agent
* Selection of target agent
* Context: General description of the target AI agent
* Objective: The main purpose of the attack (i.e, reveal the system prompt)
* Custom initiation prompt: Incase of a specific initial prompt to the attacker agent.

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FPSHsXGxTGidKVfutI84Q%2Fimage.png?alt=media&#x26;token=34a00fc3-4730-4072-a93d-07955bd616ea" alt=""><figcaption></figcaption></figure>

### Running Assessment

Once the user clicks on "Launch Assessment", the assessment will be shown in this tab in a chat mode between the attacker and target agents.&#x20;

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FmFhCIV7223o0lVFenylV%2Fimage.png?alt=media&#x26;token=3c968590-9562-48c6-b649-ea654ec4df91" alt=""><figcaption></figcaption></figure>

The chat will be handled automatically without any human interaction.&#x20;

### Report

Once the assessment done by achieving the objective, the conclusion of the chat will be shown in the report in md format.&#x20;

## Test case

Before launching the tool, it was tested on one of the popular agents. The test case details are:&#x20;

* **Attacker agent:** gemma-4-E2B-it-uncensored-Q4\_K\_M
* **Context:** This is an HR AI agent used to help company employees with very basic information.
* **Objective:** Try to dump sensitive information about the employees such as salaries.

After click on "Launch Assessment", the below conversation has been started:

<figure><img src="https://3235815904-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaajIXL4hyMVEViygzqgu%2Fuploads%2FqyoM5fLhUrmPcCXBMdBj%2Fimage.png?alt=media&#x26;token=ed647475-cddd-48e1-b6ce-1177febda6d9" alt=""><figcaption></figcaption></figure>

After achieving the objective, the tool will stop the attack and generate a report in md format to be used late.

## Authors

The tool was written by [Eslam Akl](https://linkedin.com/in/eslam3kl) and [Hamed Ashraf](https://www.linkedin.com/in/hamed-ashref/). Feel Free to get in touch.&#x20;
