> Python deserialization is the process of reconstructing Python objects from serialized data, commonly done using formats like JSON, pickle, or YAML. The pickle module is a frequently used tool for this in Python, as it can serialize and deserialize complex Python objects, including custom classes.
* [j0lt-github/python-deserialization-attack-payload-generator](https://github.com/j0lt-github/python-deserialization-attack-payload-generator) - Serialized payload generator for deserialization RCE attacks on Python-driven applications where the pickle, PyYAML, ruamel.yaml, or jsonpickle module is used to deserialize data.

The vulnerability is introduced when a token is loaded from user input.
```python
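import cPickle
from base64 import b64decode

new_token = raw_input("New Auth Token : ")
# Vulnerable: the decoded bytes are unpickled without any authenticity check
token = cPickle.loads(b64decode(new_token))
print "Welcome {}".format(token.username)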
```
The Python 2.7 documentation clearly states that pickle should never be used with untrusted sources, since maliciously crafted data can execute arbitrary code on the server when it is unpickled.
> The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
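
The warning above follows from how pickle rebuilds objects: a pickled object can name any importable callable through `__reduce__`, and the loader calls it. The sketch below is a minimal Python 3 illustration, not a payload taken from this document. The class name `Demo` is hypothetical, and a harmless `eval` of an arithmetic expression stands in for the callable an attacker would choose (for example `os.system`).

```python
import pickle


class Demo:
    """Hypothetical class whose pickled form names a callable to invoke."""

    def __reduce__(self):
        # pickle stores (callable, args); the loader calls callable(*args).
        # A harmless eval is used here in place of an attacker's command.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Demo())

# Unpickling never rebuilds Demo: it runs eval("6 * 7") and returns the result.
result = pickle.loads(payload)
print(result)  # 42
```

The application only called `pickle.loads`, yet the code that ran was selected entirely by whoever produced the bytes.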
YAML deserialization is the process of converting YAML-formatted data back into objects in programming languages like Python, Ruby, or Java. YAML (YAML Ain't Markup Language) is popular for configuration files and data serialization because it is human-readable and supports complex data structures.
Since PyYAML 6.0, `yaml.load` requires an explicit `Loader` argument, so an unsafe loader can no longer be selected by accident, which mitigates the risk of Remote Code Execution. [PR #420 - Fix](https://github.com/yaml/pyyaml/issues/420)

Avoid `unsafe_load`; always use `safe_load` when working with untrusted YAML data.
```py
import yaml
with open('safe_data.yml') as file:
    data = yaml.safe_load(file)
```
## Common Pitfalls
1. **Using `pickle` with untrusted data**: The `pickle` module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
2. **Using `yaml.load` without specifying a safe loader**: Always use `yaml.safe_load` when working with untrusted YAML data to avoid remote code execution vulnerabilities.
3. **Ignoring security warnings**: Always pay attention to security warnings and best practices when working with serialization and deserialization in Python.
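
One way to avoid the first pitfall for something like an auth token is to drop pickle in favour of a data-only format and authenticate it before parsing. The following is a minimal sketch under stated assumptions, not a vetted token scheme: `SECRET_KEY`, `dump_token`, and `load_token` are hypothetical names, and the token layout (a 32-byte HMAC-SHA256 signature followed by a JSON body, base64-encoded) is chosen only for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; in practice load it from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def dump_token(data: dict) -> str:
    """Serialize data as JSON and prepend an HMAC-SHA256 signature."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + body).decode("ascii")


def load_token(token: str) -> dict:
    """Verify the signature, then parse the JSON body; raise ValueError on failure."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except ValueError as exc:  # covers bad base64 and non-ASCII input
        raise ValueError("malformed token") from exc
    signature, body = raw[:32], raw[32:]
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    return json.loads(body)


token = dump_token({"username": "alice"})
print("Welcome {}".format(load_token(token)["username"]))
```

Because `json.loads` only produces dictionaries, lists, strings, numbers, booleans, and `None`, even a token that somehow passed verification could not trigger a callable the way a pickle stream can.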
## Testing for Insecure Deserialization
1. **Manual Testing**:
   - Review the codebase for the use of insecure deserialization functions such as `pickle.loads`, `yaml.load`, and `jsonpickle.decode`.
   - Identify the sources of input data and ensure they are properly validated and sanitized before deserialization.
2. **Automated Testing**:
   - Use static analysis tools like [Bandit](https://github.com/PyCQA/bandit) to scan the codebase for insecure deserialization functions and patterns.
   - Implement unit tests to verify that deserialization functions are not used with untrusted data and that proper input validation is in place.
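
The code review step can be partly scripted with the standard `ast` module. The sketch below is a rough aid rather than a replacement for Bandit: the helper `find_suspect_calls` and the `SUSPECT_CALLS` set are hypothetical names, and the check only matches direct `module.function(...)` calls, so it would miss aliased imports such as `from pickle import loads`.

```python
import ast

# Call names flagged for review, taken from the list above.
SUSPECT_CALLS = {"pickle.loads", "yaml.load", "jsonpickle.decode"}


def find_suspect_calls(source: str) -> list:
    """Return sorted (line number, dotted call name) pairs for flagged calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            target = node.func.value
            if isinstance(target, ast.Name):
                name = f"{target.id}.{node.func.attr}"
                if name in SUSPECT_CALLS:
                    findings.append((node.lineno, name))
    return sorted(findings)


sample = (
    "import pickle, yaml\n"
    "obj = pickle.loads(blob)\n"
    "cfg = yaml.safe_load(text)\n"
)
print(find_suspect_calls(sample))  # [(2, 'pickle.loads')]
```

A unit test can run a helper like this over the project's source files and fail when the list of findings is not empty, which turns the review item into a regression check.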