# OSWE Code Review Cheat Sheet


> Purpose: A compact, exam-focused cheat sheet for source-code review during OSWE-style assessments.\
> Includes: review mindset, fast grep commands, dangerous sinks per language, common bypasses, and a checklist.\
> **Updated:** added SSRF, XXE, Prototype Pollution, Eval Filter Bypass, CORS/CSRF chaining guidance.

***

### Quick mindset (always)

1. **Input → Flow → Sink**: identify *sources*, follow transformations, find *sinks* that execute/interpret/use data.
2. **Assume attacker control**: any user-controllable value reaching a sink without clear, correct validation/whitelist is risky.
3. **Prioritize:** dynamic evaluation, file includes, SQL/NoSQL, template rendering, deserialization, OS/DB commands, path handling, SSRF/SSO flows, and auth logic.
4. Look for **non-standard sanitization** (custom regexes, blacklists) and **unsafe use of framework helpers**.

***

### Fast discovery (grep/snippets)

```bash
# General dangerous keywords
grep -RIn --line-number -E "eval|exec|system\(|popen|subprocess|shell_exec|passthru|include\(|require\(|execFile|Runtime\.getRuntime|ProcessBuilder|fetch\(|requests\.get|HttpClient" .

# SQL usage (simple)
grep -RIn --line-number -E "SELECT|INSERT|UPDATE|DELETE|EXECUTE|query\(" .

# Template & serialization
grep -RIn --line-number -E "render_template|render\(|template\.|unserialize|pickle\.loads|yaml\.load|JSON\.parse" .

# File/Path, Uploads
grep -RIn --line-number -E "open\(|fopen\(|move_uploaded_file|save\(|File\.write|file_put_contents|fwrite" .

# SSRF / URL fetches
grep -RIn --line-number -E "requests\.get|requests\.post|HttpClient|fetch\(|curl_exec|file_get_contents\(|urllib\.request" .

# CORS / CSRF
grep -RIn --line-number -E "Access-Control-Allow-Origin|CORS|Cross-Origin|csrf|XSRF|SameSite|session\.cookie" .
```
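When `grep` is unavailable, or when findings need to be collected as `file:line` records, the same idea can be sketched in Python. This is a minimal sketch, not a replacement for the commands above: the `SINK_PATTERNS` groups, the labels, and the function names `scan_text` and `scan_tree` are hypothetical and cover only a few of the keywords listed.

```python
import re
from pathlib import Path

# Hypothetical pattern groups; labels and regexes are illustrative only.
SINK_PATTERNS = {
    "code-exec": re.compile(r"\b(eval|exec|system|popen|shell_exec|passthru)\s*\("),
    "deserialization": re.compile(r"unserialize\(|pickle\.loads\(|yaml\.load\("),
    "outbound-request": re.compile(r"requests\.(get|post)\(|curl_exec\(|file_get_contents\("),
}


def scan_text(text, filename="<memory>"):
    """Return (filename, line number, label, stripped line) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SINK_PATTERNS.items():
            if pattern.search(line):
                findings.append((filename, lineno, label, line.strip()))
    return findings


def scan_tree(root):
    """Walk a source tree and scan every readable file."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        findings.extend(scan_text(text, str(path)))
    return findings


if __name__ == "__main__":
    for filename, lineno, label, line in scan_tree("."):
        print(f"{filename}:{lineno}: [{label}] {line}")
```

Every hit is only a lead: each one still needs a manual trace back to a user-controlled source.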

***

### Top dangerous sinks & why

* `eval` / dynamic code evaluation → arbitrary code execution (RCE / SSTI).
* Template rendering with user input → server-side template injection (SSTI).
* Direct SQL construction (string concatenation) → SQL injection.
* Deserialization (`pickle`, `unserialize`) → arbitrary object execution.
* OS command invocation with user input → command injection.
* File include/require with user input → local/remote file inclusion (LFI/RFI).
* XML parsing without secure config → XXE.
* Returning raw user content without proper escaping → XSS.
* Unsafely handling paths → path traversal.
* **Server-Side Request Forgery (SSRF)** → attacker-induced outbound requests (internal host probing, metadata access).
* **Prototype Pollution** → attacker-controlled object properties altering application logic (Node/JS).
* **CORS/CSRF chaining** → misconfigurations or chained flaws enabling cross-origin or cross-site attacks.
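To make the "string concatenation → SQL injection" item concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The table, the rows, and the function names (`make_db`, `find_user_vulnerable`, `find_user_safe`) are assumptions made up for illustration.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn


def find_user_vulnerable(conn, name):
    # Sink: user input is concatenated into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes `name` as data, not as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(find_user_vulnerable(conn, payload))  # every row matches
    print(find_user_safe(conn, payload))        # no row has that literal name
```

During review, the pattern to flag is the first function: any query string assembled with `+`, f-strings, or `%` formatting from request data.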

***

### Additional Vulnerabilities (expanded)

#### Server-Side Request Forgery (SSRF)

**What it is:** SSRF occurs when an application fetches remote resources (URLs) based on user input and the attacker can control the destination. This can let attackers reach internal services and cloud metadata endpoints, port-scan internal hosts, or trigger other unexpected behavior.

**Common sinks:**

* HTTP clients: `requests.get()`, `fetch()`, `HttpClient`, `curl_exec()`, `file_get_contents(url)` and similar.
* Image/file retrieval endpoints that accept a URL to fetch.
* URL previewers, webhook handlers, RSS readers, remote health-checkers.

**Checks during code review:**

* Identify where the app performs outbound requests using user-provided URLs.
* Look for lack of hostname/IP validation, lack of allowed-scheme checks (e.g., `http`, `https` only), and absence of network-level restrictions.
* Does the code follow redirects? A followed redirect may land on an internal host even when the original URL passed validation.
* Is DNS resolution performed on user input without validation (e.g., resolving `localhost`/`127.0.0.1`)?

**Mitigations / safe alternatives:**

* Implement a strict **allowlist** of target hosts or domains.
* Normalize and resolve the host, then verify that no resolved IP falls in loopback, private, or link-local ranges (127/8, 10/8, 172.16/12, 192.168/16, 169.254/16, ::1, fc00::/7, fe80::/10).
* Restrict allowed schemes (only `https` when appropriate), and reject `file://`, `gopher://`, `ftp://`, `dict://`, and similar.
* Use an isolated proxy with outbound restrictions for user-supplied fetches.
* Limit redirects and enforce a maximum redirect count; validate the final host after redirects.
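The scheme and resolved-IP checks above can be sketched with Python's standard `urllib.parse`, `socket`, and `ipaddress` modules. This is a minimal sketch under stated assumptions: `validate_fetch_url`, `resolve_host`, and `ALLOWED_SCHEMES` are hypothetical names, and the `resolver` parameter exists only so the DNS lookup can be swapped out in tests.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: tighten to {"https"} where possible


def _is_internal(addr):
    """True for loopback, private, link-local, and other non-public addresses."""
    ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop any IPv6 scope id
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # treat ::ffff:127.0.0.1 like 127.0.0.1
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def resolve_host(host):
    """Return the set of IP strings for a host; IP literals skip DNS."""
    try:
        return {str(ipaddress.ip_address(host))}
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def validate_fetch_url(url, resolver=resolve_host):
    """Raise ValueError unless the URL uses an allowed scheme and a public host."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    blocked = sorted(a for a in addresses if _is_internal(a))
    if blocked:
        raise ValueError(f"host resolves to internal address(es): {blocked}")
    return addresses
```

A check like this only covers the first hop. The same validation has to run again on every redirect target, and the HTTP client should connect to the address that was validated rather than resolving the name a second time.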

**Greps / signposts:**

```bash
grep -RIn --line-number -E "requests\.get|fetch\(|curl_exec|file_get_contents|urllib\.request|HttpClient" .
```

**Example (lab safe):**

* Look for code that takes `?url=` and passes it straight to an HTTP client. Flag it and suggest allowlisting + IP resolution checks.

***

#### XML External Entity (XXE)

**What it is:** XXE lets an attacker cause the XML parser to load external resources or local files (via ENTITY declarations). It can expose local files, enable SSRF, or exhaust memory through entity expansion.

**Common sinks:**

* `DocumentBuilderFactory` / `SAXParser` in Java without secure features.
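* Python XML parsers configured to resolve entities, such as `lxml.etree` (entity resolution was on by default in older releases). The standard-library `xml.etree.ElementTree.fromstring()` does not fetch external entities, but older Python/Expat builds can still be exposed to entity-expansion DoS.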
* `libxml2`-backed parsers in PHP (`simplexml_load_string`) without disabling external entities.

**Checks during review:**

* Does the code parse XML from user-controlled sources? (uploads, endpoints, SOAP services)
* Are parser features to disable DOCTYPE and external entities set? (e.g., `disallow-doctype-decl`, disable external general entities).
* Is the application relying on third-party libraries that parse XML internally (uploads, feeds)?

**Mitigations / safe alternatives:**

* Disable DOCTYPE and external entity resolution for XML parsers.
* Use simple, non-XML formats when possible (JSON).
* Use secure XML parser configurations (documented for each platform).
* If external entities are required, fetch resources from a controlled/isolated environment.
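The "disable DOCTYPE" mitigation can be sketched with Python's standard library alone. The sketch below uses a throwaway `xml.parsers.expat` parser to reject any document that declares a DOCTYPE, then parses the input normally. `parse_untrusted_xml` and `DoctypeForbidden` are hypothetical names, and a maintained hardening library such as `defusedxml` is usually a better choice in real code.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this as soon as it sees <!DOCTYPE ...>, before any entity is used.
    raise DoctypeForbidden(f"DOCTYPE declarations are not allowed (name: {name!r})")


def parse_untrusted_xml(data):
    """Parse XML bytes, refusing any document that declares a DOCTYPE."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # raises DoctypeForbidden or expat.ExpatError
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted_xml(b"<data><item>1</item></data>").tag)
    lab_payload = (
        b'<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<data>&xxe;</data>"
    )
    try:
        parse_untrusted_xml(lab_payload)
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

Parsing twice is wasteful but keeps the sketch short. The point for review is that the rejection happens before the document reaches the application's real parser.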

**Quick grep:**

```bash
grep -RIn --line-number -E "DocumentBuilderFactory|SAXParser|xml\.etree|simplexml_load_string|libxml" .
```

***

#### Prototype Pollution (Node.js / JS)

**What it is:** Prototype pollution happens when attackers can modify `Object.prototype` (or other prototypes), introducing or changing properties that affect application logic, often via unsafe merges/assignments of user-supplied objects.

**Common sinks / patterns:**

* Merging user objects into configuration or templates using `_.merge`, `Object.assign`, deep recursive merges without validation.
* Accepting arbitrary JSON and passing it to code that treats properties as configuration flags (`req.body` merged into `options`).
* Using libraries with known prototype-pollution vulnerabilities.

**Checks during review:**

* Where user JSON is merged into internal objects/config (`options = {...defaults, ...req.body}`).
* Look for usage of `lodash.merge`, `merge`, `deepmerge`, or similar functions on user-controlled objects.
* Check any code that uses property names dynamically (e.g., `obj[foo]`) where `foo` can be `__proto__` or `constructor.prototype`.
* Validate whether the code sanitizes keys like `__proto__`, `constructor`, `prototype`, or symbol properties.

**Mitigations / safe alternatives:**

* Reject or sanitize keys that are `__proto__`, `constructor`, `prototype`, or those containing `__proto__` segments.
* Use shallow copies instead of deep merges when merging user input into internal config.
* Use libraries with built-in protections or pass a whitelist of allowed keys only.
* Deep-clone using safe utilities that explicitly block prototype chain changes.
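The key-rejection idea can be illustrated with a small review helper. To keep this sheet's added examples in one language it is written in Python rather than JavaScript, so it is not the in-application fix; it only shows the recursive key check that a Node.js sanitizer would also need. `DANGEROUS_KEYS` and `find_pollution_keys` are hypothetical names.

```python
import json

# Keys that can reach a prototype when merged carelessly in JavaScript.
DANGEROUS_KEYS = {"__proto__", "constructor", "prototype"}


def find_pollution_keys(value, path="$"):
    """Return the paths of every dangerous key in a parsed JSON value."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in DANGEROUS_KEYS:
                hits.append(child_path)
            hits.extend(find_pollution_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_pollution_keys(child, f"{path}[{index}]"))
    return hits


if __name__ == "__main__":
    body = json.loads('{"__proto__": {"polluted": true}}')
    print(find_pollution_keys(body))
```

The check has to recurse: a filter that inspects only top-level keys misses payloads nested inside arrays or sub-objects.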

**Quick grep:**

```bash
grep -RIn --line-number -E "Object\.assign|_\.merge|merge\(|deepmerge|JSON\.parse" .
```

***

#### Eval-filter bypass (filters meant to stop eval)

**What it is:** Developers sometimes attempt to block `eval()` by filtering keywords or characters. Determined attackers can bypass naive filters by using alternative code paths or language features that achieve the same effect (e.g., the `Function` constructor in JS, `assert()` with a string argument in PHP before 8.0, or indirect eval via template engines).

**Common risky patterns:**

* Naive string filters that remove the substring `eval` but not equivalent functionality (`new Function()`, `setTimeout(str, 0)`, `exec` variants).
* Relying on blacklists (blocklist) rather than whitelists, which tend to be incomplete.

**Checks during review:**

* Search for filter functions or regex-based sanitizers that operate on source strings (e.g., `str.replace(/eval/gi, '')`).
* Look for other dynamic-eval-capable APIs: JS `Function`, `setTimeout`, `setInterval` (string args), `vm` module; Python `exec`, `eval`, `compile`; PHP `assert()`; Ruby `eval`/`instance_eval`.
* Check if user input ever reaches those APIs indirectly (templating, code generation, REPL endpoints, admin consoles).

**Mitigations / safer alternatives:**

* Avoid dynamic evaluation entirely; refactor logic to use explicit paths or interpreters with a safe subset.
* Use whitelists for allowed operations or tokens and parse/interpret them with safe parsers.
* For unavoidable dynamic behavior, sandbox the execution environment (restricted runtime, containerization, strict timeouts, resource caps).
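The difference between a keyword filter and an allowlist-based interpreter can be sketched in Python. `naive_filter` is an illustrative stand-in for the kind of sanitizer to flag, and `safe_arith` is a hypothetical allowlist evaluator built on the standard `ast` module. It accepts only numeric literals and four arithmetic operators and never calls `eval`.

```python
import ast
import operator


def naive_filter(source):
    # The pattern to flag: a single-pass keyword strip.
    return source.replace("eval", "")


_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_arith(expr):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return _eval(ast.parse(expr, mode="eval"))


if __name__ == "__main__":
    # Stripping "eval" once reassembles the keyword from the surrounding characters.
    print(naive_filter("evevalal('1+1')"))
    print(safe_arith("2 + 3 * 4"))
    try:
        safe_arith("__import__('os').system('id')")
    except ValueError as exc:
        print("rejected:", exc)
```

The allowlist fails closed: any node type that is not explicitly handled, including calls, attribute access, and names, raises `ValueError`.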

**Quick grep:**

```bash
grep -RIn --line-number -E "eval\(|new Function|setTimeout\(|vm\.runInNewContext|assert\(" .
```

***

#### CORS / CSRF Chaining (misconfig + chained logic)

**What it is:** CORS misconfigurations (e.g., `Access-Control-Allow-Origin: *` or echoing back `Origin` without validation) can be combined with other issues (exposed credentials, lax auth logic) to allow cross-origin attacks. CSRF chaining is when a CSRF (state-changing) request is combined with other server behaviors to escalate impact.

**CORS checks during review:**

* Search for code that sets `Access-Control-Allow-Origin` dynamically based on `Origin` header; check whether it validates against an allowlist.
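* Examine `Access-Control-Allow-Credentials: true` together with wildcard origins. Browsers refuse credentialed responses when the allowed origin is the literal `*`, so the exploitable variant is usually code that reflects the request's `Origin` header while also allowing credentials.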
* Check `Access-Control-Expose-Headers` and `Vary` headers handling (caching pitfalls).

**CSRF chaining checks:**

* Look for state-changing endpoints (POST/PUT/DELETE) that lack CSRF protections (tokens, SameSite cookies).
* Check whether the application relies solely on `Origin`/`Referer` for CSRF protection; this may be insufficient if chained with a CORS misconfiguration.
* Identify flows where a low-privileged action can be chained to perform a sensitive action (e.g., an endpoint that sets an email address + another that triggers a password reset to that email).

**Mitigations / safe alternatives:**

* For CORS: use strict allowlists, avoid echoing `Origin` without validation, and avoid `Access-Control-Allow-Credentials: true` with wildcard origins.
* For CSRF: enforce anti-CSRF tokens and set `SameSite=Strict` or `SameSite=Lax` on session cookies; validate the origin for critical operations in addition to tokens.
* During review, flag any place where cross-origin requests can result in state changes without proper CSRF tokens or where credentials are exposed by CORS.
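The gap between a strict origin allowlist and a loose string match can be sketched in a framework-agnostic way. The origins, `ALLOWED_ORIGINS`, and both function names are assumptions for illustration; real applications would normally configure this through their framework's CORS middleware.

```python
# Hypothetical allowlist: exact origins (scheme + host), nothing else.
ALLOWED_ORIGINS = frozenset({
    "https://app.example.com",
    "https://admin.example.com",
})


def naive_origin_ok(origin):
    # The pattern to flag: suffix matching accepts attacker-registered domains.
    return origin.endswith("example.com")


def cors_headers(origin):
    """Return CORS response headers for an allowed origin, or none at all."""
    if origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",  # keep caches from serving one origin's response to another
        }
    return {}


if __name__ == "__main__":
    attacker = "https://evilexample.com"
    print(naive_origin_ok(attacker))   # suffix check is fooled
    print(cors_headers(attacker))      # exact match is not
    print(cors_headers("https://app.example.com"))
```

Substring, prefix, and unanchored-regex checks fail in the same way as the suffix check, so each of them deserves a finding.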

**Quick grep:**

```bash
grep -RIn --line-number -E "Access-Control-Allow-Origin|Access-Control-Allow-Credentials|CORS|csrf|SameSite|Origin" .
```

***

### Language & Framework Cheat Lists (abridged)

> For each: **Pattern → Risk → Safer alternative**. (Sections for major languages retained from main sheet.)

... (Python, PHP, Node, Java, C#, Ruby sections omitted here for brevity; they are unchanged from the main sheet content and still included in the full markdown file.)

***

### Common input sources to monitor

Query params, POST bodies, headers, cookies, multipart filenames, JSON payloads, WebSocket messages, path params, template names, dynamic import names, metadata fields (Content-Disposition), and any DB fields originally supplied by users.

***

### Typical bypass / evasion techniques

* Encoding: `%27`, `%2f`, `\x27`, `0x27`.
* `CHR()`/`CHAR()` in SQL to reconstruct quotes.
* Concatenation: `CONCAT()`, `+` to build strings.
* Comments: `--`, `/* ... */` to truncate rest of a query.
* Time-based / blind techniques: `SLEEP(5)` for blind SQLi.
* WAF evasion: Unicode, homoglyphs, mixed encodings, comment insertion.
* For eval filters: alternative APIs (`Function` vs `eval`), indirect execution via templating or callbacks.
* For SSRF/XXE: nested redirects, DNS rebinding, use of intermediary services (e.g., URL shorteners). Flag any URL resolution chain.
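One common form of the encoding and mixed-encoding items above is double URL-encoding: a filter decodes once and checks, and a later layer decodes again. The sketch below uses the standard `urllib.parse.unquote`. `naive_traversal_filter` and `fully_decode` are hypothetical names, and the payload is a lab-only example.

```python
from urllib.parse import unquote


def naive_traversal_filter(raw):
    """Decode once, then check: the pattern to flag during review."""
    decoded = unquote(raw)
    if "../" in decoded:
        raise ValueError("path traversal detected")
    return decoded


def fully_decode(raw, max_rounds=5):
    """Decode repeatedly until the value stops changing (bounded)."""
    current = raw
    for _ in range(max_rounds):
        decoded = unquote(current)
        if decoded == current:
            return current
        current = decoded
    raise ValueError("too many encoding layers")


if __name__ == "__main__":
    payload = "%252e%252e%252fetc%252fpasswd"  # "../etc/passwd", encoded twice
    passed = naive_traversal_filter(payload)   # filter sees no "../"
    print(passed)
    print(unquote(passed))                     # a second decode downstream restores it
    print(fully_decode(payload))
```

When reviewing, count how many times a value is decoded between the source and the sink, and confirm that validation runs after the last decode.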

***

### Quick manual testing payloads (educational)

> Use only in labs or authorized tests.

* SQL classic: `' OR '1'='1`
* Time-based blind: `' OR IF(SUBSTR((SELECT password FROM users LIMIT 1),1,1)='a', SLEEP(5), 0) --`
* SSTI (Jinja2 basic): `{{7*7}}` (look for `49`)
* LFI attempts: `../../../../etc/passwd`
* XXE (example, labs only): `<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><data>&xxe;</data>`
* SSRF marker: provide a controllable URL (in labs) and check app's outbound fetch behavior.
* Prototype pollution (test): sending `{"__proto__": {"polluted": true}}` to endpoints merging body into objects (labs only).

***

### OSWE exam checklist (compact)

* [ ] Find all input sources (params, headers, JSON, files).
* [ ] Grep for dangerous keywords & sinks.
* [ ] Trace taint flow from source → transformations → sink.
* [ ] Check for proper framework escaping/encoding.
* [ ] Verify prepared statements or ORM safe usage.
* [ ] Inspect deserialization usage and class whitelists.
* [ ] Inspect template rendering and any use of `render_template_string` / inline templates.
* [ ] Check file operations and path canonicalization.
* [ ] Check XML parsers for XXE mitigations.
* [ ] Search for dynamic imports/reflection/class loading.
* [ ] Review upload handling (filename, location, MIME checks).
* [ ] Note suspicious custom sanitizers (regex blacklists) and attempt bypasses.
* [ ] Review any code performing outbound requests (SSRF risk).
* [ ] Check for object merges / deep merges in JS (prototype pollution).
* [ ] Check CORS & CSRF protections on state-changing endpoints.
* [ ] Record every flagged finding with the exact file:line, a brief description, and a suggested fix.
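For the path canonicalization item in the checklist, this is roughly what a correct check looks like, so that its absence is easier to spot. It is a minimal sketch using the standard `os.path` functions; `resolve_under` and the `/srv/app/uploads` directory are hypothetical.

```python
import os


def resolve_under(base_dir, user_path):
    """Return the canonical path for user_path, or raise if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the comparison.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


if __name__ == "__main__":
    uploads = "/srv/app/uploads"  # hypothetical upload root
    print(resolve_under(uploads, "reports/a.txt"))
    for attempt in ("../../../../etc/passwd", "/etc/passwd"):
        try:
            resolve_under(uploads, attempt)
        except ValueError as exc:
            print("rejected:", exc)
```

Typical flaws to look for are checks made before canonicalization, plain `startswith` comparisons (which let `/srv/app/uploads_evil` pass for `/srv/app/uploads`), and `os.path.join` calls where an absolute user path silently replaces the base.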

***

### Suggested personal repo layout

```
oswe-cheatsheet/
  README.md
  CHECKLIST.md
  payloads/
    sql/
    ssti/
    lfi/
    ssrf/
    xxe/
    prototype_pollution/
  grep-snippets.md
  language-guides/
    python.md
    php.md
    node.md
```

***

### References (must-read)

* OWASP Web Security Testing Guide
* PayloadsAllTheThings (GitHub)
* PortSwigger Web Security Academy
* HackTricks (Book)
* Semgrep rules library

***

#### Notes & ethics

This sheet is for learning, authorized testing, and exam prep only. Never run offensive tests against systems you do not own or have explicit permission to assess.

