source: https://x.com/goodalexander/status/1751676814890602921
A Product Document for Individual Traders (Trigger Warning: Long Post)
It makes me happy when people unplug from the system and go trade their own money. Trading has been my passion since I was 16. Most information on the subject is terrible, and I wanted to try to contribute a bit.
I think @therobotjames sells high quality courses on the topic. I'll take one once I blow up. Til then here's my free guide.
As a disclaimer, this is targeted at people who have worked in big tech for a while, are technical, with substantial savings. I don't advise doing this with a low bankroll, or without coding experience. This isn't financial advice either. I'm sure I missed a lot too.
Job Description
Create hypotheses. Deploy them in all relevant asset classes. Track their results. Understand, quantify and manage risk and improve operational execution. Discard bad hypotheses. Deploy capital to working hypotheses. Engineer a system to do so while protecting IP.
But most importantly. Make money every month.
Core Infrastructure
Entity Formation - you should decide how you do this early on.
- Offshore set up for crypto exchanges. Not going to get into this here. Be smart.
- Puerto Rico Act 60 (individual investors) - 0% capital gains tax, which is more or less essential for short-term trading strategies. If you do staking or airdrop work you will also need a corporate decree separate from an individual investor decree.
- Alternatively (no Puerto Rico), look into Trader Tax Status, assuming you generate sufficiently high volume. This lets you deduct interest expenses, various tech expenses, hiring, and other important things such as healthcare.
- Or, if you want to be a fund or run managed accounts, that's a whole other thing and I'm not going to cover it here
- Just “whipping it” is a bad idea. Even as an individual, you want entities. The entities should pay for your hardware / any expenses below.
Market Data Pulls - obviously this depends on your strategy, but in my view we live in globally connected markets and your data stack should reflect that
- Bloomberg API (intraday, BDP, and BDH data pulls). The real reason you need Bloomberg is real-time corporate bond data and real-time interest rate swap data. If we live in an over-indebted world run by central banks, it sort of follows that this is the most important core data set. Bloomberg is expensive because it has a monopoly on rates and bond data, and, sadly, you probably need that data to trade well.
- Interactive Brokers Python API - this is for execution, transaction cost analysis, etc.
- Tiingo - provides real time crypto, FX, and solid equity price info
- Sharadar - large historical US fundamental data set, useful for building aggregates
- TD Ameritrade Options API - the most extensive real time queryable options data source. You need to generate volume for this to work
- Ishares daily CSV pulls - you need this for bond data as you’ll rapidly go over Bloomberg API limits otherwise
- Tardis crypto intraday data / perp funding rates
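With this many vendors in the stack, it pays to normalize every feed into one bar schema the moment it lands, so every downstream job codes against one shape instead of N payloads. A minimal sketch (the Tiingo field names below are my assumption of its EOD JSON; verify against the vendor's docs):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Bar:
    ts: datetime
    symbol: str
    open: float
    high: float
    low: float
    close: float
    volume: float

def normalize_tiingo_eod(symbol: str, row: dict) -> Bar:
    """Map one Tiingo end-of-day JSON row into the common Bar schema.
    Field names ("date", "open", ...) are assumptions -- check Tiingo's docs."""
    ts = datetime.fromisoformat(row["date"].replace("Z", "+00:00"))
    return Bar(ts, symbol.upper(), float(row["open"]), float(row["high"]),
               float(row["low"]), float(row["close"]), float(row["volume"]))
```

One adapter per vendor (Sharadar, Tardis, iShares CSVs), and your signal, tcost, and backtest code never has to know where a bar came from.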
Brokerages - I prefer to have multiple brokerages as they excel at different things
- IBKR for time sensitive execution, international stocks and share lending
- Etrade for commission-free trades / medium frequency strategies
- TD for some options strategies (mostly execute there to ensure I have access to their options data)
- Coinbase + relevant offshore crypto exchanges. Keep minimum balances on exchanges.
- Go through Portfolio Margin applications on all exchanges
Crypto Hardware - Likely separate from tradfi hardware.
- All crypto trading should be air gapped from other systems
- Recommend using a chromebook for interacting with UX-es
- Recommend having a different computer that touches crypto vs all your other machines
- VPN, Brave Browser
- Never log into any website you commonly use (Amazon etc) on the crypto machinery as it will infect your computer with fingerprints that can be detected
Servers - Azure is nice because it puts OpenAI in one billing spot
- Azure Windows 11 machine with a UX (log in via RDP file). You need this for IBKR and Bloomberg, which require a continuous connection
- Hetzner Linux Boxes - You need 3+ Linux servers
--- Crypto trading box to interact with anything offshore (cannot be based in US)
--- Production trading data / tcost/ performance / IP box
--- Nonproduction application Box optimized for scraping, any non-IP sensitive stuff
--- 1 box per contractor / employee. don't let them touch any of your stuff.
- All three-plus boxes should have Postgres DBs for you to store various information. The Nonproduction box should be
Workspace set up - minimal local development, tunnels to machines for continuity.
- VS Code Tunnels enabled on all your servers
- JupyterLab set up on all servers. Jupyter notebooks inside VS Code crash on high-memory jobs and are unreliable / periodically fail.
- GitHub, obviously. I think it’s good to split into two repos: the IP-sensitive repo and the less IP-sensitive repo. This saves you headaches down the road. The IP-sensitive repo has anything related to trading. The non-IP-sensitive repo is stuff like market data cron jobs, alt data scrapers, etc.
SEC/Catalyst Scrapers - If you own it you shouldn’t be surprised when something comes out
- At a bare minimum you should probably build tooling that alerts you when your names publish 8-Ks, as this is reflected faster than Bloomberg
- If you’re trading macro instruments you probably need tooling to detect economic data releases (Investing.com is the best free one afaik; Bloomberg calendars are far better but a pain to pull)
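For US names, EDGAR's submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`) is a reasonable thing to poll for fresh 8-Ks. A sketch along those lines (the JSON field names reflect my understanding of EDGAR's submissions payload; verify against the docs, and note the SEC requires an identifying User-Agent):

```python
import json
import urllib.request

SEC_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"

def fetch_submissions(cik: int) -> dict:
    # SEC fair-access policy requires a real contact in the User-Agent
    req = urllib.request.Request(SEC_URL.format(cik=cik),
                                 headers={"User-Agent": "you@example.com"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def new_8ks(submissions: dict, seen: set) -> list:
    """Return (accessionNumber, filingDate) pairs for 8-Ks not yet alerted on.
    Assumes the parallel-array layout of filings["recent"] in EDGAR's JSON."""
    recent = submissions["filings"]["recent"]
    out = []
    for form, acc, date in zip(recent["form"],
                               recent["accessionNumber"],
                               recent["filingDate"]):
        if form.startswith("8-K") and acc not in seen:
            out.append((acc, date))
            seen.add(acc)
    return out
```

Run it on a cron for your book's CIKs and pipe new hits into whatever alerting channel you use.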
Basic Alt Data - Why operate at a disadvantage vs basic normal funds?
- Google Trends - I encourage you to use “News”, “Image”, and “Search” and to incorporate multiple geographies. This lets you normalize for negative shocks (for example, if everyone is searching for news about SeaWorld it probably means something terrible happened, whereas if they’re just searching for SeaWorld it’s fine; hence the ratio matters)
- Web Data (similarweb, other sources)
- App download data
- Twitter API / some other social feeds
- BAMSEC / Tegus subscription for historical transcript data (this is the cheapest and best I’ve found)
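The SeaWorld ratio idea reduces to a small calculation once you have two aligned interest series from Google Trends. A toy sketch (alignment and scaling are assumed handled upstream; thresholds are mine):

```python
import statistics

def news_to_search_zscore(news: list, search: list) -> float:
    """Z-score of the latest news/search interest ratio vs its own history.
    A spike suggests people are searching for *news about* the term
    (likely a negative shock) rather than the term itself."""
    ratios = [n / s for n, s in zip(news, search) if s > 0]
    hist, latest = ratios[:-1], ratios[-1]
    mu, sd = statistics.mean(hist), statistics.stdev(hist)
    return (latest - mu) / sd if sd > 0 else 0.0
```

A large positive z-score on this ratio is the "everyone is googling the bad headline" signal described above.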
AI Tooling - Increasingly critical for both alpha generation as well as making sure you can develop at a fast enough pace
- Must have: OpenAI Tier 5 API (spend $1k+ so you aren't throttled). Can also do this in Azure
- Copilot / VSCode
- OpenAI team subscription (to ensure your chats don’t end up in training data)
- Discord UX to interact with open-source AI models or internal fine-tunes. Previously this was necessary to ensure your chats didn’t end up in training data (Discord has a very easy to use, free Python API)
- Replicate: I personally think Replicate is the best service for running open-source AI models. TogetherAI is okay, but I’ve found Replicate works best. It can handle giant async jobs too
- Nice to Have: Runpod architecture for running funky stuff like fresh HuggingFace models not yet available on Replicate
Strategy Generation
Hypothesis Notebook - Some place to write down descriptions of your strategies. This should NOT be on any public cloud server (or encrypted heavily if it is).
- When you come up with an idea to test you should put it somewhere with a date
- You should create an ontology for strategies -- i.e. a “Parent” strategy and then child strategies underneath it
- There should be some kind of ontology to this notebook including dates and (if multiple authors) ownership
- For tracking, it is helpful to understand how all your strategies around, say, trading earnings are performing, not just one specific strategy. A system of tags is good, and it’s best to apply it at the hypothesis generation level
- Discarded hypothesis book / doc. It’s good to track all the things that didn’t work. You should have 3+ failed strategies for every 1 successful strategy.
- All qualitative trades go in this notebook also
Backtesting Engine - A thing that simulates historical performance of your hypotheses.
- The backtesting engine should live on a highly secure box
- Numpy-based / vectorized calculations for running fast “entry exit” strategies and incorporating tcosts. You ideally don’t want native Python loops here.
- Survivor bias inclusive data. A quick and dirty hack is to have a universe entirely made up of bankrupt and delisted stocks (acquired, halted etc). Run your strategy on the delisted / BK universe (available via Sharadar). The crypto version is running a strategy only on things that rugged (bitconnect etc)
- Survivorship bias adjusted alt data. This is far more of a pain in the ass. For example, Tesla.com used to be Teslamotors.com so you need a way to handle the port over.
- For AI-based backtests it’s good to implement prompt engineering that tries to simulate “foreknowledge”, e.g. “You only know information as of Jan 1, 1998” in the system prompt. This is a pain and doesn’t always work, but I’ve found it’s better than nothing. You can also stress test it, for example by asking the model questions about things it shouldn’t know
- Every backtest should aspire to a notion of expected value, so that you have a price at which you’d be relatively happy to buy or sell something. For example, a backtest should generate a signal like +1 z-score, which should translate to “I’m happy to buy this stock up to 1% above its prior close (or vs its peers, or whatever)”
- Non-expected-value backtests should have some notion of trend / assumptions around cutting risk built in (I lob in a market order and put a trailing stop 1 std below)
- At the end of any backtest you should be able to answer "Is there statistically significant evidence that doing this thing is a good idea, such that if I'd done it a lot historically I'd have made a lot of money after tcosts and market impact?"
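The vectorized entry/exit idea above can be sketched in a few lines of numpy. This is a toy, assuming a single asset, bar-aligned signal and return arrays, and a flat cost in bps charged on every position change (all names and defaults are mine):

```python
import numpy as np

def backtest(signal: np.ndarray, returns: np.ndarray,
             tcost_bps: float = 5.0) -> dict:
    """Vectorized entry/exit backtest with transaction costs.
    signal[t] is the desired position (e.g. a clipped z-score) computed with
    data through bar t; it earns returns at t+1, so we lag by one bar to
    avoid lookahead."""
    pos = np.concatenate([[0.0], signal[:-1]])          # lagged positions
    turnover = np.abs(np.diff(pos, prepend=0.0))        # position changes
    pnl = pos * returns - turnover * tcost_bps / 1e4    # cost on every trade
    equity = np.cumsum(pnl)
    peak = np.maximum.accumulate(equity)
    return {"total": float(equity[-1]),
            "max_dd": float(np.max(peak - equity)),
            "sharpe_ann": float(np.mean(pnl) / (np.std(pnl) + 1e-12)
                                * np.sqrt(252))}
```

The point is that entries, exits, costs, and drawdown all fall out of array operations, so sweeping thousands of parameter variants stays fast.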
Strategy Execution
Signal Cron Job - Run your things every day.
- A Cron Job that runs as often as needed to ensure the data you’re using to trade is fresh
- A tool that outputs a warning if your data has not updated, or if a data source is throwing an error / producing a value beyond a certain number of standard deviations from normal
- Login Reminders (where needed). Can build these into discord (i.e. please log into IBKR on the Azure box)
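The staleness / outlier warning above is a small amount of code. A minimal sketch (function name, thresholds, and defaults are my own assumptions):

```python
import statistics
from datetime import datetime, timedelta, timezone

def check_feed(last_update: datetime, history: list, latest: float,
               max_age: timedelta = timedelta(hours=1),
               max_z: float = 4.0) -> list:
    """Return a list of warning strings for one data feed (empty = healthy)."""
    warnings = []
    now = datetime.now(timezone.utc)
    if now - last_update > max_age:
        warnings.append(f"STALE: last update {last_update.isoformat()}")
    mu, sd = statistics.mean(history), statistics.stdev(history)
    if sd > 0 and abs(latest - mu) / sd > max_z:
        warnings.append(f"OUTLIER: {latest} is {(latest - mu) / sd:.1f} std from mean")
    return warnings
```

Run it after every cron pass and route any non-empty result to your alert channel; the worst failure mode is trading confidently on stale data.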
Execution Engine
- Align your real time execution with signals that are generated
- Run your PNL, slippage, and Tcosts on every brokerage. Aggregate results
- Run your beta in dollar amounts to the following assets, as relevant to the “session” you’re in (New York, Europe, Asia). You’ll want betas vs S&P, Treasuries, Oil, Dollar, Bitcoin, Gold, China, and Small Cap (IWM) vs Tech (QQQ)
- Run your portfolio’s max drawdown in all major “sessions” (New York, Asia, Europe). Display it in a UX you can check at any time. Max drawdown should be over the longest window you can muster. It's best to be honest. This also means doing annoying things like mapping assets to underlyings (i.e. BOIL -> Nat Gas; nat gas has had some really disgusting moves historically, and a levered nat gas ETF can be a recipe for disaster). It's worth spending serious time on this. Knowing your max realistic loss is very important to long-term survival and very useful for tracking results. A great KPI is "what % of my long-term max drawdown did I make today".
- Build a system to scale security risk relative to options pricing. Use the TD API to pull the closest liquid straddle, and use it to size trades where the data exists. History is only one guide to risk, and the options market does a good job pricing forward risk.
- Discord alerts for any BIG problem (overage in concentration in 1 security, risk limit breached, dollar PNL threshold breached, data update error)
- System to flatten your risk via futures automatically. For example, if you have a $500k Small Cap vs Tech lean, for each “session” you want a mechanic to cut that risk.
- A periodically updated Google Sheet or Excel sheet that contains all your risks, tickers, notionals etc that you can easily manually audit
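The drawdown bookkeeping and the "% of max drawdown made today" KPI above reduce to a few lines. A minimal sketch over a daily dollar PNL series (function names are mine):

```python
def max_drawdown(daily_pnl: list) -> float:
    """Largest peak-to-trough drop in cumulative PNL, in dollars."""
    equity = peak = worst = 0.0
    for pnl in daily_pnl:
        equity += pnl
        peak = max(peak, equity)            # running high-water mark
        worst = max(worst, peak - equity)   # deepest drop from any peak
    return worst

def pct_of_max_dd(today_pnl: float, daily_pnl: list) -> float:
    """Today's PNL as a fraction of long-run max drawdown (the KPI above)."""
    dd = max_drawdown(daily_pnl)
    return today_pnl / dd if dd > 0 else float("nan")
```

The same `max_drawdown` can be run per session by slicing the PNL series into New York / Europe / Asia buckets before calling it.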
Post-Trade Engine
- Analysis of operational failures / classifications. Some examples of flags
-- Data failed to update
-- Signal failed to update
-- Slippage was outside of acceptable bounds (+1.5 std)
-- PNL not tracking backtest for x period
-- Have a rolling measure of how your backtests are performing and a comparison of how they are performing vs your actual PNL
-- A Google Sheets-based KPI sheet. It should contain:
----PNL per session, daily
----Total Slippage
----Total commissions
----Dollar delta versus backtest
----Total simulated max drawdown
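Several of the flags above are simple threshold checks over rolling history. A toy sketch of the slippage flag (+1.5 std), assuming slippage per fill is already logged in bps (names and defaults are mine):

```python
import statistics

def flag_slippage(history_bps: list, latest_bps: float, k: float = 1.5) -> bool:
    """True if the latest fill's slippage is more than k std above
    the historical mean -- i.e. slippage outside acceptable bounds."""
    mu = statistics.mean(history_bps)
    sd = statistics.stdev(history_bps)
    return sd > 0 and latest_bps > mu + k * sd
```

The data-failed and PNL-vs-backtest flags follow the same shape: a rolling baseline, a threshold, and a boolean that feeds the KPI sheet or a Discord alert.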
Hiring / Team Management - I operate with no partners but do periodically hire people to do things I'm bad at.
- VSCode Liveshare is a must for working with remote teams as it allows working in the same VSCode tool
- Never trust that any piece of code will “just work”. Understand it if it is in prod in any way, shape, or form. If someone writes unintelligible or hard-to-maintain code, warn once; the second time, remove them from the team
- Hire for specific tasks
- Do not indicate you are developing trading IP; obfuscate that from contractors
- Periodically churn team OR invest appropriately in legal docs / enforcement (hard)
- NEVER COLLABORATE ON TRADING IP WITHOUT FIRST CONSIDERING WHAT HAPPENS IF YOUR IP IS STOLEN
Closing Thoughts / Evaluating Career Risk Reward
Back in April I had a bit of a panic because I thought AI would replace my job trading. I still more or less think this, but quickly realized that I could be the one to replace my job and get an edge doing so.
If you believe in the analytical horsepower of AI (as I do), and you think the right bet is that AIs will largely replace and outperform humans rather than augmenting them - then trading could be a solid venue to express that.
AI tools also make the job of trading individually far more doable than before - because you can more quickly debug problems, and it's harder to get stuck.
The final question is, of course, "If you can do all of this, why don't you go work at a pod shop or run a fund?" The answer is that I think trading can fund the creation of a real business (as well as a crypto protocol) that I wouldn't be able to start at a pod, and that isn't really captured by the fund business model, which has onerous regulatory constraints. I also generate revenue from activities related to my data collection that I could not easily explain at a fund.
I also like owning my execution/tech stack, being able to trade whatever I want, being able to gather data myself without the confines of an organization, being able to Tweet and not having to answer to some guy who periodically wants to cut my risk because I've got a huge endless short Africa lean, and trade non-scalable crap.
If I were world class at trading megacap tech stocks as factor neutral pairs obviously that'd be a different equation. If you just want to trade, and do this for a while there's always the optionality to go do it at a fund or a prop firm though.
I'll close it out there. Hopefully this has got you thinking about this path and whether it's appealing to you.
Sadly I don't answer DMs much so please don't shoot me follow ups about this unless we're friends. good luck