Google DeepMind has taken a significant step in advancing biological research by releasing the full source code for AlphaFold 3.
AlphaFold 3 is the latest version of a groundbreaking artificial intelligence system developed by DeepMind. Where earlier versions predicted the 3D structure of proteins from their amino acid sequences, AlphaFold 3 predicts the structures of complexes that can also include nucleic acids, small-molecule ligands, and ions. It builds on the success of AlphaFold 2, which revolutionized structural biology by largely solving the decades-old protein-folding problem and enabling researchers to predict protein structures with remarkable accuracy.
Demis Hassabis and John Jumper shared half of the 2024 Nobel Prize in Chemistry for their contributions to AlphaFold (the other half went to David Baker for computational protein design), an award that recognized the system’s transformative impact on scientific research, particularly in drug discovery and the understanding of disease.
Six months after its debut in May, which sparked debate over transparency and accessibility, AlphaFold 3 is now available for non-commercial use, allowing academic researchers to install and run it locally and to analyze it in depth. The release follows widespread calls from the scientific community for full access to the code and model weights to support peer review and reproducibility.
Initial Release and Backlash
AlphaFold 3 was first introduced in May through a publication in Nature that highlighted its potential for predicting interactions involving proteins, DNA, RNA, and biologically significant ligands and ions.
Unlike AlphaFold 2, which was praised for its ability to predict individual protein structures, AlphaFold 3 is designed to model more complex interactions fundamental to drug discovery and disease research. However, initial access was limited to a web server that restricted predictions to 20 daily submissions, a constraint that hindered larger-scale studies.
The lack of full code access drew criticism from researchers, hundreds of whom signed an open letter calling for a complete release to ensure reproducibility and transparency. Magdalena Skipper, Editor-in-Chief of Nature, defended the initial decision by citing potential biosecurity and ethical considerations.
DeepMind responded to these concerns by promising a release within six months. The code is now available on GitHub, although the AlphaFold 3 model weights remain gated behind an application process.
Technical Advancements and Capabilities
AlphaFold 3 is a marked improvement over its predecessors, allowing for predictions of interactions involving multiple types of biological molecules. The software supports not only proteins but also nucleic acids and various biologically relevant ligands such as ATP, heme, and myristic acid, along with common ions like Ca²⁺ and Zn²⁺.
This capability expands its utility in molecular biology and drug development, offering a comprehensive view of interactions essential for understanding complex cellular processes.
The model reports confidence metrics alongside each prediction: pLDDT (predicted Local Distance Difference Test), a per-atom estimate of local accuracy, and PAE (Predicted Aligned Error), which estimates the error in the relative position of two parts of the structure and so indicates how reliably molecules are placed with respect to one another. These metrics, explained in detail in DeepMind’s documentation (input, output, performance), help researchers gauge the accuracy of the predicted structures.
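To make the pLDDT scale concrete, the minimal Python sketch below sorts per-atom pLDDT values into the four confidence bands conventionally used for AlphaFold models (very high at 90 and above, confident from 70 to 90, low from 50 to 70, very low below 50). The band cutoffs follow that common convention rather than anything specific to AlphaFold 3, the helper names are hypothetical, and the scores are made up; in practice the values would be read from the model's output files.

```python
from collections import Counter

# Conventional pLDDT bands for AlphaFold models (an assumption here, not an
# AlphaFold 3 API): (inclusive lower bound, label), checked from the top down.
BANDS = [
    (90.0, "very high"),
    (70.0, "confident"),
    (50.0, "low"),
    (0.0, "very low"),
]


def plddt_band(score: float) -> str:
    """Return the confidence band for one pLDDT value on the 0-100 scale."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise ValueError(f"pLDDT must be >= 0, got {score}")


def band_fractions(atom_plddts: list[float]) -> dict[str, float]:
    """Return the fraction of atoms that fall into each pLDDT band."""
    if not atom_plddts:
        return {label: 0.0 for _, label in BANDS}
    counts = Counter(plddt_band(score) for score in atom_plddts)
    total = len(atom_plddts)
    return {label: counts[label] / total for _, label in BANDS}


if __name__ == "__main__":
    # Made-up per-atom scores for illustration.
    scores = [95.2, 91.0, 88.4, 72.3, 64.9, 41.0]
    for label, fraction in band_fractions(scores).items():
        print(f"{label:>9}: {fraction:.0%}")
```

A structure dominated by "very high" and "confident" atoms is generally trustworthy at the backbone level, while a large "very low" fraction usually points to disorder or a poor prediction.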
Installation and Usage Guide
Researchers who want to run AlphaFold 3 locally can follow the installation steps on its GitHub page:
- Set Up Dependencies: Install Docker, Python, and CUDA for GPU acceleration, then download the genetic databases and obtain the model parameters that the run command mounts into the container.
- Prepare Input Files: Create a JSON input file describing every molecule in the job. Protein, DNA, and RNA chains are given as sequences of standard one-letter codes; ligands and ions are specified separately (for example by Chemical Component Dictionary code), and PTMs are declared as modifications on specific residues.
- Run the Prediction:
docker run -it \
  --volume $HOME/af_input:/root/af_input \
  --volume $HOME/af_output:/root/af_output \
  --volume <MODEL_PARAMETERS_DIR>:/root/models \
  --volume <DATABASES_DIR>:/root/public_databases \
  --gpus all \
  alphafold3 \
  python run_alphafold.py \
  --json_path=/root/af_input/alphafold_input.json \
  --model_dir=/root/models \
  --output_dir=/root/af_output
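The Prepare Input Files step might be sketched in Python as follows, assembling the JSON for one protein chain bound to one ligand. The field names (`name`, `modelSeeds`, `sequences`, `protein`, `ligand`, `ccdCodes`, `dialect`, `version`) reflect our understanding of the AlphaFold 3 input format and should be checked against the input documentation in the GitHub repository; the function name, job name, and protein sequence are hypothetical placeholders.

```python
import json

# Canonical one-letter amino-acid codes (the 20 standard residues).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_input(name: str, protein_seq: str, ligand_ccd: str, seed: int = 1) -> dict:
    """Return an input dict with protein chain "A" and ligand chain "B".

    Field names follow our reading of the AlphaFold 3 input docs (assumption).
    """
    unknown = set(protein_seq) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            # Polymer chains use one-letter sequence codes.
            {"protein": {"id": "A", "sequence": protein_seq}},
            # Ligands are referenced by Chemical Component Dictionary (CCD)
            # code rather than by a sequence.
            {"ligand": {"id": "B", "ccdCodes": [ligand_ccd]}},
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    # Placeholder job: a short placeholder protein sequence plus one ATP ligand.
    job = build_input("example_atp_complex", "MKTAYIAKQRQISFVKSHFSRQ", "ATP")
    print(json.dumps(job, indent=2))
```

Saving the printed JSON as `$HOME/af_input/alphafold_input.json` matches the `--json_path` used in the docker command. As we understand the format, an ion such as Zn²⁺ would be entered the same way as a ligand, using its CCD code.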
The outputs include detailed JSON files with summary metrics, confidence scores, and structural data segmented by entire complexes, individual chains, and chain-pair interactions. Users can access specific confidence scores to analyze predictions, including the chain_pair_pae_min for assessing potential interactions.
In AlphaFold 3, chain_pair_pae_min is a confidence metric for the predicted interaction between different chains in a complex. It is a [num_chains, num_chains] array in which element (i, j) is the minimum Predicted Aligned Error (PAE) over rows restricted to chain i and columns restricted to chain j.
In other words, it is the lowest PAE across all token pairs (residues, nucleotides, or ligand atoms) spanning the two chains, capturing the single most confident relative placement between them. Because PAE is not symmetric, elements (i, j) and (j, i) can differ.
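The definition translates directly into code. This Python sketch recomputes a chain_pair_pae_min-style matrix from a full token-by-token PAE matrix and the chain ID of each token; the function name, parameter names, and toy values are hypothetical, and the values AlphaFold 3 itself reports come from its own pipeline, which may handle details differently.

```python
def chain_pair_pae_min(
    pae: list[list[float]], token_chain_ids: list[str]
) -> tuple[list[str], list[list[float]]]:
    """Return (chain order, num_chains x num_chains matrix of minimum PAE).

    Element (i, j) is the lowest PAE over rows whose token belongs to chain i
    and columns whose token belongs to chain j. No symmetry is assumed.
    """
    n = len(token_chain_ids)
    if len(pae) != n or any(len(row) != n for row in pae):
        raise ValueError("PAE must be a square matrix matching the token count")
    chains = list(dict.fromkeys(token_chain_ids))  # first-seen chain order
    index = {chain: k for k, chain in enumerate(chains)}
    result = [[float("inf")] * len(chains) for _ in chains]
    for r, row in enumerate(pae):
        i = index[token_chain_ids[r]]  # chain of the row token
        for c, value in enumerate(row):
            j = index[token_chain_ids[c]]  # chain of the column token
            if value < result[i][j]:
                result[i][j] = value
    return chains, result


if __name__ == "__main__":
    # Toy example: tokens 0-1 in chain A, tokens 2-3 in chain B (made-up PAE).
    ids = ["A", "A", "B", "B"]
    pae = [
        [0.5, 1.0, 12.0, 9.0],
        [1.2, 0.4, 3.5, 8.0],
        [10.0, 4.0, 0.6, 1.1],
        [11.0, 7.5, 1.3, 0.5],
    ]
    chains, matrix = chain_pair_pae_min(pae, ids)
    print(chains, matrix)  # → ['A', 'B'] [[0.4, 3.5], [4.0, 0.5]]
```

The off-diagonal entries (3.5 and 4.0 here) are the interchain values of interest; the diagonal reflects confidence within each chain.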
This metric is useful because it helps assess whether two chains are likely to interact based on the model’s confidence.
A lower chain_pair_pae_min value suggests a higher confidence in the interaction between the chains, while higher values indicate more uncertainty or a weaker likelihood of interaction. In some cases, this score can be used to distinguish binders from non-binders in protein complexes.
For example, in structural biology studies, researchers may use chain_pair_pae_min to rank and validate potential protein-protein interactions or to infer whether two chains in a complex are likely to form a stable interface.
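A ranking of this kind could look like the Python sketch below. It assumes each candidate partner was predicted together with a fixed "bait" protein as a separate two-chain job, giving one 2x2 chain_pair_pae_min matrix per job. Because the matrix need not be symmetric, the sketch scores each pair by the larger of the two interchain entries; that choice, the optional cutoff, the function names, and all values are illustrative assumptions, not a validated screening protocol.

```python
from typing import Optional


def interface_score(matrix: list[list[float]]) -> float:
    """Score a two-chain job from its 2x2 chain_pair_pae_min matrix.

    Uses the larger off-diagonal entry, so a candidate only scores well if
    the interchain PAE is low in both directions (a conservative choice made
    for this sketch, not part of AlphaFold 3).
    """
    return max(matrix[0][1], matrix[1][0])


def rank_candidates(
    results: dict[str, list[list[float]]], cutoff: Optional[float] = None
) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from the lowest (most confident) score.

    If a cutoff is given, candidates scoring above it are dropped.
    """
    scored = [(name, interface_score(m)) for name, m in results.items()]
    if cutoff is not None:
        scored = [(name, s) for name, s in scored if s <= cutoff]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical matrices: bait = chain 0, candidate = chain 1.
    results = {
        "candidate_1": [[0.4, 2.1], [2.8, 0.5]],
        "candidate_2": [[0.5, 14.0], [12.5, 0.6]],
        "candidate_3": [[0.4, 5.2], [4.1, 0.4]],
    }
    for name, score in rank_candidates(results):
        print(f"{name}: {score:.1f}")
```

Taking the smaller entry, or the mean, would be equally defensible; whichever rule is used, top-ranked pairs remain hypotheses to confirm experimentally.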
Key Limitations and Known Issues
Despite its extensive capabilities, AlphaFold 3 has limitations. The model is constrained by a job token limit of 5,000, where tokens are counted per amino acid residue, nucleotide, and atom in ligands and PTMs.
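A rough pre-submission check of the token budget might look like the Python sketch below. It follows the counting rule just described, with one refinement based on our understanding of AlphaFold 3's tokenization: a modified residue (PTM) is tokenized per atom instead of as a single residue token, so it is subtracted from the residue count before its atoms are added. The function names and atom counts are illustrative assumptions, atom counts are taken to mean heavy (non-hydrogen) atoms, and the 5,000 figure is the limit quoted in the text.

```python
from collections.abc import Sequence

TOKEN_LIMIT = 5_000  # job limit quoted in the text


def estimate_tokens(
    chain_lengths: Sequence[int],
    modified_residue_atoms: Sequence[int] = (),
    ligand_atoms: Sequence[int] = (),
) -> int:
    """Estimate the token count of one job.

    chain_lengths: residues or nucleotides per polymer chain, including any
        modified residues at their sequence positions.
    modified_residue_atoms: heavy-atom count of each modified residue (PTM).
    ligand_atoms: heavy-atom count of each ligand or ion copy.
    """
    residues = sum(chain_lengths)
    if len(modified_residue_atoms) > residues:
        raise ValueError("more modified residues than residues in the chains")
    # Standard residues and nucleotides: one token each.
    standard_tokens = residues - len(modified_residue_atoms)
    # Modified residues and ligands: one token per atom (assumption above).
    atom_tokens = sum(modified_residue_atoms) + sum(ligand_atoms)
    return standard_tokens + atom_tokens


def within_limit(tokens: int, limit: int = TOKEN_LIMIT) -> bool:
    """True if the estimated token count does not exceed the limit."""
    return tokens <= limit


if __name__ == "__main__":
    # Illustrative job: a 300-residue homodimer with one 10-atom modified
    # residue, two 31-atom ligands (ATP has 31 heavy atoms), and two ions.
    n = estimate_tokens([300, 300], [10], [31, 31, 1, 1])
    print(n, within_limit(n))  # → 673 True
```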
Additionally, it occasionally produces overlapping (clashing) atoms in large protein-nucleic acid complexes and may generate spurious structure in regions that are actually disordered. Low pLDDT scores flag such regions, helping researchers identify areas that may need further experimental validation.
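Flagging such regions can be as simple as scanning per-residue pLDDT for contiguous low-confidence runs, as in the Python sketch below. The 50.0 cutoff (the conventional "very low" band) and the minimum run length are illustrative assumptions, and because AlphaFold 3 reports pLDDT per atom, per-residue values would first need to be averaged from the atom scores (not shown). The function name is hypothetical.

```python
def low_confidence_segments(
    plddts: list[float], cutoff: float = 50.0, min_length: int = 3
) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs of residues with pLDDT < cutoff.

    Runs shorter than min_length residues are ignored.
    """
    segments = []
    start = None  # 1-based index where the current low-confidence run began
    for i, score in enumerate(plddts, start=1):
        if score < cutoff:
            if start is None:
                start = i
        elif start is not None:
            # The run covered residues start..i-1, i.e. i - start residues.
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddts) - start + 1 >= min_length:
        segments.append((start, len(plddts)))
    return segments


if __name__ == "__main__":
    # Made-up per-residue scores.
    scores = [92, 88, 45, 40, 38, 47, 85, 30, 90, 35, 33, 20]
    print(low_confidence_segments(scores))  # → [(3, 6), (10, 12)]
```

The single low residue at position 8 is skipped because it is shorter than `min_length`, which keeps isolated dips from being reported as disordered regions.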
Access Policies and Licensing Considerations
While the code is now accessible for non-commercial use, researchers must apply to obtain the model weights. DeepMind aims to respond to applications within 2–3 business days. This policy is part of a broader effort to maintain a balance between open academic research and commercial interests, especially as DeepMind’s spinoff, Isomorphic Labs, continues to explore drug discovery applications.
Researchers using AlphaFold 3 are required to cite the original publication when disclosing findings. This ensures acknowledgment of the tool’s foundational work and aligns with academic practices.
Competition and Emerging Models
The six-month delay in releasing AlphaFold 3’s code gave rise to alternative implementations. Companies such as Baidu and ByteDance, as well as smaller startups like Ligo Biosciences, developed their own versions based on the detailed pseudocode published alongside the Nature paper.
These models showed that AlphaFold 3’s methodology was reproducible, even without direct access to the original code. The open-source release now allows comparisons and improvements, fostering a collaborative research environment.
Some researchers point to the challenges of non-commercial licenses for translating academic findings into commercial therapies. However, ongoing projects like OpenFold aim to provide fully open-source versions that support broader research and commercial use without licensing barriers.
Future Implications and Research Applications
The open-source release of AlphaFold 3 is set to accelerate progress in computational biology.
The authors of a paper published in Nature Computational Science just yesterday aim to incorporate AlphaFold 3 into their software, MassiveFold, which uses parallel computing to sharply reduce the time needed to run large numbers of AlphaFold 2 predictions, potentially from months to mere hours.
Such integrations could shorten prediction times and broaden the use of AI in understanding protein-ligand interactions, designing new proteins, and studying complex biological phenomena.
John Jumper, who leads the AlphaFold team at DeepMind, expressed excitement over potential new applications: “We’re very excited to see what people do with this”. As academic institutions adapt AlphaFold 3 to their research needs, its impact is expected to extend beyond traditional drug discovery, touching areas such as enzyme engineering, genetic disease research, and agricultural biotechnology.