Even if you are not a Mac user, you have likely heard Apple is switching from Intel CPUs to their own custom CPUs, which they refer to collectively as 'Apple Silicon.' The last time Apple changed its computer architecture this dramatically was 15 years ago when they switched from PowerPC to Intel CPUs. As a result, much has been written in the technology press about what the transition means for Mac users, but seldom from a Python data scientist's perspective. In this post, I'll break down what Apple Silicon means for Python users today, especially those doing scientific computing and data science: what works, what doesn't, and where this might be going.
What’s changing about Macs?

In short, Apple is transitioning their entire laptop and desktop computer lineup from using Intel CPUs to using CPUs of Apple’s own design. Apple has been a CPU designer for nearly a decade (since releasing the iPhone 5 in 2012), but until the end of 2020, their CPUs were only used in mobile devices like the iPhone and iPad. After so many iterations, it became clear to technology observers that Apple’s CPUs, especially in the iPad Pro, had become performance-competitive with low-power Intel laptop CPUs. So it was not a complete surprise when Apple announced at their developer conference in 2020 that they would be moving the entire Mac product line to their own CPUs over the next two years. As promised, Apple released the first Apple Silicon Macs in November 2020. They consisted of a MacBook Air, a 13” MacBook Pro, and a Mac Mini that looked identical to the previous models but contained an Apple M1 CPU instead of an Intel CPU. In April 2021, they added the M1 to the iMac and iPad Pro.

So what’s better about these Apple CPUs? Based on many professional reviews (as well as our testing here at Anaconda), the primary benefits of the M1 are excellent single-thread performance and improved battery life for laptops. It is important to note that benchmarking is a tricky subject, and there are different perspectives on exactly how fast the M1 is. On the one hand, what could be more objective than measuring performance quantitatively? On the other hand, benchmarking is also subjective, as the reviewer has to decide what workloads to test and what baseline hardware and software configuration to compare against. Nevertheless, we’re pretty confident the M1 is usually faster for a “typical user” (not necessarily a “data science user”) workload than the previous Intel Macs while simultaneously using less power.
From a CPU architecture perspective, the M1 has four significant differences from the previous Intel CPUs:
Instruction sets, ARM, and x86 compatibility

As software users and creators, our first question about Apple Silicon is: do I need to change anything for my software to continue working? The answer is: yes, but less than you might think.

To elaborate on that, we need to dive into instruction set architectures (ISAs). An ISA is a specification for how a family of chips works at a low level, including the instruction set that machine code must use. Applications and libraries are compiled for a specific ISA (and operating system) and will not directly run on a CPU with a different ISA unless recompiled. This is why Anaconda has different installers and package files for every platform we support.

Intel CPUs support the x86-64 ISA (sometimes also called AMD64 because AMD originally proposed this 64-bit extension to x86 back in 1999). The new Apple Silicon CPUs use an ISA designed by ARM called AArch64, just like the iPhone and iPad CPUs they descend from. For this reason, these new Macs are often called “ARM Macs” in contrast to “Intel Macs,” although ARM defined the ISA used by the Apple M1 but did not design the CPU itself. In general, the naming conventions for 64-bit ARM architectures are confusing, as different people will use subtly different terms for the same thing. You may see some people call these “ARM64” CPUs (which is what Apple does in their developer tools), or slightly incorrectly “ARMv8” CPUs (ARMv8 is a specification that describes both AArch64 and AArch32). Even in conda, you’ll see the platform names osx-arm64, which you would use for macOS running on the M1, and linux-aarch64, which you would use for Linux running on a 64-bit ARM CPU. We’ll use “ARM64” for the rest of this post because it is shorter than “Apple Silicon” and less clunky than “AArch64.”

The observant reader will note that ISA compatibility only matters for compiled code, which must be translated to machine code to run.
Python is an interpreted language, so software written in pure Python doesn’t need to change between Intel and ARM Macs. However, the Python interpreter itself is a compiled program, and many Python data science libraries (like NumPy, pandas, TensorFlow, and PyTorch) contain compiled code as well. Those packages all need to be recompiled on macOS for ARM64 CPUs to run natively on the new M1-based Macs.

However, Apple has a solution for that, too. Recycling the “Rosetta” name from their PowerPC emulator, Apple includes a system component in macOS 11 called Rosetta2. Rosetta2 allows x86-64 applications and libraries to run on ARM64 Macs unchanged. One interesting fact about Rosetta2 is that it is an x86-64 to ARM64 translator, not an emulator. When you run an x86-64 program (or load a compiled library) for the first time, Rosetta2 analyzes the machine code and creates equivalent ARM64 machine code. This causes a slight upfront delay when you first start an x86-64 application, but the translated machine code is cached to disk, so subsequent runs start more quickly. This is in contrast to an emulator (like QEMU), which simulates the behavior and state of a CPU in software, instruction by instruction. Emulation is generally much slower than running a translated binary, but it can be difficult to translate from one ISA to another and have code still run correctly and efficiently, especially when dealing with multithreading and memory consistency. Apple has not disclosed exactly how they generate ARM64 machine code from x86-64 binaries with good performance, but there are theories that they designed the M1 with additional hardware capabilities that imitate some behaviors of x86-64 chips when running translated code. The net result is that most people see a 20-30% performance penalty when running x86-64 programs with Rosetta2 compared to native ARM64.
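Because processes translated by Rosetta2 report themselves as x86-64, it can be useful to check at runtime whether your interpreter is running natively or under translation. Here is a minimal sketch; it relies on the macOS-specific sysctl key sysctl.proc_translated, and on other operating systems it simply reports "not translated":

```python
import platform
import subprocess

def interpreter_arch():
    """Return (machine, translated): `translated` is True when the process
    appears to be running under Rosetta2 on macOS."""
    machine = platform.machine()  # 'arm64' if native ARM64, 'x86_64' on Intel or under Rosetta2
    translated = False
    if platform.system() == "Darwin":
        # macOS exposes sysctl.proc_translated: "1" under Rosetta2, "0" when native
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True,
        )
        translated = result.stdout.strip() == "1"
    return machine, translated

machine, translated = interpreter_arch()
print(machine, "translated" if translated else "native")
```

A native ARM64 Python on an M1 would report arm64, while an x86-64 Python (such as a pre-ARM64 Anaconda installer) running on the same machine would report x86_64 with the translated flag set.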
That is a pretty reasonable tradeoff for compatibility until the selection of ARM64-compiled software catches up.

However, there is a catch, especially for users of numerical computing packages in Python. The x86-64 ISA is not a frozen specification, but one that Intel has evolved substantially over time, adding new specialized instructions for different workloads. In particular, for the last 10 years, Intel and AMD have rolled out support for “vector instructions,” which operate on multiple pieces of data simultaneously. Specifically, these additions to the x86-64 instruction set are AVX, AVX2, and AVX-512 (which itself has different variants). As you might imagine, these instructions can be handy when working with array data, and several libraries use them when compiling binaries for x86-64. The problem is that Rosetta2 does not support any of the AVX family of instructions and will produce an illegal instruction error if your binary tries to use them. Because different Intel CPUs have different AVX capabilities, many programs can already dynamically select between AVX, AVX2, AVX-512, and non-AVX code paths, detecting the CPU capabilities at runtime. These programs will work just fine under Rosetta2 because the CPU capabilities reported to processes running under Rosetta2 do not include AVX. However, if an application or library cannot select non-AVX code paths at runtime, it will not work under Rosetta2 when the packager assumed the CPU would have AVX support at compile time. TensorFlow wheels, for example, do not work under Rosetta2 on the M1. Additionally, even when a program works under Rosetta2, the lack of AVX makes the M1 slower at certain kinds of array processing compared to programs that use AVX on Intel CPUs. (For various reasons, only some Python libraries use AVX, so you may or may not notice a big difference depending on your use case.)
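One way to see what capabilities your environment reports is to inspect the CPU feature flags the OS exposes. The sketch below is best-effort and platform-dependent (it assumes flags live in /proc/cpuinfo on Linux and under the machdep.cpu sysctl keys on macOS); run under Rosetta2 on an M1, it should report no AVX flags at all:

```python
import platform
import subprocess
from pathlib import Path

def avx_flags():
    """Best-effort sorted list of AVX-family feature flags reported by the OS."""
    text = ""
    system = platform.system()
    if system == "Linux":
        # Linux lists per-CPU feature flags in /proc/cpuinfo
        cpuinfo = Path("/proc/cpuinfo")
        if cpuinfo.exists():
            text = cpuinfo.read_text()
    elif system == "Darwin":
        # macOS (Intel, or x86-64 processes under Rosetta2) lists features
        # under machdep.cpu.*; AVX2 appears in the leaf7 feature set
        result = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.features", "machdep.cpu.leaf7_features"],
            capture_output=True, text=True,
        )
        text = result.stdout
    tokens = text.lower().replace(",", " ").split()
    return sorted({t for t in tokens if t.startswith("avx")})

print(avx_flags())
```

On a recent Intel machine this might print something like ['avx', 'avx2'], while a process running under Rosetta2 should see an empty list, which is exactly why AVX-aware libraries fall back to their non-AVX code paths there.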
Getting Python packages for the M1

There are currently three options for running Python on the M1:

- Use pyenv to create environments and pip to install native macOS ARM64 wheels or build packages from source.
- Use an x86-64 Python distribution, like Anaconda or conda-forge, with Rosetta2.
- Use the experimental conda-forge macOS ARM64 distribution. This installs an M1-native conda, and that conda’s default environment will by default install M1-native Python and M1-native builds of packages (where available). Native osx-arm64 builds of most common packages are now available on the conda-forge channel.
How are the M1 “efficiency” cores used by Python?

Although sometimes called an 8-core CPU, the M1 is best described as a 4+4-core CPU. There are 4 “high-performance cores” (sometimes called “P cores” in Apple documentation) and 4 “high-efficiency cores” (sometimes called “E cores”). The P cores provide the bulk of the processing throughput of the CPU and consume most of the power. The E cores are a different design than the P cores, trading maximum performance for lower power consumption. While it is unusual for desktop and laptop chips to have two different core designs, this is common in mobile CPUs. Background processes and low-priority compute tasks can execute on the E cores, conserving power and extending battery life. Apple provides APIs for setting the “quality of service” of threads and tasks, which influences whether the operating system assigns them to P or E cores.

However, Python doesn’t use any of these special APIs, so what happens when you use multiple threads or processes in your application? Python reports the CPU core count as 8, so if you launch 8 worker processes, half of them will be running on the slower E cores. If you run fewer than 8, the OS seems to prefer running them on the P cores. To help quantify this, we created a simple microbenchmark with a function that computes cosine a lot:
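The original code listing did not survive here; a minimal stand-in for that kind of function (the name and loop constants are illustrative, not the exact benchmark) might look like:

```python
import math

def work(n=1_000_000):
    """One 'work unit': compute cosine many times in a tight, CPU-bound loop."""
    total = 0.0
    for i in range(n):
        total += math.cos(0.001 * i)
    return total
```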
And then ran many copies of it in a multiprocessing.Pool with different numbers of processes, computing the throughput. The result is shown in this plot:
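A self-contained sketch of such a scaling measurement, with illustrative pool sizes, task counts, and work-unit sizes (not the exact values we used):

```python
import math
import time
from multiprocessing import Pool

def work(n):
    """One 'work unit': compute cosine many times in a tight, CPU-bound loop."""
    total = 0.0
    for i in range(n):
        total += math.cos(0.001 * i)
    return total

def throughput(pool_size, tasks=32, n=100_000):
    """Work units completed per second with `pool_size` worker processes."""
    start = time.perf_counter()
    with Pool(pool_size) as pool:
        pool.map(work, [n] * tasks)
    return tasks / (time.perf_counter() - start)

if __name__ == "__main__":
    for size in (1, 2, 4, 8):
        print(size, round(throughput(size), 1))
```

Plotting throughput against pool size produced by a loop like this is what generated the figure discussed below.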
Figure: Throughput (work units/second) vs. process pool size.

The behavior is consistent with the OS scheduler assigning worker processes to the performance cores when there were four or fewer processes. As the process pool grew past four, the throughput gained per additional process shrank until we hit eight processes, indicating that the slower E cores were being used for the extra processes. Past eight processes, there are no additional CPU resources to take advantage of, and scheduling effects and memory contention result in varied but non-improving throughput. It is also interesting to note that at the M1’s peak total throughput (in this test), 75% of the throughput is provided by the P cores and 25% by the E cores. That’s a non-trivial contribution from the E cores, so it is a good idea to use them if you can and if your work distribution system can handle tasks that take very different amounts of time on different workers. (That is to say, work must be scheduled dynamically, or work-stealing must be supported.) Running a similar scaling test on an Intel Mac shows that the performance gain from the E cores on the M1 is roughly comparable to the gain from hyperthreading on the Intel CPU, although the two features are entirely different.
What about Linux and Windows on the M1?

Although Apple has said they will not prevent users from running other operating systems on M1 hardware, there is currently no Boot Camp equivalent for dual-booting an M1 Mac into Linux or Windows, and efforts to port Linux to run natively on M1 hardware are still highly experimental. However, there are (as of this writing) two ways to run Linux on the M1:
What are the pros and cons for data scientists?

The biggest benefits of the M1 for a data scientist are basically the same as for typical users:
What’s on the horizon?

Many of the current drawbacks to the M1 are simply due to it being the first CPU in the new lineup of ARM64-based Macs. To convert the higher-performance Macs (especially the Mac Pro) over to this new architecture, Apple will need to release systems with more CPU cores and more memory. At the same time, the Python developer ecosystem will catch up, and more native ARM64 wheels and conda packages will become available.

But the most exciting development will be when machine learning libraries can start to take advantage of the new GPU and Apple Neural Engine cores on Apple Silicon. Apple offers APIs like Metal and ML Compute which could accelerate machine learning tasks, but they are not widely used in the Python ecosystem. Apple has an alpha port of TensorFlow that uses ML Compute, and perhaps other projects will be able to take advantage of Apple hardware acceleration in the coming years.

Going beyond Apple, the M1 demonstrates what a desktop-class ARM processor can do, so hopefully we will see competition from other ARM CPU makers in this market. For example, an M1-class ARM CPU from another manufacturer running Linux with an NVIDIA GPU could make an impressive mobile data science workstation.
The M1 Macs are an exciting opportunity to see what laptop/desktop-class ARM64 CPUs can achieve. For general usage, the performance is excellent, but these systems are not aimed at the data science and scientific computing user yet. If you want an M1 for other reasons, and intend to do some light data science, they are perfectly adequate. For more intense usage, you’ll want to stick with Intel Macs for now, but keep an eye on both software development as compatibility improves and future ARM64 Mac hardware, which likely will remove some of the constraints we see today.
Whether you’re a big, small, or medium enterprise, Anaconda will support your organization. A free and open-source distribution of the Python and R programming languages, it aims to scale easily from a single user on one laptop to thousands of machines. If you’re looking for a hassle-free data science platform, this is the one for you.
Anaconda is leading the way for innovative data science platforms for enterprises of all sizes.
Anaconda provides more than 1,500 packages in its distribution. In it you will find Anaconda Navigator (a graphical alternative to the command-line interface) and conda, a package and virtual environment manager. What makes conda different from a package manager like pip is how package dependencies are managed. pip installs Python package dependencies even if they conflict with other packages you’ve already installed. So, for example, a program can suddenly stop working when you install a different package that pulls in a different version of the NumPy library. Everything may appear to work, but your data can produce different results because the packages weren’t installed in a compatible order. This is where conda comes in: before making changes, it analyzes your current environment and installations, including version constraints, dependencies, and incompatibilities. Packages can be installed individually from the Anaconda repository or Anaconda Cloud using the conda install command.
You can even create and share custom packages using the conda build command. Packages in the Anaconda repository are compiled and built by its maintainers, providing binaries for Windows, Linux, and macOS. Basically, you won’t need to worry about installation details, because conda tracks everything that’s been installed on your computer.
Extend your reach with Anaconda Navigator
The built-in graphical user interface (GUI) allows you to launch applications while managing conda packages, environments, and channels. This means the GUI can complete the process of installing packages without requiring any command-line input. It even includes these applications by default: JupyterLab, Jupyter Notebook, QtConsole, Spyder, Glueviz, Orange, RStudio, and Visual Studio Code.
Where can you run this program?
Anaconda 2019.07 has these system requirements:
- Operating system: Windows 7 or newer, 64-bit macOS 10.10+, or Linux, including Ubuntu, RedHat, CentOS 6+.
- System architecture: Windows (64-bit x86, 32-bit x86); macOS (64-bit x86); Linux (64-bit x86, 64-bit Power8/Power9).
- 5 GB disk space or more.
Anaconda’s developers recommend installing Anaconda for the local user, which does not require administrator permissions. Alternatively, you can install Anaconda system-wide, which does require administrator permissions.
Is there a better alternative?
If you’re looking for a simple Python-dedicated environment, then you may prefer PyCharm. Targeted specifically at Python programmers, this integrated development environment is filled with programming tools that can impress both new and experienced developers. It provides all the tools in a centralized system so you can increase your efficiency and effectiveness. Features like code analysis, a graphical debugger, and a unit tester help you integrate Python programs with version control systems, and it supports web development with frameworks like Django, web2py, and Flask. It offers automated tools like code refactorings, PEP 8 checks, and testing assistance to help you write your code, but what stands out the most is Smart Assistance, which fixes errors and completes portions of your code. With PyCharm, you can expect neat and maintainable code.
Anaconda’s host of innovative options makes it the best data science platform for all enterprises. By offering superior collaboration tools, scalability, and security, you never have to worry about gathering big data again.
Should you download it?
If you have experience with other package management and deployment programs, then make the big switch by downloading Anaconda.
- Extensive data science tools
- Functions can be scaled
- Flexible nodes
- Reliable cloud storage
- Complex for beginners
- Hard to maximize by small organizations
- Minimal automated features