Pandas Cheat Sheet Github
Cheat sheet for the RDKit package in Python: (1) draw molecules in a Jupyter environment; (2) use RDKit with a Pandas DataFrame; (3) descriptors/fingerprints; and (4) similarity search.

Installation
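The RDKit package was originally distributed only through conda (for example, conda install -c conda-forge rdkit); recent releases can also be installed with pip install rdkit.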
Setup
Chem vs. AllChem
As mentioned in the RDKit Getting Started guide:
The majority of “basic” chemical functionality (e.g. reading/writing molecules, substructure searching, molecular cleanup, etc.) is in the rdkit.Chem module. More advanced, or less frequently used, functionality is in rdkit.Chem.AllChem.
If you find the Chem/AllChem thing annoying or confusing, you can use python’s “import … as …” syntax to remove the irritation:
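The aliasing trick quoted above can be sketched as follows. Because rdkit.Chem.AllChem re-exports the contents of rdkit.Chem, a single alias reaches both the basic and the advanced functions. The ethanol SMILES and the random seed are illustrative choices, not part of the original text.

```python
# Import AllChem under the name Chem so one alias covers both modules.
from rdkit.Chem import AllChem as Chem

mol = Chem.MolFromSmiles("CCO")        # basic functionality (defined in rdkit.Chem)
mol_h = Chem.AddHs(mol)                # add explicit hydrogens before 3D embedding
status = Chem.EmbedMolecule(mol_h, randomSeed=42)  # advanced functionality (rdkit.Chem.AllChem)

print(mol.GetNumAtoms(), mol_h.GetNumAtoms(), status)
```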
Basic
Get an RDKit molecule from SMILES. An RDKit molecule enables several features for handling molecules: drawing, computing fingerprints/properties, molecular curation, etc.
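A minimal sketch of this step, assuming aspirin as an example input (the SMILES is not from the original text). Chem.MolFromSmiles returns None for a string it cannot parse, so the result is checked before use.

```python
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used here only as an example input
mol = Chem.MolFromSmiles(smiles)

# MolFromSmiles returns None when the SMILES cannot be parsed.
if mol is None:
    raise ValueError(f"Could not parse SMILES: {smiles}")

print(type(mol).__name__, mol.GetNumAtoms())
```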
RDKit molecules can be displayed directly in a Jupyter environment.
Convert an RDKit molecule to SMILES.
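As a short illustration, Chem.MolToSmiles writes a canonical SMILES, so different spellings of the same molecule give the same string. The benzene input is an example chosen for this sketch.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C1=CC=CC=C1")  # benzene written in Kekulé form
canonical = Chem.MolToSmiles(mol)

print(canonical)  # c1ccccc1
```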
Convert an RDKit molecule to an InChIKey.
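A minimal sketch, assuming an RDKit build that includes InChI support (the standard conda and pip packages do). The ethanol input is illustrative.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")
inchi_key = Chem.MolToInchiKey(mol)  # fixed-length hashed identifier

print(inchi_key)
```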
Convert an RDKit molecule to a MOL block, a coordinate-based representation that can be stored in an .sdf file.
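One way to do this is sketched below. Generating 2D coordinates first is a choice made for this example so that the block contains meaningful atom positions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CCO")
AllChem.Compute2DCoords(mol)         # attach a 2D conformer to the molecule
mol_block = Chem.MolToMolBlock(mol)  # text block with atom coordinates and bonds

print(mol_block)
```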
Reading sets of molecules
Major types of molecular file formats:
- .csv file that includes a column of SMILES. See the PandasTools section.
- .smi/.txt file that includes SMILES. Collect the SMILES as a list, for example by reading a .smi file that contains one SMILES per line.
- .sdf file that includes atom coordinates. The molecules can be read directly from the .sdf file.
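The .smi and .sdf cases above can be sketched as follows. The file names and the three example molecules are hypothetical; the files are written to a temporary directory only so that the example is self-contained.

```python
import os
import tempfile

from rdkit import Chem

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical input file: one SMILES per line.
    smi_path = os.path.join(tmp_dir, "molecules.smi")
    with open(smi_path, "w") as handle:
        handle.write("CCO\nc1ccccc1\nCC(=O)O\n")

    # Collect the SMILES as a list, skipping blank lines.
    with open(smi_path) as handle:
        smiles_list = [line.strip() for line in handle if line.strip()]
    mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

    # Write an .sdf file so there is something to read back.
    sdf_path = os.path.join(tmp_dir, "molecules.sdf")
    writer = Chem.SDWriter(sdf_path)
    for mol in mols:
        writer.write(mol)
    writer.close()

    # SDMolSupplier yields None for records it cannot parse, so filter those out.
    supplier = Chem.SDMolSupplier(sdf_path)
    sdf_mols = [mol for mol in supplier if mol is not None]

print(len(smiles_list), len(sdf_mols))
```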
Draw molecules in Jupyter environment
Print molecules in a grid.
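A minimal sketch using Draw.MolsToGridImage. The four molecules, grid layout, and image size are example values; in a notebook, leaving the image as the last expression of a cell displays it.

```python
from rdkit import Chem
from rdkit.Chem import Draw

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]  # example molecules
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

# One sub-image per molecule, two per row, with the SMILES as the legend.
img = Draw.MolsToGridImage(
    mols, molsPerRow=2, subImgSize=(200, 200), legends=smiles_list
)
img  # displayed when this is the last expression in a notebook cell
```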
PandasTools
PandasTools enables using RDKit molecules as columns of a Pandas DataFrame.
| | smiles | logSolubility |
|---|---|---|
| 0 | N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c.. | -0.77 |
Add ROMol to Pandas DataFrame.
| | smiles | logSolubility | ROMol |
|---|---|---|---|
| 0 | N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c.. | -0.77 | (structure image) |
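A sketch of this step with PandasTools.AddMoleculeColumnToFrame. The one-row DataFrame is a stand-in built from the row shown in this section's tables; the original data source is not reproduced here.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools

# Stand-in data: the single row shown in the tables of this section.
df = pd.DataFrame({
    "smiles": ["N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1"],
    "logSolubility": [-0.77],
})

# Parse each SMILES and store the resulting molecule in a new "ROMol" column.
PandasTools.AddMoleculeColumnToFrame(df, smilesCol="smiles", molCol="ROMol")

print(type(df.loc[0, "ROMol"]))
```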
The ROMol column stores rdchem.Mol objects.
Draw the structures in a grid.
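PandasTools.FrameToGridImage draws the molecules of a DataFrame column directly. The sketch below assumes a small example frame; the legend column and layout values are illustrative.

```python
import pandas as pd
from rdkit.Chem import PandasTools

df = pd.DataFrame({"smiles": ["CCO", "c1ccccc1", "CC(=O)O"]})  # example molecules
PandasTools.AddMoleculeColumnToFrame(df, smilesCol="smiles", molCol="ROMol")

# Extra keyword arguments such as molsPerRow are passed on to the grid drawing.
img = PandasTools.FrameToGridImage(
    df, column="ROMol", legendsCol="smiles", molsPerRow=3
)
img  # displayed when this is the last expression in a notebook cell
```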
Add new columns of properties using the Pandas map method.
| | smiles | logSolubility | ROMol | n_Atoms |
|---|---|---|---|---|
| 0 | N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c.. | -0.77 | (structure image) | 32 |
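A minimal sketch of the map approach, using the heavy-atom count as the property. The ROMol column is built here with map as well, only to keep the example short.

```python
import pandas as pd
from rdkit import Chem

df = pd.DataFrame({
    "smiles": ["N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1"],
    "logSolubility": [-0.77],
})
df["ROMol"] = df["smiles"].map(Chem.MolFromSmiles)

# Apply a function to every molecule and store the result as a new column.
df["n_Atoms"] = df["ROMol"].map(lambda mol: mol.GetNumAtoms())

print(df[["logSolubility", "n_Atoms"]])
```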
Before saving the DataFrame as a .csv file, it is recommended to drop the ROMol column.
| | smiles | logSolubility | n_Atoms |
|---|---|---|---|
| 0 | N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c.. | -0.77 | 32 |
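The sketch below drops the column before writing, since molecule objects would otherwise be written as their text representation rather than as usable structures. An in-memory buffer stands in for a real output path, which is left to the reader.

```python
import io

import pandas as pd
from rdkit import Chem

df = pd.DataFrame({
    "smiles": ["N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1"],
    "logSolubility": [-0.77],
})
df["ROMol"] = df["smiles"].map(Chem.MolFromSmiles)
df["n_Atoms"] = df["ROMol"].map(lambda mol: mol.GetNumAtoms())

# Keep the SMILES (enough to rebuild the molecules later) and drop the objects.
df_out = df.drop(columns=["ROMol"])

buffer = io.StringIO()  # stand-in for a file path
df_out.to_csv(buffer, index=False)
print(buffer.getvalue())
```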
Descriptors/Fingerprints
The RDKit has a variety of built-in functionality for generating molecular fingerprints/descriptors. A detailed description can be found in the RDKit documentation.
| | smiles | logSolubility | ROMol |
|---|---|---|---|
| 0 | N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c.. | -0.77 | (structure image) |
Morgan Fingerprint (ECFPx)
AllChem.GetMorganFingerprintAsBitVect Parameters:
- radius: no default value; usually set to 2 for similarity search and 3 for machine learning.
- nBits: number of bits; the default is 2048, and 1024 is also widely used.
- other parameters are usually left at their defaults.
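The parameters above can be tried out as in the sketch below, which computes a 1024-bit fingerprint with radius 3 (ECFP6) for the molecule used throughout this section. Note that recent RDKit releases print a deprecation notice for this function and also provide the rdFingerprintGenerator module as an alternative.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1")

# radius has no default and is passed explicitly; 3 corresponds to ECFP6.
# nBits sets the length of the bit vector.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=1024)

print(fp.GetNumBits(), fp.GetNumOnBits())
```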
More examples can be found in this notebook from my previous work.
The ECFP6 fingerprint for each molecule has 1024 bits.
Save as a .csv file for further use (e.g., machine learning). I usually save (1) SMILES as index and (2) each bit as a column to the csv file.
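A sketch of that layout, with SMILES as the index and one column per bit. The Bit_i column names follow the table in this section; the in-memory buffer is a stand-in for a real output path.

```python
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

smiles_list = ["N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1"]
n_bits = 1024

rows = []
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=n_bits)
    # ToBitString gives a string of "0"/"1" characters, one per bit.
    rows.append([int(bit) for bit in fp.ToBitString()])

fp_df = pd.DataFrame(
    rows,
    index=pd.Index(smiles_list, name="smiles"),
    columns=[f"Bit_{i}" for i in range(n_bits)],
)

buffer = io.StringIO()  # stand-in for a file path
fp_df.to_csv(buffer)
print(fp_df.shape)
```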
| smiles | Bit_0 | Bit_1 | Bit_2 | Bit_3 | Bit_4 | Bit_5 | Bit_6 | Bit_7 | Bit_8 | Bit_9 | .. | Bit_1014 | Bit_1015 | Bit_1016 | Bit_1017 | Bit_1018 | Bit_1019 | Bit_1020 | Bit_1021 | Bit_1022 | Bit_1023 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N#CC(OC1OC(COC2OC(CO)C(O)C(O)C2O)C(O)C(O)C1O)c1ccccc1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | .. | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1 rows × 1024 columns
Similarity Search
Compute the similarity between a reference molecule and a list of molecules. Here is an example of using the ECFP4 fingerprint to compute the Tanimoto similarity (the default metric of DataStructs.FingerprintSimilarity).
- compute fingerprints
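A minimal sketch of both steps, fingerprints first and then similarity. The reference molecule, the candidate list, and the helper name ecfp4 are assumptions made for this example.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

ref_smiles = "CCO"                         # example reference molecule
smiles_list = ["CCO", "CCCO", "c1ccccc1"]  # example candidates


def ecfp4(smiles):
    """Return a 2048-bit Morgan fingerprint with radius 2 (ECFP4)."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)


ref_fp = ecfp4(ref_smiles)
fps = [ecfp4(smi) for smi in smiles_list]

# FingerprintSimilarity uses the Tanimoto metric unless another one is passed.
similarities = [DataStructs.FingerprintSimilarity(ref_fp, fp) for fp in fps]
print(similarities)
```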
We can also add the similarity_efcp4 to the dataframe and visualize the structure and similarity.
Sort the result from highest to lowest.
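The DataFrame and sorting steps might look like the sketch below. The column name similarity_efcp4 is spelled as in the text above; the molecules are example inputs, and the grid legend format is an illustrative choice.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Draw

ref_mol = Chem.MolFromSmiles("CCO")  # example reference molecule
df = pd.DataFrame({"smiles": ["c1ccccc1", "CCCO", "CCO"]})
df["ROMol"] = df["smiles"].map(Chem.MolFromSmiles)

ref_fp = AllChem.GetMorganFingerprintAsBitVect(ref_mol, 2, nBits=2048)
df["similarity_efcp4"] = df["ROMol"].map(
    lambda mol: DataStructs.FingerprintSimilarity(
        ref_fp, AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    )
)

# Highest similarity first.
sorted_df = df.sort_values("similarity_efcp4", ascending=False).reset_index(drop=True)

# Show each structure with its similarity as the legend.
img = Draw.MolsToGridImage(
    list(sorted_df["ROMol"]),
    legends=[f"{value:.2f}" for value in sorted_df["similarity_efcp4"]],
)
print(sorted_df[["smiles", "similarity_efcp4"]])
```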
More Reading
- Official documentation.
- RDKit Cookbook
This document provides example recipes of how to carry out particular tasks using the RDKit functionality from Python. The contents have been contributed by the RDKit community, tested with the latest RDKit release, and then compiled into this document.
Cheat sheet for pandas.set_option in IPython
May 12, 2015. Zhuyi Xue
Sometimes it can be annoying that the default display.width (aka line_width, but deprecated) is too narrow to display the entire dataframe clearly when using pandas in IPython, so do this in IPython:
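A minimal sketch of widening the display; 200 characters is an arbitrary example value, not necessarily the one used by the author.

```python
import pandas as pd

# Allow output lines up to 200 characters wide before wrapping.
pd.set_option("display.width", 200)

print(pd.get_option("display.width"))
```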
If each column is narrow, but you have quite a number of columns and would like to show them on one line, then do this:
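One option that fits this description is display.max_columns, assumed here to be the intended setting; the value 50 is an example. Whether all columns fit on one line also depends on display.width.

```python
import pandas as pd

# Show up to 50 columns before pandas elides the middle ones with "...".
pd.set_option("display.max_columns", 50)

print(pd.get_option("display.max_columns"))
```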
Similarly, you can set the number of rows to display (esp. in Jupyter Notebook),
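The row limit is controlled by display.max_rows; the value 100 below is an example.

```python
import pandas as pd

# Show up to 100 rows before pandas truncates the output.
pd.set_option("display.max_rows", 100)

print(pd.get_option("display.max_rows"))
```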
If you don’t have many columns, and want each column to have a bigger width,
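A sketch assuming display.max_colwidth is the intended option, since it sets how many characters of a cell are shown before truncation; the value 100 is an example.

```python
import pandas as pd

# Show up to 100 characters per cell before truncating with "...".
pd.set_option("display.max_colwidth", 100)

print(pd.get_option("display.max_colwidth"))
```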
If you don’t want long strings omitted in a notebook cell in Jupyter,
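In pandas 1.0 and later, setting display.max_colwidth to None removes the per-cell limit (older versions used -1 for this). The example frame is hypothetical.

```python
import pandas as pd

# None means no limit on the number of characters shown per cell.
pd.set_option("display.max_colwidth", None)

df = pd.DataFrame({"text": ["a long string " * 20]})  # example data
print(df)
```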
