Vim and Python: The Unbreakable Match

If you write your Python code in Vim all the time, you should check out the python-mode plugin, which comes with features that make Vim feel like a Python IDE:

  • Syntax highlighting
  • Virtualenv support
  • Run python code (`r`)
  • Add/remove breakpoints (`b`)
  • Improved Python indentation
  • Python folding
  • Python motions and operators (`]]`, `3[[`, `]]M`, `vaC`, `viM`, `daC`, `ciM`, ...)
  • Code checking (pylint, pyflakes, pylama, ...) that can be run simultaneously (`:PymodeLint`)
  • Autofix PEP8 errors (`:PymodeLintAuto`)
  • Search in python documentation (`K`)
  • Code refactoring (rope)
  • Strong code completion (rope)
  • Go to definition (`g` runs `:RopeGotoDefinition`)

It is recommended that you install python-mode with Pathogen, a plugin manager for Vim similar to Vundle. To install Pathogen, run the following command from your terminal:


mkdir -p ~/.vim/autoload ~/.vim/bundle && \
curl -LSso ~/.vim/autoload/pathogen.vim https://tpo.pe/pathogen.vim

After installation, enable Pathogen by adding the following to your ~/.vimrc:


" Pathogen load
filetype off

call pathogen#infect()
call pathogen#helptags()

filetype plugin indent on
syntax on

To add the python-mode plugin, all you need to do is clone its GitHub repository into the "bundle" folder of ~/.vim:


cd ~/.vim/bundle
git clone https://github.com/python-mode/python-mode.git

To get started with python-mode, open Vim and read its short documentation:


:help pymode
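
Once installed, python-mode can be tuned from your ~/.vimrc. The options below are a minimal sketch assuming a recent python-mode; all of them are optional and have sensible defaults:


" Run the code checkers every time the buffer is saved
let g:pymode_lint = 1
let g:pymode_lint_on_write = 1

" Enable rope for completion, refactoring and go-to-definition
let g:pymode_rope = 1

" Line length used by the checkers when flagging long lines
let g:pymode_options_max_line_length = 79
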

Summary of Tokenizers and Filters in Apache Solr

Tokenizers

  • Standard Tokenizer (solr.StandardTokenizerFactory): splits the text field into tokens, treating whitespace and punctuation as delimiters.
  • Classic Tokenizer (solr.ClassicTokenizerFactory): very similar to the Standard Tokenizer, with subtle differences in how punctuation is handled.
  • Keyword Tokenizer (solr.KeywordTokenizerFactory): treats the entire text field as a single token.
  • Letter Tokenizer (solr.LetterTokenizerFactory): creates tokens from strings of contiguous letters, discarding all non-letter characters.
  • Lowercase Tokenizer (solr.LowerCaseTokenizerFactory): tokenizes the input stream by delimiting at non-letters, then converts all letters to lowercase.
  • N-Gram Tokenizer (solr.NGramTokenizerFactory): reads the field text and generates n-gram tokens with sizes in the range given by the minGramSize and maxGramSize arguments. For example, minGramSize=1 and maxGramSize=3 produce unigrams, bigrams, and trigrams.
  • Edge N-Gram Tokenizer (solr.EdgeNGramTokenizerFactory): like the N-Gram Tokenizer, but only emits n-grams anchored at one edge of the field, selected by the side argument ("front" or "back"). For example, minGramSize=1 and maxGramSize=3 produce only the unigram, bigram, and trigram starting at that edge, a much smaller set than the full set of n-grams.
  • ICU Tokenizer (solr.ICUTokenizerFactory): processes multilingual text and tokenizes it appropriately based on its script attribute.
  • Path Hierarchy Tokenizer (solr.PathHierarchyTokenizerFactory): creates synonyms from file path hierarchies. Its arguments are delimiter and replace. For instance, the input "c:\usr\local\apache" with delimiter "\" and replace "/" yields the tokens "c:", "c:/usr", "c:/usr/local", and "c:/usr/local/apache".
  • Regex Pattern Tokenizer (solr.PatternTokenizerFactory): uses a Java regular expression, supplied via the pattern argument, to break the input text stream into tokens.
  • UAX29 URL Email Tokenizer (solr.UAX29URLEmailTokenizerFactory): like the Standard Tokenizer, but keeps email addresses and URLs as single tokens.
  • Whitespace Tokenizer (solr.WhitespaceTokenizerFactory): splits the text stream on whitespace only.
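
A tokenizer is wired into a field type in the schema. As a sketch, here is a hypothetical field type matching the Edge N-Gram example above; the name "text_edge" and the gram sizes are illustrative, not from the text:


<fieldType name="text_edge" class="solr.TextField">
  <analyzer>
    <!-- Emit front-anchored n-grams of length 1 to 3 -->
    <tokenizer class="solr.EdgeNGramTokenizerFactory"
               minGramSize="1" maxGramSize="3" side="front"/>
  </analyzer>
</fieldType>
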

Filters

  • ASCII Folding Filter (solr.ASCIIFoldingFilterFactory): converts alphabetic, numeric, and symbolic Unicode characters that are not in the Basic Latin Unicode block (the first 127 ASCII characters) to their ASCII equivalents, if one exists.
  • Beider-Morse Filter (solr.BeiderMorseFilterFactory): filters tokens using Beider-Morse phonetic matching.
  • Classic Filter (solr.ClassicFilterFactory): takes the output of the Classic Tokenizer and strips periods from acronyms and "'s" from possessives.
  • Common Grams Filter (solr.CommonGramsFilterFactory): creates word shingles by combining common tokens, such as stop words, with regular tokens. The common words are listed in a file supplied via the words argument.
  • Edge N-Gram Filter (solr.EdgeNGramFilterFactory): generates edge n-gram tokens of sizes within the given range for each input token.
  • English Minimal Stem Filter (solr.EnglishMinimalStemFilterFactory): stems plural English words to their singular form.
  • Hyphenated Words Filter (solr.HyphenatedWordsFilterFactory): reconstructs hyphenated words that have been tokenized as two tokens.
  • Keep Word Filter (solr.KeepWordFilterFactory): discards all tokens except those listed in the given word list.
  • Lower Case Filter (solr.LowerCaseFilterFactory): converts tokens to lowercase.
  • K Stem Filter (solr.KStemFilterFactory): KStem is an alternative to the Porter Stem Filter for developers looking for a less aggressive stemmer.
  • Length Filter (solr.LengthFilterFactory): passes only tokens whose length falls within the limits given by the min and max arguments.
  • N-Gram Filter (solr.NGramFilterFactory): generates n-gram tokens with sizes in the range given by the minGramSize and maxGramSize arguments.
  • Pattern Replace Filter (solr.PatternReplaceFilterFactory): applies a regular expression, given by the pattern argument, to each token, substituting matches with the replacement argument.
  • Porter Stem Filter (solr.PorterStemFilterFactory): applies the Porter stemming algorithm for English.
  • Stop Filter (solr.StopFilterFactory): discards, or stops analysis of, tokens that appear in the stop-words file supplied via the words argument.
  • Synonym Filter (solr.SynonymFilterFactory): maps tokens to synonyms defined in the file supplied via the synonyms argument.
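
Filters are chained after a tokenizer inside an analyzer definition and run in the order listed. A minimal sketch of an English text field, assuming a stopwords.txt exists in the core's conf directory (the field type name "text_en" is illustrative):


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace and punctuation -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Drop stop words, then lowercase, then stem -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
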