Saturday, September 19, 2015

Setting up ssh keys

UPDATE (15/09/2024): Just found an even cleaner shell command in the Ubuntu documentation:

By default, the public key is saved in the file ~/.ssh/id_rsa.pub, while ~/.ssh/id_rsa is the private key. Now copy the id_rsa.pub file to the remote host and append it to ~/.ssh/authorized_keys by running:

ssh-copy-id username@remotehost

I keep forgetting this quick one-liner, so I thought I'd add it to my list of useful tricks. I usually just Google it, but How-To Geek is an awesome resource for this.

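ssh-keygen -t rsa
# append the public key on the remote host, creating ~/.ssh there first if it is missing
cat ~/.ssh/id_rsa.pub | ssh user@hostname 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'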


Thursday, April 9, 2015

Python Debugging Notes

So I had a great Stack Exchange comment open in a Chromium tab forever, and I just realized that I lost it somehow. I've searched my history and everything, but it's gone, buried in the many clicks of reddit, barstoolsports, etc. So this is going to be my ongoing blog post for debugging notes as I come across them. So here goes:

Scipy Lecture on Debugging

Boltons

Starting Debugging on Error Discussion

Python Cookbook Discussion

The comment I lost was someone explaining their process of navigating the traceback in pdb using the commands:
list
up
down
Sorry, I can't find the comment, so I can't credit the author for the help.

This blog is an ongoing note for my own reference, so expect it to be updated here and there.

EDIT: And now I found the stackexchange comment, here:
http://stackoverflow.com/questions/16131500/py2app-error-in-find-needed-modules-typeerror-nonetype-object-has-no-attribu
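To remember what those three commands do, here's a small sketch that drives pdb's post-mortem debugger from a string of commands instead of the keyboard. The functions buggy, caller and scripted_post_mortem are hypothetical, made up for illustration; interactively you'd just call pdb.post_mortem() inside an except block (or pdb.pm() after an uncaught error).

```python
import io
import pdb


def buggy(x):
    return 1 / x  # raises ZeroDivisionError when x == 0


def caller():
    return buggy(0)


def scripted_post_mortem(commands):
    """Run pdb post-mortem on caller()'s traceback, feeding it commands from a string."""
    transcript = io.StringIO()
    # Passing stdin/stdout makes pdb read commands from the StringIO instead of the terminal;
    # readrc=False skips any ~/.pdbrc so the transcript is reproducible.
    debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=transcript, readrc=False)
    try:
        caller()
    except ZeroDivisionError as exc:
        # Mirrors what pdb.post_mortem() does in CPython: reset, then enter the traceback.
        debugger.reset()
        debugger.interaction(None, exc.__traceback__)
    return transcript.getvalue()


# list: source around the current line; up: move to the calling frame; down: move back.
print(scripted_post_mortem("list\nup\ndown\nquit\n"))
```

The debugger starts in the innermost frame (buggy), so `up` lands in caller and `down` returns to where the error was raised.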

Sunday, March 29, 2015

Remove all old linux kernels, headers and modules for Debian based systems

This has come up often for me with my Linux machines, so I'll just write a post on the topic. I came across two useful posts:

RemoveOldKernels

Ubuntu Cleanup: How to Remove All Unused Linux Kernel Headers, Images and Modules

The former is useful, but it only removes the kernels. I want to remove all the headers, etc. associated with them as well. So, before I run the command from the latter blog post, I want to confirm what I'm going to remove (as should you) with the following command (note: run it as a regular user, not as root, just to be even more cautious):
dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d'
Perfect, now I can just run the one-liner from that latter blog post:

dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d' | xargs sudo apt-get -y purge

Boom, I just got rid of over 3 GB of old Linux kernels on my system.
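That sed filter is dense, so here's a Python sketch of the same steps, just to make each one readable. old_kernel_packages is a hypothetical helper that works on text you'd get from `dpkg -l 'linux-*'` and `uname -r`; it only lists package names and purges nothing. One small difference: it matches the running version as a plain substring, while sed treats the dots in the version as regex wildcards.

```python
import re


def old_kernel_packages(dpkg_output: str, running_release: str) -> list[str]:
    """Return installed linux-* package names that don't belong to the running kernel."""
    # Same idea as `uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/"`:
    # strip the flavour suffix, e.g. "3.13.0-45-generic" -> "3.13.0-45".
    running_version = re.sub(r"(.*)-([^0-9]+)", r"\1", running_release, count=1)

    packages = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # `/^ii/!d`: keep only packages whose status is "ii" (installed).
        if len(fields) < 2 or fields[0] != "ii":
            continue
        # Delete any line that mentions the running kernel version.
        if running_version in line:
            continue
        name = fields[1]
        # `/[0-9]/!d`: keep versioned packages, skipping meta-packages like linux-generic.
        if re.search(r"[0-9]", name):
            packages.append(name)
    return packages
```

Piping the real `dpkg -l` text through this should give the same list as the dry-run command, which makes it easier to see why the meta-packages and the running kernel survive.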

Monday, March 16, 2015

Python String Format Cookbook

As I'm tutoring someone through their introductory Computer Science course, I keep getting caught up in Python 3 string formatting. His assignments focus a lot on how to print out various formats, and I'm still stuck on how things worked in Python 2.7. I keep googling, and googling, and I often find myself coming back to this reference.

Python String Format Cookbook

I'm just putting this up so that I can keep going back to this reference as I still adjust from Python 2 to Python 3.

EDIT: Just saw this cool writeup on Reddit: PyFormat.info
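For my own cheat sheet, a few side-by-side examples: Python 2-style %-formatting next to str.format(), both of which still work in Python 3. The values are arbitrary, and the f-string line needs Python 3.6 or newer.

```python
value = 3.14159265

# Each pair: (Python 2-style %-formatting, str.format); both are valid in Python 3.
pairs = {
    "two decimals": ("%.2f" % value, "{:.2f}".format(value)),
    "zero-padded to width 8": ("%08.3f" % value, "{:08.3f}".format(value)),
    "right-aligned in 8 columns": ("%8s" % "right", "{:>8}".format("right")),
    "left-aligned in 8 columns": ("%-8s|" % "left", "{:<8}|".format("left")),
}

# Format-spec features with no direct %-style equivalent.
thousands = "{:,}".format(1234567)  # "1,234,567"
percent = "{:.1%}".format(0.256)    # "25.6%"
centered = "{:^9}".format("mid")    # "   mid   "
fstring = f"{value:.2f}"            # f-strings use the same format-spec mini-language

for label, (old, new) in pairs.items():
    print(f"{label:28} {old!r:12} {new!r}")
```

The part after the colon (`.2f`, `>8`, `,`) is the same mini-language in str.format() and f-strings, so learning it once covers both.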

Monday, February 23, 2015

Linear Solve in Python

http://matrixprogramming.com/2011/03/linear-solve-in-python-numpy-and-scipy

This is a great tutorial on linear solver approaches in Python. In particular, I like the reference to the Cholesky Decomposition. For those who aren't familiar, the Cholesky Decomposition only works on symmetric positive definite matrices. This is pretty common in the Statistics world, since a well-defined (non-singular) covariance matrix has exactly those properties.

When I write up my code, I'll make sure to write up a cool tutorial on how to do the Cholesky Decomposition in Python for inverting a Covariance Matrix.  I promise!
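Until that tutorial exists, here's a minimal sketch (not the linked tutorial's code) assuming SciPy and a small made-up covariance matrix: factor once with scipy.linalg.cho_factor, then reuse the factor with cho_solve for solves, the inverse, and the log-determinant you'd need in a Gaussian likelihood.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# A small, hypothetical covariance matrix (symmetric positive definite).
cov = np.array([[4.0, 2.0, 0.6],
                [2.0, 3.0, 0.4],
                [0.6, 0.4, 2.0]])
y = np.array([1.0, 2.0, 3.0])

# Factor once: cov = L L^T. Raises LinAlgError if cov is not positive definite.
factor = cho_factor(cov, lower=True)

# Solve cov @ x = y without ever forming the inverse.
x = cho_solve(factor, y)

# If the inverse itself is needed, solve against the identity matrix.
cov_inv = cho_solve(factor, np.eye(cov.shape[0]))

# Quadratic form y^T cov^{-1} y and log-determinant (2 * sum of log diag(L)).
quad = y @ x
c, _ = factor
logdet = 2.0 * np.sum(np.log(np.diag(c)))
print(quad, logdet)
```

Solving with the factor is usually preferred over computing the inverse explicitly; the explicit inverse is only there for cases where you really need the matrix.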

Sparse Matrices in Python

In one of my previous jobs, my colleague wrote a very neat Python module that leveraged a sparse matrix approach, as defined on Wikipedia.

I've been meaning to write something similar, because I needed the same thing in my dissertation work.

Wait, that's right, I won't need to do something like that, because it's right here in SciPy.

http://docs.scipy.org/doc/scipy/reference/sparse.html
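A quick sketch of scipy.sparse in action, using a tridiagonal matrix picked purely for illustration: CSR format stores only the nonzero entries (plus index arrays), and scipy.sparse.linalg.spsolve solves the system without ever building the dense matrix.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 1000
# Tridiagonal matrix: 2 on the main diagonal, -1 on the off-diagonals,
# stored in CSR format so only the ~3n nonzero entries are kept.
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x = spsolve(A, b)

# Memory comparison: a dense float64 matrix vs. the three CSR arrays.
dense_bytes = n * n * 8
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(A.nnz, dense_bytes, sparse_bytes)
```

For a 1000 x 1000 matrix that's about 8 MB dense versus a few tens of KB sparse, and the gap grows with n.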

Ridge Regression and Cross Validation in scikit

My dissertation work, Ridge Restricted Maximum Likelihood (RREML), is an extension of Ridge Regression applied to parametric covariance structures. In my dissertation we applied it to Spatial Statistics, but it would apply to Time Series as well.

One of the things that I struggled with in my dissertation approach was choosing the appropriate ridge constant. I never got around to the much more favorable Cross Validation approach; instead, I used a secondary likelihood approach to estimate the ridge constant.

As I rewrite my algorithm in Python, I'm definitely going to leverage scikit-learn's Ridge Regression implementation in my RREML model.

http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression
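A minimal sketch of choosing the ridge constant by cross validation with scikit-learn's RidgeCV, on synthetic data (this is plain ridge regression, not RREML). scikit-learn calls the ridge constant `alpha` and minimizes ||y - Xw||^2 + alpha * ||w||^2; RidgeCV tries each candidate alpha and keeps the best one.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(0)
n_samples, n_features = 100, 5
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Candidate ridge constants on a log grid.
alphas = np.logspace(-3, 3, 13)

# Default cv=None: efficient leave-one-out cross validation over the alphas.
loo_model = RidgeCV(alphas=alphas).fit(X, y)

# Alternatively, ordinary 5-fold cross validation.
kfold_model = RidgeCV(alphas=alphas, cv=5).fit(X, y)

print("LOO alpha:", loo_model.alpha_, "coef:", loo_model.coef_.round(3))
print("5-fold alpha:", kfold_model.alpha_)
```

The chosen value is in `alpha_`; the leave-one-out default is cheap because scikit-learn computes it in closed form rather than refitting n times.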

Spatial Distances in Python with GeoPy

So as I'm redoing my dissertation work, one of the functions I needed calculates the distance between two locations from their coordinates (latitude and longitude). At the time, I used what is commonly referred to as the Great Circle distance. So in the middle of my rewrite in Python, I stumbled across:

GeoPy

Not only do they provide the Great Circle distance calculation in their module, they also introduced me to the Vincenty distance. According to the module authors, this is a more accurate way of calculating the distance, since it models the Earth as an ellipsoid rather than a sphere.

It looks like I have a new way to calculate my spatial distances, and even better, I don't have to program it up myself.  I'm loving this Python re-write process already.
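For intuition, here's the great-circle idea written out by hand with the haversine formula, assuming a spherical Earth with a mean radius of 6371 km (an assumption; GeoPy uses its own constant). In current GeoPy releases, vincenty has been removed (since 2.0) in favor of geodesic, so the library calls to use are geopy.distance.great_circle and geopy.distance.geodesic.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for a spherical model


def great_circle_km(point1, point2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


print(great_circle_km((0.0, 0.0), (0.0, 1.0)))  # one degree along the equator
```

On a sphere this is exact; the ellipsoidal methods in GeoPy differ from it by up to a few tenths of a percent, which is the accuracy gain the authors mention.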

Time Series Analysis in Python with statsmodels

Wow, what a phenomenal discussion on Time Series analysis in Python. I was unaware of the statsmodels project, but now I'm psyched to have found it. First, let me link to the talk that I'm referring to in my title:

Time Series Analysis in Python with statsmodels

Of course, this led me to track down these experts to learn more about what they do, and wow, I'm quite impressed. Here are links to their blogs:

Wes McKinney
Josef Perktold
Skipper Seabold

These three seem to be very involved in Scientific Computing in Python; check out their blogs, links to talks, etc. I know I will.



Sunday, February 22, 2015

Navigating Python documentation in Emacs

I've been a bit frustrated with my ability to navigate Python documentation in Emacs. I came across the pydoc command, which seems very useful. Then I wondered how I could leverage this in Emacs: maybe write a Lisp function in my .emacs file?

Enter a simple Google search for "emacs pydoc", and you come across this fine project by John Kitchin, who has written up a summary of it.

Now, I love my elpy setup, so I thought I'd propose adding this into it.  No promises from Jorgen, but he did say that at the very least it needs to be its own package.

John, of course, is very busy, and I've always wanted to do something like this in Emacs, so I volunteered to take on this project. So get ready to follow along as I create a MELPA-distributed Emacs package. Yay, this should be fun. :)

You can follow along on github:

https://github.com/statmobile/pydoc

Copying Git files/directories to a new project.

I had recently done something similar when moving my dissertation work from a Subversion repository to a Git repository and then separating out the R library completely. Anyway, I'm in the process of porting someone's previous work into its own project as a soon-to-be MELPA-installable Emacs package.

Long story short, I need to get this guy's file and I want to keep his history intact. So, without further ado, this fine gentleman has a quick way of doing that. See his post here:

http://blog.neutrino.es/2012/git-copy-a-file-or-directory-from-another-repository-preserving-history/

Tuesday, February 17, 2015

Web Interface for Python

An ongoing project I've been wanting to propose to a certain very large government agency involves creating a front-end for Python algorithms. The first question is what the interface should be:
  • GUI - Would be great, but in my vision it would need to be developed for all OSes, including mobile. How fancy do I get: do I use Tcl/Tk, or even Qt? I'm already starting to feel overwhelmed by all the Python GUI frameworks.
  • Web - This would be ideal: as long as it follows HTML standards, one deployment should work for everyone with Internet access. But how? Django doesn't seem interactive enough for my needs, and its ORM seems like overkill. Flask seems too light. Hmm, and then I got some advice from the author of http://pythonprogramming.net on Reddit.
He pointed me to two amazing projects, and I'm seriously thinking about diving back into this proposal. He recommended I look into the following projects:

Brython:
"Brython is designed to replace Javascript as the scripting language for the Web. As such, it is a Python 3 implementation (you can take it for a test drive through a web console), adapted to the HTML5 environment, that is to say with an interface to the DOM objects and events"
Trinket:
"Trinket lets you run and write code in any browser, on any device.
Trinkets work instantly, with no need to log in, download plugins, or install software.
Easily share or embed the code with your changes when you're done."


These look like two amazing projects that I think I could leverage.

Sunday, January 11, 2015

EDIT: Use scipy.linalg over numpy.linalg.

Per my previous post, I mistakenly referenced numpy.linalg and scipy.linalg as if they were the same.  Upon looking deeper at the documentation for scipy.linalg it clearly states the following:

scipy.linalg contains all the functions in numpy.linalg. plus some other more advanced ones not contained in numpy.linalg
Another advantage of using scipy.linalg over numpy.linalg is that it is always compiled with BLAS/LAPACK support, while for numpy this is optional. Therefore, the scipy version might be faster depending on how numpy was installed.
Therefore, unless you don’t want to add scipy as a dependency to your numpy program, use scipy.linalg instead of numpy.linalg
So for all my purposes, I will only use scipy.linalg.
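A small sketch of the swap on a made-up 2 x 2 system: scipy.linalg.solve takes the same call as numpy.linalg.solve, and also accepts extras such as the `assume_a` keyword (available in SciPy releases newer than this post), which tells it the matrix is symmetric positive definite so it can use a Cholesky-based LAPACK solver.

```python
import numpy as np
import scipy.linalg

a = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Same call, same answer: scipy.linalg works as a drop-in for numpy.linalg here.
x_numpy = np.linalg.solve(a, b)
x_scipy = scipy.linalg.solve(a, b)

# One of the extras: declare `a` symmetric positive definite ("pos"),
# so SciPy can skip the general LU route.
x_pos = scipy.linalg.solve(a, b, assume_a="pos")

print(x_numpy, x_scipy, x_pos)  # all approximately [1/11, 7/11]
```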

LAPACK in Python and R

EDIT: See my follow-up post as well!

While porting my dissertation work from R to Python, I need to leverage some of R's great features, such as its easy-to-use wrappers of LAPACK, specifically the Cholesky Decomposition to calculate the inverse of my covariance matrix. For those not familiar with LAPACK, it's a free, open-source library of numerical linear algebra routines: solving linear systems and least-squares problems, eigenvalue and singular value problems, and the matrix factorizations behind them. It's included in a lot of open-source scientific software due to its wide range of applicability and its free and open nature. The one caveat... It's written in Fortran*.

Now, not to hate on Fortran, but not many people write their software or run their data analysis in it. Fortunately for us, some very good computer nerds out there wrote awesome wrappers in R and in Python (through NumPy and SciPy) to access it. It's been recommended to me that one should use LAPACK libraries optimized for one's processor and operating system, and build R (and probably NumPy and SciPy) against those optimized routines. I'll let you read through the R Installation and Administration manual to decide for yourself.

You can find the R discussion of LAPACK routines in the Writing R Extensions manual.

As for Python, just check out the documentation for numpy.linalg or scipy.linalg.

Okay, the real point of this note to myself is that Googling for LAPACK access in Python led me to this awesome blog post:

Linear Solve in Python (NumPy and SciPy)

The author did an awesome tutorial on using the Cholesky Decomposition, and I thought I'd pass it along to anybody interested in leveraging these routines.
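In R, inverting a covariance matrix from its Cholesky factor is `chol2inv(chol(S))`. A rough Python analogue calls the LAPACK routines directly through scipy.linalg.get_lapack_funcs: potrf computes the Cholesky factor and potri builds the inverse from it. This is a sketch on a made-up matrix; potri only fills one triangle, so the last step mirrors it into a full symmetric matrix.

```python
import numpy as np
from scipy.linalg import get_lapack_funcs

# Hypothetical covariance matrix (symmetric positive definite).
cov = np.array([[4.0, 2.0, 0.6],
                [2.0, 3.0, 0.4],
                [0.6, 0.4, 2.0]])

# Pick the LAPACK routines matching cov's dtype (dpotrf/dpotri for float64).
potrf, potri = get_lapack_funcs(("potrf", "potri"), (cov,))

# potrf: Cholesky factor; with lower=False the upper triangle holds U, where cov = U^T U.
u, info = potrf(cov, lower=False)
if info != 0:
    raise np.linalg.LinAlgError(f"potrf failed, info={info}")

# potri: inverse from the Cholesky factor; only the upper triangle is computed.
inv_upper, info = potri(u, lower=False)
if info != 0:
    raise np.linalg.LinAlgError(f"potri failed, info={info}")

# Mirror the upper triangle to get the full symmetric inverse.
cov_inv = np.triu(inv_upper) + np.triu(inv_upper, k=1).T
print(cov_inv)
```

Checking `info` matters here: the raw wrappers report failures (for example, a matrix that isn't positive definite) through that return value instead of raising.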

* I vaguely recall reading somewhere that LAPACK is usually compiled in C after porting LAPACK from Fortran to C using f2c.

Friday, January 9, 2015

Vectorize your functions in NumPy

One of the features I loved in R was that I could easily pass a matrix to a function that works element-wise. Picture this: I have a spatial covariance function that depends on the distances. All I needed to do was write the spatial covariance function, and then pass in the distance matrix.

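In NumPy, the same trick works whenever the function is built from NumPy operations (ufuncs), and numpy.vectorize covers functions written with scalar-only logic such as an `if` statement.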

It's a great way to avoid writing loops, especially when setting up covariance matrices.
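Here's a minimal sketch of the idea, using two hypothetical covariance functions with made-up parameters (`sigma2` for the sill, `phi` for the range) and three made-up locations. The exponential one is already element-wise because it only uses NumPy ufuncs; the spherical one uses a Python `if`, so it's wrapped with np.vectorize.

```python
import numpy as np


def exponential_cov(h, sigma2=1.0, phi=1.0):
    # Built only from NumPy ufuncs, so it already works element-wise on arrays.
    return sigma2 * np.exp(-h / phi)


def spherical_cov_scalar(h, sigma2=1.0, phi=1.0):
    # Scalar-only logic (a Python `if`), which would fail on an array argument.
    if h >= phi:
        return 0.0
    r = h / phi
    return sigma2 * (1.0 - 1.5 * r + 0.5 * r ** 3)


# np.vectorize loops over array elements for us (a convenience, not a speed-up).
spherical_cov = np.vectorize(spherical_cov_scalar, otypes=[float])

# Pairwise Euclidean distance matrix for three hypothetical locations.
coords = np.array([[0.0, 0.0], [0.3, 0.4], [3.0, 4.0]])
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Pass the whole distance matrix in, just like in R.
cov_exp = exponential_cov(dist, sigma2=2.0, phi=1.5)
cov_sph = spherical_cov(dist, sigma2=2.0, phi=1.5)
print(cov_exp)
print(cov_sph)
```

np.vectorize is essentially a Python loop under the hood, so writing the function with NumPy operations (as in the exponential case) is the faster option when the math allows it.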

Tuesday, January 6, 2015

Statistics vs. Machine Learning

Ha, yes, that title is just click bait, although I'm not sure how I'm even soliciting clicks. I'm just as likely to jump into a Bayesian vs. Frequentist debate post. I will only say this: as a Statistician, I fully embrace the Machine Learning field. If we were to draw a Venn Diagram of the two fields, I believe the intersection would take up most of the Sample Space, and have a probability measure of at least 95%.

With that said, I just thought I'd post a link that I recently stumbled across.  Usually I just put these links under my Computing tab (it's already there), but I think this one warranted a blog post.  It's a curated list of Machine Learning Algorithms across a variety of programming languages.  It's definitely going to come in handy for me.

So without further delay, welcome to:

Awesome Machine Learning