while read i; do something-with-$i; done < filename
cat -n
will be interpreted as the `-n` option and lead to problems. Fix this by instead doing
cat ./-n
cat file.txt | awk 'criteria'
where criteria could be something like:
$1 > 100
or
$2 ~ /SomePattern/
somecmd | awk '{printf "%-15s%-15s%-8s\n", $1, $2, $3}'
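As a self-contained illustration of the filters and formatting above (the file name and columns here are made up for the example, not from the original post):

```shell
# sample data: three whitespace-separated columns (hypothetical)
printf 'alpha 150 foo\nbeta 42 bar\ngamma 205 foo\n' > /tmp/sample.txt

# keep rows whose second field exceeds 100
awk '$2 > 100' /tmp/sample.txt

# keep rows whose third field matches a pattern
awk '$3 ~ /foo/' /tmp/sample.txt

# left-align fields into fixed-width columns
awk '{printf "%-15s%-15s%-8s\n", $1, $2, $3}' /tmp/sample.txt
```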
apt-cache search keyword
import os
print os.path.dirname(packagename.__file__)
import pkgutil
import os
[name for _, name, _ in pkgutil.iter_modules([os.path.dirname(packagename.__file__)])]
The Unicode mess
All computation should occur on unicode objects. Understand that 'utf-8' is an encoding and should only be used when writing out to a file or the terminal. That can be achieved with a simple blah.encode('utf-8'). Do not use such encoding for internal string processing. Also, in Python 2, "str" is literally a byte string, so avoid it; use unicode(blah) wherever you might be tempted to use str(blah).
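A minimal sketch of this decode-at-the-boundary discipline (the byte string below is an illustrative example of mine; in Python 3 the same pattern applies to str/bytes):

```python
# -*- coding: utf-8 -*-
# Bytes arrive from the outside world (files, sockets): decode them once.
raw = b'caf\xc3\xa9'            # UTF-8 encoded bytes, e.g. read from a file
text = raw.decode('utf-8')      # -> unicode object: do all processing on this

# Four characters, even though the encoded form is five bytes.
assert len(text) == 4

# Encode only at the output boundary, when writing to a file or terminal.
encoded = text.encode('utf-8')
assert encoded == raw
print(encoded)
```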
Missed the keynote as the earliest BART reached Powell St by 8:45. Wanted to attend the whole of the time-series workshop but there were other interesting things that had my attention. Nevertheless, anyone interested in time-series classification should look at the comprehensive evaluation by Anthony Bagnall et al. that was presented at the workshop. The key message remains the same as before (also Bagnall): it is very hard to beat Nearest Neighbor-based DTW, which is fast and effective. But I have my reservations about the kind of "time-series" these may be applied to.
The outlier detection workshop was good. Jeff Schneider's talk on "converting anomalies to known phenomenon" was good, particularly for the aspect of non-parametric Renyi Divergence methods for anomaly detection. Two other talks were intriguing. One was a custom tree algorithm (ARDT) for dealing with class imbalance issues: better splitting thresholds can be generated using Renyi entropy instead of the Shannon entropy commonly used for Information Gain in decision trees. The other, "Fast and Accurate Kmeans Clustering with Outliers" by Shalmoli Gupta, was an approach for dealing with outliers in the k-means setup. Instead of a 2-step process of removing outliers and then applying k-means, two approaches were presented. The first, a sampling approach, samples each data point with probability 1/z, where z is the number of outliers in the dataset; k-means is then inferred over this sample. Unless z is very large, this ensures that the k-means are not outlier dependent. The second approach solves a linear program to jointly discover centroids and outliers. They empirically show that, although computationally expensive, the LP is not much better than the first (sampling) approach. Hence, it is better to use the sampling approach for robust k-means. However, both these approaches require knowledge of z, the number of outliers in the dataset. That is impractical.
Moreover, the experiments were performed with full knowledge of the number of outliers and the number of clusters in the dataset. Practically, that is not possible to know a priori.
The keynote from Jennifer Tour Chayes of Microsoft Research was all about the need for a limiting theory of graphs, just as thermodynamics is for physics, how that led to the conceptualization of graphons, and the kinds of applications they might be useful in, especially understanding and generating large networks. The best student paper award went to Christos Faloutsos's team for their work on FRAUDAR, a graph-based approach for detecting fraudulent reviews and reviewers, even in the presence of camouflage, i.e., when fraudsters masquerade as honest reviewers by hijacking their reviews. In the large-scale data mining session, the talk on XGBoost, by far the most successful implementation of gradient boosted trees, highlighted improved accuracy, speed, scalability and portability over vanilla GBDTs. The improved accuracy is a result of the regularization term, while the improved speed is the result of caching and sparsity-aware splitting criteria. Particularly interesting on the second day was the plenary panel on Deep Learning, "Is Deep Learning the new 42?", a reference to the answer computed in Douglas Adams' The Hitchhiker's Guide to the Galaxy. Prof. Jitendra Malik, Prof. Isabelle Guyon, Prof. Pedro Domingos, Prof. Nando de Freitas, and Prof. Jennifer Neville participated in the panel, which covered everything from interpretability, explainability, hype, data scarcity, and energy consumption to other issues commonly associated with Deep Learning. Several anecdotal examples were given, both of the failures and of the successes of Deep Learning. Particularly notable was the view that Deep Learning might be the latest craze, but it is an improvement; it may not stay as-is, but will evolve into the future state of the art, such as Representation Learning.
Open challenges that are likely to become important in the coming years are causality, representation learning, explainability, privacy preservation and bias prevention, and my personal favorite, learning from less, a.k.a. unsupervised learning. Regarding energy consumption, it is important to note that the human brain is highly efficient and uses only 20 watts. If we were to replicate the human brain with neuromorphic chips, though, we would need the entire energy supply of New York and San Francisco put together. We still have far to go. Some interesting comments were made about how the algorithms or models are not biased, but rather stupid, if they do not do the meaningful thing. Debate also arose over whether explainability is more important than accuracy, or vice versa. For example, do you want a highly accurate, complex, non-interpretable algorithm that correctly detects cancer, or do you prefer a less accurate but simpler and interpretable model? Interesting point. Moreover, there were suggestions of using a decision tree on top of neural-network outputs to give an impression of interpretability to someone who is highly desirous of it, while doing complicated things under the hood. The last session of invited talks on data science was not really useful to me, barring the first talk by Prof. Jeff Schneider, who highlighted the challenges of Active Optimization (also known as Design of Experiments or Bayesian Optimization).
The day started with a keynote by George Papadoupoulos, a researcher who over the last six years transitioned to being a venture capitalist. He gave a talk on big-data investments from the perspective of the VC community. The key takeaways were that funding usually follows value generation and that successful exits (going public!) are few and far between. It takes roughly 8.5 years for an investment to come to fruition. He also emphasized the importance of getting a good founding partner, as you are going to be stuck with him or her for at least 8.5 years :). He suggested that the more you take the human out of the loop, the more you are prized and valued. Merely creating analytics toolkits that involve a human, albeit needed, has the lowest value, while predictive analytics has moderate value. When asked about areas he does not invest in, he said that he does not invest in startups going up against existing monopolies that have sheer scale (for example, Amazon AWS). This was followed by a panel on VC insights into big-data investments. They echoed much of what Dr. Papadoupoulos had said earlier, but made some key points: do not worry about markets, money, or funding; solve an incredibly hard problem that you see is currently unsolved, and everything else will fall into place. There was also a suggestion that technology or algorithmic prowess is hardly a differentiator anymore, given that most platforms are public or open source. It is the availability of, and exclusive access to, data that is the key differentiator. Then I attended some talks on Deep Learning and embedded systems, which had some interesting papers. There was the SmartReply system from Google Gmail that automatically suggests diverse, unique and intelligent responses on mobile, and which is already being used to answer 10% of all mobile emails for their users.
It was interesting to see a pipeline of simple, intuitive and proven technologies (by now, LSTM-based seq2seq RNN models are nauseating, but they work!!) deployed in a concrete, in-production application. Then there was a talk by Jure Leskovec's team on node2vec, a word2vec-like embedding-generation algorithm for networks and graphs. If we can represent each node as a vector, while still retaining some information about its neighborhood, many machine learning algorithms can be directly applied to such a vector representation of the graph. They had better results than spectral methods (matrix factorization of the adjacency matrix), but it is not clear to me how they get that. Reading required! Then I attended some talks in the Unsupervised Learning and Anomaly Detection session, which again highlighted the difficulty of cracking this challenge. Two approaches were particularly of note. One utilized a Semi-Markov model over a VAR (Vector Auto Regression)-based approach to discover phases of operation over each flight, and then used changes in the distributions of the VAR parameters to detect anomalies in-flight. Another approach, which received the runner-up best research paper award, I did not attend; but from what I could glean from my colleagues, they use correlations between pairs of variables (the history of one variable to predict the future of another) and monitor changes to those (this is quite similar to what I have been doing). I need to study their full method in more detail. Finally, day 2 culminated in the much-anticipated Turing lecture by Whitfield Diffie, co-inventor of the Diffie-Hellman key exchange. Starting from the 14th century, Dr. Diffie gave a whirlwind tour of the evolution of the field of cryptography and its antithesis, cryptanalysis.
He didn't seem perturbed that Quantum Computing is around the corner (I do not know if it is!), and highlighted homomorphic approaches as a promising direction for cryptography. He ended with a very interesting question: "Does an individual have a right to secrecy (from the government)?" (Or does the government have the authority to demand full disclosure?) Something to ponder given the current events in secrecy and the protection of personal information.
The last day of the conference was short but had a good start with a keynote by Nando de Freitas on recent advances in deep learning. While I was familiar with much of the work that was presented, there were many important ideas from the last 2 years that I did not have insight into: NPI, residuals, attention, identities, learning to learn. Nando provided examples from domains apart from the usual culprit: images. The most surprising result for me was "learning to learn gradient descent by gradient descent", wherein an LSTM-based network was trained using gradient descent to learn the optimal way of doing gradient descent! In the several experiments they did, this automatically discovered optimization strategy outperformed many of the well-known hand-crafted, theoretically sound and wildly popular approaches such as SGD, Nesterov's AG, Adagrad, ADAM, etc. Wow! Is this the beginning of the end of the design of optimization algorithms? The only thing left is for the LSTM to spit out a theorem with better guarantees than what Nesterov has spent his life on! Apart from that, the day was lackluster and short. The only other mentionable work was that by Carlos Guestrin's team on "explaining away any classifier", with the goal of building trust in the learnt model. How do we know that the model's accuracy is high for the right reasons? A naive approach could be to just use an easily interpretable model, such as a decision tree, but that would mean compromising on accuracy. Complicated models are harder to explain, but more accurate. I need to study the paper, but prima facie, it appears that their strategy marries the best of both worlds: use a complicated model to infer globally and make predictions, but explain away using a simpler model in the local neighborhood of the test instance.
I have my reservations with this approach, but at least someone is thinking about this crucial direction for the success of ML in domains that have “human experts” who will never believe a black box unless it agrees with their understanding.
The first step is actually to get the NVidia display drivers working on Ubuntu 16.04 LTS for the GTX 1080. Finish that step first and then start on this adventure to install tensorflow on your machine.
Note: If, in spite of installing the correct NVidia drivers for Ubuntu 16.04 LTS, you are staring at a black screen, it is possible that it is a display-resolution issue. Please check out the detailed and helpful comment by Constantin below for the fix.
chmod +x cuda_8.0_your-version_linux.run
./cuda_8.0_your-version_linux.run
Know where you are installing it. It is better to use the default "/usr/local/cuda".
cp -r cuda/* /usr/local/cuda/
chmod a+r /usr/local/cuda/include/cudnn.h
chmod a+r /usr/local/cuda/lib64/libcudnn*
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
apt-get install swig python-dev python-numpy python-wheel
apt-get install pkg-config zip g++ zlib1g-dev unzip
chmod +x bazel-version-installer-os.sh
./bazel-version-installer-os.sh --user
cxx_builtin_include_directory: "/usr/local/cuda-8.0/include"
Save the file and get back to the command prompt.
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
Now, if all the steps are complete, you should have a working tensorflow installation. It is easy to find out whether it is indeed working for you. Try this simple short script, which should tell you if your GPU is being used as one of the devices:
#!/usr/bin/env python
# Author: Abhay Harpale
# Website: abhay.harpale.net/blog/
"""
gpu test for tensorflow
"""
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
If the system is indeed using the GPU as the device, it should output:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.56GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/direct_session.cc:175] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0
I tensorflow/core/common_runtime/simple_placer.cc:818] MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] a: /job:localhost/replica:0/task:0/gpu:0
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0
MatMul: /job:localhost/replica:0/task:0/gpu:0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
Now, go fire up some deep learning jobs and take it to the next level.
If this article has worked for you, it is highly likely that some of your friends and followers will benefit from it too. Please share by clicking on the appropriate channels.
Please comment about your experience with getting Tensorflow to work with Ubuntu 16.04 LTS and new GPUs from the NVidia Pascal line-up, including GTX 1080, GTX 1070, GTX Titan X, and GTX 1060. If you had to customize these steps for your system, please also provide those specifics so that other people visiting this blog can benefit. Thanks for visiting!
Important Note: All these steps need to be performed with the display connected to the integrated graphics port of your motherboard, not the NVidia GPU. You can connect your display to the NVidia GPU only after a successful installation of the latest drivers.
For the steps in this article to work, you have to get to a login prompt or TTY. Initially, you might be staring at an empty black screen.
Option 1: Hit Ctrl+Alt+F1. It should bring up a black screen with a login prompt. That's good. Login and follow the steps in the fixes given below.
Option 2: If Ctrl+Alt+F1 does not work for you, you will have to edit the special kernel flags to get to a login prompt. For this to work, follow these steps:
These steps work only after you have been able to get to a TTY or a command-prompt. Use one of the options above to get a TTY before you perform these steps.
sudo bash
# blacklist added by abhay harpale for nvidia gtx 1080 installation on ubuntu 16.04 LTS
blacklist amd76x_edac
blacklist vga16gb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
# login as root
sudo bash
# remove existing nvidia stuff
apt-get purge nvidia-*
apt autoremove
# reboot
shutdown -r now
sudo bash
apt-get remove --purge nvidia*
dpkg --configure -a
shutdown -r now
# login as root
sudo bash
# stop the GUI service
service lightdm stop
# stop the X-server if it was already running, or nvidia will complain about it
killall xinit
# download the latest nvidia drivers. At the time of this writing 367.35 was the latest
cd ~/Downloads/
wget us.download.nvidia.com/XFree86/Linux-x86_64/367.35/NVIDIA-Linux-x86_64-367.35.run
# make the downloaded file an executable
chmod +x NVIDIA-Linux-x86_64-367.35.run
# run the NVidia installer
./NVIDIA-Linux-x86_64-367.35.run
# follow the prompts in the process
# once the process is over, reboot
shutdown -r now
Note: If, in spite of installing the correct NVidia drivers for Ubuntu 16.04 LTS, you are staring at a black screen, it is possible that it is a display-resolution issue. Please check out the detailed and helpful comment by Constantin on my post about getting Tensorflow working with GTX 1080 + Ubuntu 16.04 LTS.
If you have any comments, questions or suggested modifications to improve the steps listed here, please drop me a comment below. Most importantly, if these steps did not work for you or something else did, please let me know that too, along with the error description you are getting.
* If all eigenvalues are positive, then the matrix is positive definite. If all eigenvalues are positive or zero-valued, then the matrix is positive semi-definite. Similar definitions hold for negative definite and negative semi-definite. There are benefits to knowing that a matrix is positive definite, positive semi-definite, negative definite, or negative semi-definite.
* A matrix is singular if and only if any of the eigenvalues is zero
* To optimize quadratic expressions of the form $latex f = \mathbf{x}^T\mathbf{A}\mathbf{x}$, subject to $latex \|\mathbf{x}\| = 1$: if $latex \mathbf{x}$ is an eigenvector of $latex \mathbf{A}$, then $latex f$ takes on the corresponding eigenvalue. Its maximum (or minimum) is the maximum (or minimum) eigenvalue of $latex \mathbf{A}$.
* The determinant of a matrix is equal to the product of all its eigenvalues (the trace is equal to their sum).
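These facts are easy to sanity-check numerically. A quick sketch with NumPy (the matrix below is an arbitrary example of mine):

```python
import numpy as np

# a small symmetric matrix to illustrate the properties above
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigvals = np.linalg.eigvalsh(A)

# all eigenvalues positive -> positive definite
assert np.all(eigvals > 0)

# determinant equals the product of the eigenvalues,
# trace equals their sum
assert np.isclose(np.prod(eigvals), np.linalg.det(A))
assert np.isclose(np.sum(eigvals), np.trace(A))
```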
* Singular value decomposition (SVD): Eigen-decomposition is not defined if a matrix is not square and we must use another kind of decomposition called the SVD. It is written as $latex \mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T$. $latex \mathbf{D}$ is a diagonal matrix, not necessarily square, and its values are called the singular values of the matrix. The columns of $latex \mathbf{U}$ are called the left-singular vectors, and the columns of $latex \mathbf{V}$ are called the right-singular vectors. There is an interesting relationship between the singular vectors and eigen-decomposition. It turns out that the left-singular vectors are the eigenvectors of the matrix $latex \mathbf{A}\mathbf{A}^T$. The right-singular vectors are the eigenvectors of the matrix $latex \mathbf{A}^T\mathbf{A}$. The non-zero singular values of $latex \mathbf{A}$ are the square roots of the eigenvalues of $latex \mathbf{A}\mathbf{A}^T$ or $latex \mathbf{A}^T\mathbf{A}$.
* Cholesky decomposition: Positive-definite matrices can be decomposed into a lower triangular matrix and its conjugate transpose as $latex \mathbf{A} = \mathbf{L}\mathbf{L}^T$. Every Hermitian positive-definite (and therefore any real-valued symmetric positive definite) matrix has a unique Cholesky decomposition.
* LDL decomposition: Related to the Cholesky decomposition, $latex \mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^T$, is called the LDL decomposition, and is possible for some indefinite matrices too, unlike Cholesky which requires the matrix to be positive definite. In this case though, $latex \mathbf{L}$ is required to be a unit lower triangular matrix.
* QR decomposition: $latex \mathbf{A} = \mathbf{Q}\mathbf{R}$, where $latex \mathbf{Q}$ is a square orthogonal matrix and $latex \mathbf{R}$ is a upper triangular matrix. It is typically used as an alternative for solving system of linear equations, without explicitly computing the inverse.
* Rank factorization: For a $latex m \times n$ matrix $latex \mathbf{A}$ of rank $latex r$, $latex \mathbf{A} = \mathbf{C}\mathbf{F}$, where $latex \mathbf{C}$ is a full-rank matrix of size $latex m \times r$, and $latex \mathbf{F}$ is a full-rank matrix of size $latex r \times n$.
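A couple of the relationships above can be verified numerically. A NumPy sketch (the matrices are arbitrary random examples of mine, not from the post):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(4, 3)            # a non-square matrix, so SVD applies

# SVD: A = U D V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# singular values are square roots of the eigenvalues of A^T A
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s**2, eig)

# Cholesky of a positive-definite matrix: P = L L^T
P = A.T @ A + 1e-6 * np.eye(3)       # nudge to ensure strict positive definiteness
L = np.linalg.cholesky(P)
assert np.allclose(L @ L.T, P)
```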
* Symmetric matrix: A square matrix that is equal to its transpose: $latex \mathbf{A} = \mathbf{A}^T$
* Diagonal matrix: A matrix where all non-diagonal entries are zero: $latex \mathbf{A}_{i,j} = 0, \forall i \ne j$. Square diagonal matrices are denoted as $latex \text{diag}(v)$. But note that diagonal matrices are not required to be square.
* Identity matrix: A diagonal matrix where all diagonal entries are 1
* Inverse matrix: An inverse of a matrix is another matrix such that their product is the identity matrix: $latex \mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$
* Inverse of diagonal matrix: Easy to compute: $latex \text{diag}(v)^{-1} = \text{diag}([1/v_1,\ldots,1/v_n])$
* Unit vector: A vector with unit norm: $latex \|\mathbf{x}\|_2 = 1$
* Orthogonal vectors: If $latex \mathbf{x}^T\mathbf{y} = 0$, then $latex \mathbf{x}$ and $latex \mathbf{y}$ are considered orthogonal. If their norms are non-zero, then it is the case that the angle between them is $latex 90^{\circ}$. Why? Because a product of vectors can be represented as $latex \mathbf{x}^T\mathbf{y} = |\mathbf{x}||\mathbf{y}|\cos\theta$, where $latex \theta$ is the angle between the two vectors.
* Orthonormal vectors: If two orthogonal vectors are also unit vectors, then they are orthonormal
* Orthogonal Matrix: A square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal. Therefore $latex \mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I}$
* Positive definite Matrix: A square symmetric matrix $latex \mathbf{A}$ such that for any non-zero vector $latex \mathbf{x}$, $latex \mathbf{x}^T\mathbf{A}\mathbf{x} > 0$. It is sometimes denoted as $latex \mathbf{A} > 0$
* Positive semi-definite Matrix: A square symmetric matrix $latex \mathbf{A}$ such that for any non-zero vector $latex \mathbf{x}$, $latex \mathbf{x}^T\mathbf{A}\mathbf{x} \ge 0$. It is sometimes denoted as $latex \mathbf{A} \ge 0$.
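The quadratic-form definition and the eigenvalue test agree, and both are easy to check. A short sketch (the matrix is an arbitrary example of mine):

```python
import numpy as np

# a small symmetric matrix, chosen to be positive definite
A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])

# definition check: x^T A x > 0 for (almost surely non-zero) random x
rng = np.random.RandomState(0)
for _ in range(100):
    x = rng.randn(2)
    assert x.dot(A).dot(x) > 0

# equivalent spectral check: all eigenvalues are positive
assert np.all(np.linalg.eigvalsh(A) > 0)
```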
#!/usr/bin/env python
# Author: Abhay Harpale
"""
plot multicolored lines in matplotlib
"""
import matplotlib.pyplot as plt
import numpy as np

def find_contiguous_colors(colors):
    # finds the continuous segments of colors and returns those segments
    segs = []
    curr_seg = []
    prev_color = ''
    for c in colors:
        if c == prev_color or prev_color == '':
            curr_seg.append(c)
        else:
            segs.append(curr_seg)
            curr_seg = []
            curr_seg.append(c)
        prev_color = c
    segs.append(curr_seg)  # the final one
    return segs

def plot_multicolored_lines(x, y, colors):
    segments = find_contiguous_colors(colors)
    plt.figure()
    start = 0
    for seg in segments:
        end = start + len(seg)
        l, = plt.gca().plot(x[start:end], y[start:end], lw=2, c=seg[0])
        start = end
Here is some code to generate an example plot using this:
x = np.arange(1000)
y = np.random.randn(1000)  # randomly generated values
# color segments
colors = ['blue'] * 1000
colors[300:500] = ['red'] * 200
colors[800:900] = ['green'] * 100
colors[600:700] = ['magenta'] * 100
plot_multicolored_lines(x, y, colors)
plt.show()
While there are possibly many strategies for identifying and setting the best thresholds, the relationship of the Receiver Operating Characteristic (ROC) to the thresholds provides an intuitive and flexible way to set them. The ROC is a curve of the True Positive Rate (TPR) against the False Positive Rate (FPR) for decreasing values of the score threshold. It can be computed in a few steps:
1. Sort the scores in a descending order
2. Start with the highest score as the initial threshold
3. Compute the TPR and FPR for the current threshold and record it
4. Lower the threshold to the next unique score and go back to step 3 until all the scores have been exhausted
The formulae for computing TPR and FPR at a given score $latex s$ are quite simple:
$latex \text{TPR}_s = \frac{\text{Number of true positives with score above } s} {\text{Total number of positives}}$
$latex \text{FPR}_s = \frac{\text{Number of false positives with score above } s} {\text{Total number of negatives}}$
Or, quite simply, use the sklearn.metrics.roc_curve function from the scikit-learn package for computing ROC. That returns matched lists of TPR, FPR, and corresponding thresholds.
Once you have these three series (TPR, FPR, and thresholds), you just analyze the ROC curve to arrive at a suitable threshold. Plot the curve and identify the point along the ROC curve that is satisfactory to your needs (high TPR with low FPR). Since you will seldom find a perfect TPR at zero FPR, you will have to make a compromise and allow some false positives to cover most true positives. Choose a threshold that satisfies the outcomes you are after.
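One common heuristic for picking that compromise point, not prescribed above but worth noting, is Youden's J statistic: choose the threshold that maximizes TPR minus FPR. A sketch with made-up scores and labels:

```python
import numpy as np
from sklearn.metrics import roc_curve

# toy scores and labels, for illustration only
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Youden's J: pick the threshold maximizing TPR - FPR
best = np.argmax(tpr - fpr)
print('chosen threshold:', thresholds[best])
```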
Here's the code for generating and plotting the ROC curve along with the corresponding thresholds:
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
import numpy as np
import seaborn
from sklearn.datasets import make_classification

# sample data generation for demonstration only
x, y = make_classification(n_samples=10000, n_features=1, n_informative=1,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1)
scores = x[:, 0]
true_labels = y

### actual code for roc + threshold charts starts here
# compute fpr, tpr, thresholds and roc_auc
fpr, tpr, thresholds = roc_curve(true_labels, scores)
roc_auc = auc(fpr, tpr)  # compute area under the curve

plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")

# create the axis of thresholds (scores)
ax2 = plt.gca().twinx()
ax2.plot(fpr, thresholds, markeredgecolor='r', linestyle='dashed', color='r')
ax2.set_ylabel('Threshold', color='r')
ax2.set_ylim([thresholds[-1], thresholds[0]])
ax2.set_xlim([fpr[0], fpr[-1]])

plt.savefig('roc_and_threshold.png')
plt.close()
A sample chart generated by this script is shown below. In the chart, the dashed black line is the baseline; your curve (the blue line) should be above that baseline if your algorithm is any good. In this case, we see that initially the TPR rises very fast for a low FPR (that is a good thing), and later the gains are not as significant. In general, a good first strategy is to choose the threshold that corresponds to the bend along the ROC curve. In this case, that corresponds roughly to a threshold of 0.