NSE data needs to be open
Last week, I was searching for NSE historical OHLC data. To my surprise, all I could find was data at a 1-day interval, available only up to the previous day. Anyone who needs real-time data has to pay NSE a hefty amount for access to their API. One can go to a redistributor instead. They won’t charge as much as NSE, but it still won’t be anything close to what a data scientist or an individual can afford.
I realized that someone has to step in and make the data open. So I decided to build an API for accessing real-time data and historical OHLC data at finer time intervals.
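To make "finer time intervals" concrete, here is a minimal sketch of how scraped price ticks could be rolled up into OHLC bars of any interval. It is only an illustration, not the actual API: the `Bar` class, the `to_ohlc` function, and the `(timestamp, price)` tick format are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One OHLC bar (hypothetical structure, for illustration only)."""
    start: int    # bucket start time, in seconds since the epoch
    open: float
    high: float
    low: float
    close: float


def to_ohlc(ticks, interval_s=60):
    """Group (timestamp_s, price) ticks into OHLC bars of interval_s seconds."""
    bars = {}
    # Sort by timestamp so open/close reflect time order, not arrival order.
    for ts, price in sorted(ticks, key=lambda t: t[0]):
        start = ts - ts % interval_s  # floor the timestamp to its bucket
        bar = bars.get(start)
        if bar is None:
            # The first tick in a bucket sets all four values.
            bars[start] = Bar(start, price, price, price, price)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price  # the latest tick seen so far is the close
    return [bars[k] for k in sorted(bars)]


ticks = [(0, 100.0), (30, 101.5), (59, 99.0), (60, 100.5), (90, 102.0)]
one_minute_bars = to_ohlc(ticks, interval_s=60)
```

With the ticks stored once, the same function can serve 1-minute, 5-minute, or hourly bars just by changing `interval_s`.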
I started building a scraper last week. It’s completed and stable now. I can run it on my home server to scrape the real-time data and write it to a NoSQL database (I am considering Elasticsearch, but I am skeptical about the costs). But the bigger problem is the cost of IP rotation.
To solve this, I am thinking of distributing the scraping across users’ devices, with the API working on a credit-based system. Users run software on their device (Windows/Mac/Ubuntu/Android compatible), which credits their account with tokens. I am thinking of proposing a 1:20 token system, i.e., the user gets 20 API access tokens for every scraping request served from their device.
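A rough sketch of how the 1:20 accounting might look. Nothing here is final: the `TokenLedger` class and its method names are hypothetical, and I am assuming that one token pays for one API call, which is not decided yet.

```python
TOKENS_PER_SCRAPE = 20  # the proposed 1:20 ratio


class TokenLedger:
    """In-memory token balances per user (hypothetical, for illustration)."""

    def __init__(self):
        self._balances = {}

    def record_scrape(self, user_id, count=1):
        """Credit 20 tokens for each scraping request the user's device served."""
        new_balance = self._balances.get(user_id, 0) + count * TOKENS_PER_SCRAPE
        self._balances[user_id] = new_balance
        return new_balance

    def spend(self, user_id, calls=1):
        """Debit one token per API call (assumed); refuse if the balance is too low."""
        balance = self._balances.get(user_id, 0)
        if balance < calls:
            return False
        self._balances[user_id] = balance - calls
        return True

    def balance(self, user_id):
        return self._balances.get(user_id, 0)


ledger = TokenLedger()
ledger.record_scrape("alice", count=2)  # two scrapes served, 40 tokens credited
ledger.spend("alice", calls=3)          # three API calls, 37 tokens left
```

A real version would need persistent storage and some way to verify that a device actually performed the scrape before crediting it.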
To be continued…