Convert CSV to Parquet Files

Apache Parquet

Apache Parquet is a columnar storage format: it stores tabular data column-wise, so all the values of a column (which share the same data type) are stored together. This offers better storage efficiency, compression, and data retrieval.

What is Row Oriented Storage Format?

In row-oriented storage, data is written to disk row by row.

Columnar Storage Format

In a columnar storage format, the same table is stored column-wise.

As you can see, in this format all the IDs are together, and so are the names and salaries. A query selecting the Name column requires less I/O because all of its values are adjacent, unlike in the row-oriented format.

Using Apache Parquet

Using the Parquet format has two advantages:

  1. Reduced storage
  2. Query performance

Depending on your business use case, Apache Parquet is a good option if you mainly run partial queries (i.e., you don't select all the columns) and you are not worried about file write time.

The Parquet format is supported in all Hadoop-based frameworks. Queries that select a few columns from a big set of columns run faster, because storing homogeneous data together greatly reduces disk I/O.

To take advantage of this in Apache Spark, we need to convert existing data into the Parquet format. In this article we will convert CSV files to Parquet and then read them back.

CSV to Parquet

 

We will convert csv files to parquet format using Apache Spark.

Below is PySpark code to convert CSV to Parquet. You can edit the column names and types to match your input.csv.
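A minimal sketch, assuming PySpark 2.x; the id/name/salary schema here is illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Edit the column names and types to match your input.csv
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.read.csv("input.csv", header=True, schema=schema)
df.write.parquet("input-parquet")
```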

The code above will create Parquet files in the input-parquet directory. The files are in binary format, so you will not be able to read them directly. You can check the size of the directory and compare it with the size of the compressed CSV file.

For example, an 8 MB CSV file produced a 636 KB Parquet file.


The other way: Parquet to CSV

You can retrieve csv files back from parquet files.
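A sketch of the reverse conversion, assuming the same SparkSession setup and the input-parquet directory produced earlier:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-csv").getOrCreate()

df = spark.read.parquet("input-parquet")
df.write.csv("output-csv", header=True)
```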

You can compare the original and converted CSV files.

You can provide parquet files to your Hadoop based applications rather than providing plain CSV files.

Realm vs SQLite

While starting a new application, we often wonder which database to use, especially if the application is database intensive. Recently I came across Realm, which is really well-built and surprisingly fast compared to SQLite. In this post, I aim to show how Realm compares to SQLite.

Let’s start with looking at basic CRUD operations in Realm.

Create:

Read:

Update:

Delete:
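A minimal sketch of these four operations with Realm's Java API, assuming a hypothetical model class User (extending RealmObject) with id as its primary key and a name field:

```java
// Assumes: public class User extends RealmObject { @PrimaryKey long id; String name; ... }
Realm realm = Realm.getDefaultInstance();

// Create: all writes go through a transaction
realm.executeTransaction(r -> {
    User user = r.createObject(User.class, 1); // primary key value
    user.setName("John");
});

// Read: build a query and fetch a live object (no manual conversion needed)
User john = realm.where(User.class).equalTo("name", "John").findFirst();

// Update: mutate the live object inside a transaction
realm.executeTransaction(r -> john.setName("Jane"));

// Delete: remove the object inside a transaction
realm.executeTransaction(r -> john.deleteFromRealm());

realm.close();
```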

Now let’s look at the comparison between Realm and SQLite.

1. Architecture

Realm does not use SQLite as its engine; rather, it is a database built from scratch to run directly on phones, tablets and wearables. Realm is an object-oriented database with a C++ core, whereas SQLite is a transactional SQL database engine.

2. Easy to Use

Realm is easy to use because it stores data as objects. Realm data models are defined as normal Java classes that subclass RealmObject; properties such as primary keys and required fields are declared with simple annotations. There is also no need to convert query results into objects for further use: Realm handles that automatically. Query results are returned as a generic RealmResults instance, a collection of the model class that is iterable like a List. SQLite query results, by contrast, are returned in a Cursor object.

3. Cross Platform

Realm stores its data in a native object format, using a universal table-based representation via its C++ core rather than a language-specific format. This makes Realm easy to use across multiple languages and platforms.

4. Speed

Realm is faster than SQLite, almost 10x faster according to benchmark results. For example, Realm's read queries show an even larger performance gain over SQLite than its inserts do.

5. Data Synchronization

The data in Realm is never copied; it works on live objects. A Realm query returns a list of object references, so we work on the original Realm data, and any changes made to the queried data are reflected in the stored data after a simple commit. In SQLite, however, a query returns a copy of the data, so any changes made to the queried data must be persisted by writing them back to the database.

Companies using Realm:

A few companies using Realm are Amazon, Google, Starbucks, Intel, eBay, Adidas, Zynga, Nike, IBM, and Cisco.


Limitations:

Realm currently has a few limitations:

  1. Realm does not support auto-incrementing IDs or composite keys.
  2. To maintain a balance between flexibility and performance, Realm imposes certain limitations on how information is stored in a Realm, which include:
    1. The upper limit on the name of a class is 57 characters, which includes the _class prefix.
    2. The field names have an upper limit of 63 characters.
    3. There is no support for nested transactions.
    4. String and byte arrays cannot be greater than 16 MB.
  3. There is no support for final, transient, and volatile variables.
  4. Realm model classes can only extend the Realm object.
  5. Realm model class must include the default empty constructor.
  6. Sorting and case-insensitive string matches currently support only ‘Latin Basic’, ‘Latin Supplement’, ‘Latin Extended A’, and ‘Latin Extended B’ (UTF-8 range 0-591). Further, setting the case-insensitive flag in queries using equalTo(), contains(), endsWith() or beginsWith() only works on characters from the English locale.
  7. Although Realm files can be handled concurrently by multiple threads, we cannot hand over Realms, Realm objects, queries, and results between threads.
  8. Realm files cannot be accessed by concurrent processes, they can only be accessed by one process at a time. Different processes should either copy Realm files or create their own.
  9. RealmObjects are live objects and might be updated by changes from other threads. Although two Realm objects that return true for RealmObject.equals() must have the same RealmObject.hashCode() value, that value is not stable; it should neither be used as a key in a HashMap nor saved in a HashSet.

Use JMeter for Mobile Performance Testing

What comes to mind first when we hear about JMeter? Performance testing? Web app load testing? Well, most of us are unaware that JMeter can also be used for performance testing of Android/iOS apps. It's quite similar to recording scripts for web apps; all we have to do is configure a proper proxy on the mobile device. So in this blog post, we will list the process to record a performance test script in JMeter for the Android and iOS platforms.


Prerequisites: JMeter 3.0, an Android phone (Jelly Bean or above) or an iPhone (iOS 8.0 onwards)

JMeter Configurations


1.  Launch JMeter -> Navigate to File option ->  Templates -> Select Recording -> Click on Create (So now we have added all the necessary parameters for Recording scripts)

2. Go to HTTPS Test Script Recorder -> Set port to 8080

Now find your IP address using ifconfig on Linux or ipconfig on Windows. We will use this IP address on the Android/iOS phone to set up the proxy.

Mobile side Configurations

iOS proxy configuration:

  • Go to Settings–>Wi-Fi option.
  • Click on your connected network.
  • Select ‘Manual’ option from the HTTP Proxy section.
  • Set ‘Server’ to your computer’s IP address and ‘Port’ to 8080, matching the JMeter configuration.
  • Save the JMeter root CA certificate to your phone (JMeter generates it in its bin directory when the recorder starts).
  • Install this certificate on your iPhone.

Android proxy configuration:

  • Save the JMeter root CA certificate to your phone.
  • For example, mail the certificate to yourself and install it from the mail attachment. Once the certificate is installed, the phone will ask you to apply a screen lock, and a notification appears saying “Network may be monitored”.
  • Now click on Wi-fi Settings -> Long press on the Network you are connected to -> Click on Modify Network -> Advanced Settings -> Change Proxy to Manual -> In Proxy Host name enter the IP of your computer -> Set Proxy to 8080 -> click save
  • Now we are set to start recording and running the scripts.
  • Go to JMeter -> HTTP(S) Test Script Recorder -> click Start (this starts the recording). Remember that the port in JMeter’s Global Settings and on the mobile device must be the same.
  • Add a Listener: add a View Results Tree to the HTTP(S) Test Script Recorder.
  • Perform any actions on mobile devices and the user can see the actions getting recorded on JMeter.


Replay the actions by increasing the load and monitor the performance.

In JMeter, go to the Thread Group and adjust the number of threads, the ramp-up period, and the loop count.

  • Thread means the number of active users.
  • Ramp-Up is the amount of time JMeter must take to send threads for execution.
  • Loop Count is used to specify the number of times to execute the Performance Test.

Add appropriate listeners and analyze the performance under different loads. The listener we have used here is the Response Times Over Time listener.

These listeners don’t ship with JMeter by default; download the JMeter Plugins jar and put it into JMeter’s lib/ext directory. After restarting JMeter, they appear in JMeter’s list of listeners.

Advantages of using JMeter for mobile performance testing

  • First of all, JMeter is an open source software. So zero investment.
  • It’s very user friendly, with an interactive UI
  • It’s easy to learn
  • Test results can be monitored in detail using the different Listeners in JMeter
  • By far the easiest yet effective way to check mobile performance

 

 

Kotlin Programming Language now supported in Android

At its I/O 2017 conference, Google unveiled that Android will officially support the Kotlin programming language, to huge applause from the IT world. Kotlin is an open source project under the Apache 2.0 license, built by JetBrains, who earlier built IntelliJ. Google announced that Kotlin is brilliantly designed and mature, and will take Android development to the next level, as it is fast and fully interoperable with Java. Using Kotlin in Android development will be more fun.

For Android developers, Kotlin as a “first-class” language is a chance to use a modern, powerful and mature language. It helps reduce runtime exceptions and source-code verbosity. Kotlin is flexible, which means it can easily be introduced into an existing project. Kotlin compiles to Java bytecode and can call Java (and vice versa) out of the box. “The effortless inter-operation between the two languages” was a large part of Kotlin’s appeal to the Android team.


Developers can play around with Kotlin using Android Studio 3.0, with no need to install any extra plugin or worry about compatibility issues. You can open an existing Java file and choose the option “Convert Java File to Kotlin File”. Android Studio will then add all the required Kotlin dependencies to your project and produce the equivalent Kotlin code. Isn’t that cool?

Kotlin’s major goal is to be available on multiple platforms; the team is busy working on native targets such as iOS, macOS, IoT, and embedded systems.

As Google said, some apps such as Expedia, Flipboard, Pinterest, and Square have already started using Kotlin, and the feedback is very positive.

Kotlin has a lot in common with Java in structure, as it is object oriented and statically typed. It is designed to solve the same problems that Java does.

Some of the cool features of Kotlin


  1. Nullability is a very common problem in Java: null references can crash an application. Kotlin effortlessly distinguishes references that can hold null from those that cannot.
  2. Switching from Java is easy as there is an option to convert Java files in Kotlin directly from the plugin in Android Studio.
  3. Kotlin is versatile and interoperable with Java as developers can write their own module that will work with Java code. It’s compatible with the existing Java library.
  4. Kotlin is designed so that you write less code: at least 20% less than equivalent Java, which is fascinating.
  5. A common source of inefficiency in Android Java code is extra garbage collection; Kotlin does a fabulous job of avoiding this problem.
  6. The Lean syntax in Kotlin language is very convenient. Kotlin balances terseness and readability in syntax which helps to write the code faster and allows better productivity.
  7. Kotlin also provides Functional programming support with zero-overhead Lambdas.
  8. It imposes no runtime overhead.
  9. Kotlin’s extension functions are helpful in building really clean APIs and solve a bunch of other problems.
  10. The == operator does exactly what the community expects.

To learn more about Kotlin, check out their official website.

Designing a kick-ass checkout experience

They confessed that they dreaded checkout as if some unknown calamity would befall them and they’d be forced to leave their cart. Sometimes they did indeed abandon their cart because they didn’t get upfront information about shipping or tax charges. Other times, the steps were too hard to follow and they struggled to place the order. You think I am making this up? In a survey of US online shoppers about the reasons for abandoning their e-commerce carts, extra costs (60%) and being asked to create an account (37%) were the two topmost reasons users felt dejected.

Think of it this way- you go to a brick-and-mortar store, you put everything you need in your cart and stand in the long queue to wait for your turn. But just when you reach there, the person at the counter says that you need to pay 13% convenience tax as a first time customer. And, in case you want to waive it off, you need to register at the store. Isn’t that abominable?

Moving ahead, the third most cited reason for abandonment is- “too long/complicated process”. This is the area where, I feel, that designers can greatly help improve the user experience.

How?

By designing with the utmost care and keeping your end users in mind. You need to understand how people accomplish tasks and the role human psychology plays when people are browsing on the internet. It will help you design better interfaces.

For the checkout page, I have figured out five ways that can help you design a better experience.


Design a checkout flow that customers can see

You know how trekkers gather the strength to climb to the top? They look at the summit and it gives them the adrenaline rush to conquer the mountain. A bit overboard of an example because the checkout process is nothing like mountain hiking. But you get the drift, don’t you?

In short- If you can see it, you can achieve it!

For example- once a user has added all the items in the bag and wants to go ahead and purchase, display the entire checkout flow up front. Show it with a progress bar so that the user knows what information s(he) has to fill at each stage. This also breaks down the process into chunks and the user is not burdened with everything on one page.

Myntra’s checkout page, for example, shows three steps- BAG, ADDRESS, and PAYMENT. Each stage is highlighted so that as users proceed through the flow, they know which stage they’ve reached and can review their order.

Display the right visual hierarchy

Visual hierarchy helps the users in two ways- it gives them the clarity on the items they are buying and it gives them control over their actions. When done right, visual hierarchy helps users understand what needs their attention.

For example- if they need to review their order or if they need to proceed to the payment page.

Visual hierarchy can also be used to attract the user’s attention to some promotional offer (eg- free shipping on orders over $X) or to motivate them to check out an interesting offer which can be clubbed with their existing one.

For example- a highlighted “bank offer” draws the attention of users who are eligible (and interested), so they can look at it and decide whether they want to use it or not.

Keep the form fields minimal and clear

Designing forms that result in better conversations is an art. The number one principle that you need to keep in mind while designing forms is- don’t ask for unnecessary information.

For example, a well-designed payment form is precise and helps users fill out the card details in the same sequence as they appear on the card. All payment options are listed in a left navigation menu, and in case the user is unable to remember the card details, (s)he can use the other options.

The labels are correctly placed and help the user enter the right information in the right input field. Apart from that, the security marks on the right-hand side (“Verified by VISA”) leave a positive impact on the user, assuring them that their money is going into the right hands.

However, one more aspect of this particular interaction could be improved. When I entered an incorrect card number, the form didn’t show me any error.

Instead, it allowed me to move ahead in the journey. It was only after clicking on “make payment” that it showed an error message at the top of the form. And an incorrect one at that: the message was misleading, and I couldn’t spot a “red” field anywhere.

A better approach is to handle errors on the go. It’s possible to validate the card number as soon as users enter it. So, why not make them aware of their mistake right when they make it?


Make customer sign up optional

It happens to most of us. We look at something and we instantly want to buy it. But then we change our mind the moment the merchant portal asks us to sign up. Even in the survey conducted by Baymard Institute, “the site wanted me to create an account” is the reason given by 37% of people for cart abandonment. As customers, it’s natural to feel uncomfortable sharing our personal information (name, address, phone number) during the first interaction with a new store. Moreover, it’s tedious to create a new account on every site we browse.

A better option to fix this problem is to offer them guest checkout. By giving them the freedom to choose their way, you enhance their user experience. If they are already flattered by your service, they’ll make an account. If they are yet to discover that you’re awesome, you’ve given them a reason to love your page by giving them the “guest checkout” option.

Here’s an example from Nike’s website. When you “proceed to checkout”, it asks for your shipping details directly (no login required). However, it also gives you the option to sign up in case you want a faster checkout in the future.

Offer them all good things, including your help

Not all users sail smoothly. Some hit a dead end, some get confused and some become frustrated. Even after taking care of everything (form inputs, error messages, etc.), it’s fair to assume that some users will still face problems.

In such situations, it’s crucial to provide users with assistance instead of sending them to help pages (that are not always helpful) or FAQ-pages that are hardly specific to their problems.

Nike’s website takes care of this aspect of user experience.

Knowing that someone is there to help you out brings more credibility and confidence in users and they can peacefully shop to their heart’s delight. It may not be a grand feature in the bigger scheme of things, but it’s the small things that matter the most.

The design of the checkout page is crucial in the consumer journey. If the design is good, it can help convert a customer into a returning user. However, if the design is bad, it not only causes loss of business but also gives people a chance to badmouth your website’s experience.

Therefore, it’s important to design a well-thought-out checkout page. There are many other methods that can guide you to create easy and creative checkout pages which delight users and make their shopping experience fun. But I believe that following the ways listed above can drastically increase the conversion rate through the checkout process.

ReactJS gains power with Flux

I recently started learning ReactJS. With very good documentation available on GitHub, I found it easy to learn. I created a sample application in ReactJS and it worked fine! With some experience, I would like to begin by mentioning two of its salient points:

  1. HTML and Javascript in a single file which makes it easy to maintain
  2. Component Driven Development in which DOM is divided into components to make it reusable and easily testable

Then I heard about React with Flux, and I was intrigued: why do we need Flux when React is fine on its own? I did not have much development experience in ReactJS, which is why I didn’t yet appreciate the power of Flux. Eager to learn, I told myself to pull up my socks a little, learn Flux, and also help train others.

I created an application using ReactJS but initially didn’t use Flux in order to understand its additional benefits like:

  1. Maintainability
  2. Readability
  3. Unidirectional data flow

To see this in practice, let’s develop the application first without Flux and then with Flux.

 

Let’s start by understanding the basic definition and architecture of Flux:

Flux Architecture

There are four key points to Flux

  1. Action – Actions are very simple: they take requests from the View and pass them to the Dispatcher, acting as a mediator between the two.
  2. Dispatcher – The Dispatcher is responsible for passing information to the Store. It does that via its this.dispatch method.
  3. Store – Stores manage the data. A Store communicates with the backend server as per requests from the View.
  4. View – Views display information. If a View needs information, it gets it from the Store; if it needs to perform an action or add/update information, it informs an Action. The Action calls the Dispatcher, which in turn notifies the Store.

React and Flux

Let’s say we have to create a TODO application. It would be a single page application divided into following components

  • Header
    • Todo Count
  • Todo Form
  • Todo List
    • Todo Item

Expected behaviour

  • Todo Count will display the total count of Todo Items.
  • Todo Form will have input field.
  • On submitting the form, new Todo Item should be appended to Todo List and count should be increased in Todo Count.

Application without using Flux

Let’s create a TodoItem class which renders an individual todo item. It receives the item’s information from its parent component.

The next component is TodoList, which is responsible for rendering all todo items. This component gets “data” from its parent via props; “this.props.data” is a list of items over which we iterate, calling the TodoItem component created above for each one.

The TodoCount component is responsible for displaying the count of items. It gets the count from the TodoHeader component.

The TodoHeader component displays the header of the application and renders the TodoCount component to display the total item count.

Next is the TodoForm component. Let’s focus on it, because it creates data in addition to rendering.

The handleSubmit method is called when the Submit button is clicked, and it calls the handleTodoSubmit callback which it got from the Application component.
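A minimal sketch of these components, assuming React’s old createClass/JSX style that was current when Flux was introduced (prop and handler names are illustrative):

```javascript
var TodoItem = React.createClass({
  render: function () {
    return <li>{this.props.text}</li>;
  }
});

var TodoList = React.createClass({
  render: function () {
    // this.props.data is the list of items passed down by the parent
    var items = this.props.data.map(function (item) {
      return <TodoItem key={item.id} text={item.text} />;
    });
    return <ul>{items}</ul>;
  }
});

var TodoCount = React.createClass({
  render: function () {
    return <span>{this.props.count}</span>;
  }
});

var TodoHeader = React.createClass({
  render: function () {
    return <header>TODO (<TodoCount count={this.props.count} />)</header>;
  }
});

var TodoForm = React.createClass({
  handleSubmit: function (e) {
    e.preventDefault();
    // Hand the new item up to the parent via the callback prop.
    this.props.onTodoSubmit({ text: this.refs.text.value });
    this.refs.text.value = '';
  },
  render: function () {
    return (
      <form onSubmit={this.handleSubmit}>
        <input ref="text" type="text" />
        <button type="submit">Add</button>
      </form>
    );
  }
});
```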

When I was doing this I had questions –

  • Why are we doing it this way?
  • Why does the Application component submit the form, and not TodoForm?

The reason is that the Application component is the parent/grandparent of all components. Data flows from parent to child, and a child can’t change the state of its parent. Also, there is no direct communication between siblings.

Now, if TodoForm submitted the form itself, how would this information be passed to the TodoList or TodoHeader components?

That’s why we made the Application component responsible for submitting the form.

The Application component is the parent of all components. Its methods deserve some discussion.

loadDataFromServer – It loads data from the server, but in our example there isn’t a server, so we have hard-coded the data.

handleTodoSubmit – This method would be called by the TodoForm as explained in TodoForm component.

When called, it updates the state with the newly created todo item, which triggers a re-render of the Application component, and all child components are updated with the new information.
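A sketch of the Application component in the same createClass style (the hard-coded data stands in for a server call, as noted above):

```javascript
var Application = React.createClass({
  getInitialState: function () {
    return { data: [] };
  },
  loadDataFromServer: function () {
    // No real server in this example, so hard-code the initial items.
    this.setState({ data: [{ id: 1, text: 'Learn React' }] });
  },
  componentDidMount: function () {
    this.loadDataFromServer();
  },
  handleTodoSubmit: function (item) {
    // New state triggers a re-render of every child component.
    item.id = this.state.data.length + 1;
    this.setState({ data: this.state.data.concat([item]) });
  },
  render: function () {
    return (
      <div>
        <TodoHeader count={this.state.data.length} />
        <TodoForm onTodoSubmit={this.handleTodoSubmit} />
        <TodoList data={this.state.data} />
      </div>
    );
  }
});
```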

As you can see, if sibling components want to communicate with each other, they need to pass data through their parent component.

In our example, the Todo Form wants to tell the Todo List that a new item has been added.

That’s why the Application component passes the handleTodoSubmit callback. On submit, the Todo Form calls the Application component’s handleTodoSubmit via that callback; handleTodoSubmit updates the Application’s state, which causes a re-render that updates the Todo Item and Todo Header.

In our example there is only one level of hierarchy, but in real-world scenarios there can be multi-level hierarchies where a deeply nested child needs to update another deeply nested child. Then you have to pass the callback method down through every level.

But that hurts maintainability and readability, and you end up losing the power of React.

Now let’s build the same application using Flux.

As we have seen, there are four layers in Flux. Let’s write code according to that structure.

Action

Actions are called by Views. If a View wants to update data in the Store, it tells the Action about the change. Here we have created AppAction, which has only one action, addItem.

It is called when we want to add a new todo item, and it in turn calls the dispatcher’s (AppDispatcher’s) handleViewAction.
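A sketch of AppAction (the module path is illustrative):

```javascript
var AppDispatcher = require('./AppDispatcher');

var AppAction = {
  // Called by the View when a new todo item should be added.
  addItem: function (item) {
    AppDispatcher.handleViewAction({
      actionType: 'ADD_ITEM',
      item: item
    });
  }
};

module.exports = AppAction;
```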

Dispatcher

In AppDispatcher, handleViewAction is defined; it is called by AppAction. The action passed to handleViewAction describes which action was performed.

Inside handleViewAction there is a call to this.dispatch, a pre-defined method of the Dispatcher. This method internally notifies the Store.
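A sketch of AppDispatcher, assuming Facebook’s flux npm package, whose Dispatcher base class provides the this.dispatch and register methods:

```javascript
var Dispatcher = require('flux').Dispatcher;

var AppDispatcher = Object.assign(new Dispatcher(), {
  handleViewAction: function (action) {
    // Tag the payload as a view action and forward it to every
    // callback the stores registered via register().
    this.dispatch({
      source: 'VIEW_ACTION',
      action: action
    });
  }
});

module.exports = AppDispatcher;
```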

Store

Let’s discuss methods of Store one by one.

dispatcherIndex – Execution starts here because of AppDispatcher.register. Whenever the Dispatcher’s dispatch method is called, the action payload is passed to every callback registered via AppDispatcher.register.

In our example there is only one action, “ADD_ITEM”, but in most cases there will be several, so we first check the action type and, based on it, perform the work and call the emit method.

Here, we call the addTodoItem method from the registered callback and then emitChange.

addTodoItem – In this method we do nothing except push the new todo item onto the todoItems array.

emitChange – Here we call this.emit(CHANGE_EVENT), which lets listeners of CHANGE_EVENT know that something has changed.

addListener – This method is used by Views to listen to the CHANGE_EVENT.

removeListener – This method is used by Views to remove listener.

getTodoItems – This method will return all the todos. This would be called by TodoList component.

getTodoCount – This method will return the count of all todos. This would be called by TodoCount component.

In TodoList component, we are fetching todo items from Store directly by calling AppStore.getTodoItems in getInitialState.

Now the question is: how will this component know when a new todo item has been added?

The answer is in componentWillMount. In this method we call AppStore.addChangeListener, which listens for the change event defined in the Store. Whenever there is a change, it calls _onChange, which resets the state.

Similar to TodoList, TodoCount will also get data from Store and there is a listener defined in componentWillMount.
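A sketch of the Flux version of TodoList, in the same createClass style (TodoCount is analogous):

```javascript
var TodoList = React.createClass({
  getInitialState: function () {
    // State comes straight from the Store, not from a parent.
    return { data: AppStore.getTodoItems() };
  },
  componentWillMount: function () {
    AppStore.addChangeListener(this._onChange);
  },
  componentWillUnmount: function () {
    AppStore.removeChangeListener(this._onChange);
  },
  _onChange: function () {
    // Reset the state whenever the Store emits its change event.
    this.setState({ data: AppStore.getTodoItems() });
  },
  render: function () {
    var items = this.state.data.map(function (item, i) {
      return <TodoItem key={i} text={item} />;
    });
    return <ul>{items}</ul>;
  }
});
```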

Conclusion

On comparing the two codebases, you will find that creating the application without Flux still looks easier. But once you understand the flow of Flux, your choice will always be Flux, because the flow stays the same even as your application grows complex. It provides the additional benefits of maintainability, readability, and unidirectional data flow.

To summarize, let’s walk through the flow once more.

In the TODO component, we render three components: <TodoHeader>, <TodoForm>, and <TodoList>. There is no need to pass any callback or define any state, because the <TODO> component does nothing except render the other components.

The TodoHeader component likewise does nothing except render other components.

In the TodoCount component, we need to display the total item count. We get the count from AppStore when TodoCount renders, but the count may be updated afterwards. That’s why we add a listener in componentWillMount and remove it in componentWillUnmount. Whenever the count-related data in the Store changes, _onChange is called and updates the count state accordingly.

TodoList component is displaying the list of Todo Items and for that it needs TodoItems. The behaviour is similar to TodoCount in terms of communicating with AppStore.

In TodoForm, we want to submit the form and tell other components that a new item has been added. So on submit, instead of telling the other components directly, it just passes the information to AppAction, which dispatches it to the Store with the help of AppDispatcher; AppStore then adds/updates the data.

In AppStore, the emit method called inside emitChange lets listeners know that information has been updated or added.

As you can see, components are not dependent on each other. If they need any information they will get it from Store and if they want to update anything then they will create an Action and it would be Dispatched to Store.

Clearly, the data flow is unidirectional so it would be easy to understand and maintain.

GraphQL – API interactions made efficient

APIs have become ubiquitous with the advancement of mobility. All clients need to access data on the server, and APIs define a contract for accessing that data. REST became a popular way to expose data from a server after SOAP, since it was lightweight and simple for clients. However, when the concept of REST was developed, client applications were relatively simple. With the rapid move towards mobility, client applications have grown in complexity, and so have their data requirements from the server. REST APIs have proven too inflexible to keep up with the rapidly changing requirements of the clients that access them. And more often than not, it is very difficult to implement a fully REST-compliant API; most APIs are only somewhat REST.

There are two major factors that have been challenging the way APIs are designed:

  1. Increased mobile usage calls for efficient data loading. With REST, you often have to make multiple calls to fetch the complete details of a resource.
  2. The variety of different frontend frameworks and platforms. Each platform needs a different representation of the same data. As REST API developers, we mostly send all the data and leave it up to the client to ignore what it does not need. But this puts a load on the user's data plan.


GraphQL, unlike REST, is a more efficient, flexible, and powerful option. The new API standard was developed and open-sourced by Facebook, and it is now maintained by developers and the open-source community from all over the world.

It was developed to cope with the need for more flexibility and efficiency. It solves many of the shortcomings and inefficiencies that developers experience when interacting with REST APIs.

GraphQL enables declarative data fetching, where a client can specify exactly what data it needs. Instead of multiple endpoints that each return a fixed data structure, there is a single endpoint that returns precisely the data the client asked for.

To better understand the difference between GraphQL and REST, let's consider a blogging mobile application where we want to show a user's profile screen with the following details:

  • User's name
  • Titles of all the user's blog posts
  • User's followers

Remember how we used to gather data with a REST API? It was typically done by accessing multiple endpoints. In the example, /users/<id> endpoint can be used to fetch the initial user data.

Also, it’s likely to have a /users/<id>/posts endpoint that will return all the posts for a user.

Next, the third endpoint will be /users/<id>/followers that will return a list of followers per user.

This leads to the client sending multiple calls, waiting on all those calls, chaining their responses, and gracefully handling if any one of the calls fails.
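A quick sketch of that chaining in Python; the base URL and endpoints below are illustrative stand-ins, not a real API:

```python
# Hypothetical REST flow: three round trips to build one profile screen.
BASE = "https://api.example.com"

def profile_endpoints(user_id):
    """The three endpoints a client must call, in order, for the profile page."""
    return [
        f"{BASE}/users/{user_id}",            # user data (name plus unneeded fields)
        f"{BASE}/users/{user_id}/posts",      # all posts, just to extract titles
        f"{BASE}/users/{user_id}/followers",  # follower list
    ]

# The client sends all three requests, waits on each response,
# stitches them together, and must handle failure of any one call.
print(profile_endpoints(42))
```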

This highlights the first problem stated above.

Now coming to the second problem-

In this case, in the first step, we fetch not only the user's name, which we need, but also other data that is not required, putting more load on the user's data plan. Similarly, a lot of additional data is sent across in the other calls. This is presumably done to support other clients, such as a corresponding web application that displays more information as more screen real estate becomes available.

A possible solution within the REST realm would be to design your API in a way that exposes exactly the data required by this particular profile page. But this, again, is not an optimal approach.

Why, you ask? Especially in today’s times, you want to be able to iterate quickly on your designs and experiment with different features. If you have to tweak your API every time you change your designs on the front end, you are not able to move fast. And keep in mind the versioning you would have to handle in your APIs to keep serving previous versions of your application. And you are in a mess.

Another elegant solution

With GraphQL, on the other hand, you simply send a single query to the GraphQL server that includes the concrete data requirements. The server then responds with a JSON object in which these requirements are fulfilled.

Here only a single request is sent to the server with the query in the request’s body with the exact data requirements and it will return the exact data needed by the application.
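As a rough sketch, assuming a hypothetical /graphql endpoint, the whole profile screen reduces to one request body:

```python
import json

# Hypothetical GraphQL request: one POST carrying the exact data requirements.
query = """
{
  user(id: "42") {
    name
    posts { title }
    followers { name }
  }
}
"""

def graphql_body(q):
    """JSON body of the single HTTP POST sent to the one /graphql endpoint."""
    return json.dumps({"query": q})

print(graphql_body(query)[:30])
```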

This solves our problems of over and under fetching.

I am sure many of you must have had this question in your mind-

An iOS app is so different from an Android app and miles apart from even the web app. How would we return different data for each client?

If I didn't know about GraphQL, my solution would be one of these two:

  • Let’s send all data required by either of apps and leave it on the application to parse as per their requirement. Over-fetching.
  • Let’s create different endpoints or get platform information from the client in request header and return application specific data. On your path to maintainability issues.

GraphQL solves this problem by giving clients the power to write their own queries to get exactly the data they need. It's generous that way, always serving the smallest possible request, whereas REST generally defaults to the fullest.

Some of the advantages of GraphQL are-

Typed schema

How many times has it happened that an API does not return data in the correct data type? Numbers and booleans are wrapped as strings, and then you debug to find out the correct data type. This is because a REST API contract only defines the data, not the types of that data.

In contrast, GraphQL uses a strong type system to define the capabilities of an API. All the types that are exposed in an API are written down in a schema using the GraphQL Schema Definition Language (SDL). This schema serves as the contract between the client and the server to define how a client can access the data.

GraphQL is a Query Language first

REST APIs are often created initially simple, then slowly more and more query language-like features are tacked on over time.

The most reasonable way to provide arguments for queries in REST is to shove them into the query string. Maybe a ?status=active to filter by status, then probably sort=created; but a client needs a sort direction, so sort-dir=desc is added. This is all taken care of in GraphQL because it is foremost a query language, so you can easily add query arguments without hurting readability or creating a chaos of different query styles.

{
  human(id: "1000") {
    name
    height(unit: FOOT)
  }
}

GraphQL removes “Include vs Endpoint” indecision

Another customization consideration that comes up a lot is when to offer included relationships and when to use another endpoint. This can be a difficult design choice: you want your API to be flexible and performant, but includes taken beyond the most trivial uses can be the opposite of that.

You start off with overly simplistic examples like /users?include=comments,posts but end up on /trips?include=driver,passengers,passengers.avatar,passengers.itineraries and worse.

REST would call for a HATEOAS approach, which needs you to make one call to the /trips endpoint, then follow "links": { "driver": "https://example.com/drivers/123" }, and again for passengers, and again for the child data of each of those passengers.

This is a big win for GraphQL: by enforcing the include approach, GraphQL stays both efficient and consistent.

And now the disadvantages of GraphQL-

REST makes caching easier at all levels

In an endpoint-based API, clients can use HTTP caching to easily avoid re-fetching resources, and for identifying when two resources are the same. The URL in these APIs is a globally unique identifier that the client can leverage to build a cache. In GraphQL, though, there’s no URL-like primitive that provides this globally unique identifier for a given object. However, you can cache your GraphQL results at the front end using Apollo Client and Relay.

GraphQL query complexity

GraphQL doesn't take away performance bottlenecks when you have to access multiple fields (authors, articles, comments) in one query. Whether the request is made in a RESTful architecture or GraphQL, the varied resources and fields still have to be retrieved from a data source. As a result, problems arise when a client requests too many nested fields at once. Frontend developers are not always aware of the work a server-side application has to perform to retrieve data, so there must be a mechanism, such as maximum query depth, query complexity weighting, avoiding recursion, or persisted queries, to stop inefficient requests from the other side.
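A maximum-query-depth check can be sketched in a few lines. A real server would walk the parsed GraphQL AST; in this sketch a nested dict stands in for the selection set:

```python
# Reject queries nested deeper than MAX_DEPTH before executing them.
def depth(selection):
    """Depth of a selection set modelled as a nested dict."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())

MAX_DEPTH = 3
query = {"user": {"posts": {"comments": {"author": {}}}}}

if depth(query) > MAX_DEPTH:
    print("rejected: query too deep")  # this query has depth 4
```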

So to conclude, GraphQL is a powerful technology that makes front-end applications easier to build and more efficient. It has its pros and cons, which should be taken into consideration when making important architectural decisions based on your specific use cases.

JS Developer: Learn Python

Python and JS are the two most popular programming languages. I was working as a MEAN/MERN Stack Software Engineer where I used Javascript as a coding language. Recently I switched to Python for a second project.

In this blog, I will share my experience of working on both languages at the same time. Let’s get started.

The syntax of JavaScript and Python is very different. Sometimes I make mistakes by using one's syntax in the other. To avoid this, IDEs are really helpful: IntelliJ (Python) and VS Code (JavaScript).

Below are the major differences that I came across:-

  1. Python uses indentation for a code block whereas JS uses { }
  2. Python uses '#' for comments while JS uses '//'
  3. Python uses the 'print' function whereas JS uses 'console.log' to print debug output to the console panel.
  4. Python defines a function with the 'def' keyword whereas JS uses the 'function' keyword.
  5. The constructor of a Python class is defined by '__init__' whereas JS uses a 'constructor' method.
  6. The semicolon is not mandatory in either language to end a statement. But we use it in JS because if we don't, the JS engine inserts semicolons automatically, which can create unnecessary bugs in the code.
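A small Python sketch of the differences above, with rough JavaScript equivalents in the comments:

```python
# 1. Indentation instead of { }   2. '#' comments instead of '//'
def greet(name):                  # JS: function greet(name) { ... }
    print(f"Hello, {name}")       # JS: console.log(`Hello, ${name}`)

class Task:
    def __init__(self, title):    # JS: constructor(title) { this.title = title }
        self.title = title

task = Task("write blog")
greet("reader")                   # no semicolons needed
```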

 


Approaches to implement task in different languages

Every language has its own beauty. While solving any task with NodeJS I need to think in a different way than implementing them in Python. In some scenarios, Python wins and in some NodeJS.

Just a small example-

To create a Task manager backend in NodeJS, I need to use Express. To replicate the same functionality in Python, I need to use Flask.

Check out the repo for a basic task manager: https://github.com/agarwalparas/task-manager

However, later on, if my backend needs a functionality of Machine learning to manage tasks and prioritize them on the basis of users’ behaviour, then I will surely use Python.

Whereas if my backend needs high speed to list tasks or to search from tasks or for faster real time updates of tasks within the team, then I will surely use NodeJS.

So, it is really tough to decide which language to use in which project. But it is fairly straightforward to say which language can be used for a particular task.

After some experience, I have figured out a way to decide which language is better for a project.

NodeJS for chat applications and real-time apps, whereas Python for analytics, machine learning, and command-line utilities.

Some important concepts

F String in Python and Template literals in Javascript

F-strings and template literals are great new ways to format strings. Not only are they more readable, more concise, and less error-prone than other ways of formatting, they are also faster!
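For example, the same interpolation in Python, with the JS template-literal equivalent in the comment:

```python
name, count = "Paras", 3

# Python f-string; JS: `${name} has ${count} tasks`
message = f"{name} has {count} tasks"
print(message)  # Paras has 3 tasks
```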

Decorators in Python and Callback Function in Javascript

Decorators and callback functions are very powerful and useful tools since they allow programmers to modify the behaviour of a function or class. With decorators, a function is taken as an argument into another function and then called inside a wrapper function, whereas in JavaScript a function passed as an argument to another function is called a callback function.
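A minimal decorator sketch: `logged` wraps any function and prints before delegating, much as a JS higher-order function would invoke a callback:

```python
import functools

def logged(func):
    """Decorator: wrap func so each call is announced first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

print(add(2, 3))
```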

Async/Await in NodeJS 

Before async/await, JS used promises, but the resulting code was a little complex to debug and led to callback-style problems.

Then JS introduced a neat syntax to work with promises in a more comfortable fashion. It’s called “async/await” and is relatively easy to understand and use.
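Python has a directly comparable syntax in asyncio; a minimal sketch, where the sleep stands in for real I/O:

```python
import asyncio

async def fetch_task(task_id):
    await asyncio.sleep(0)                 # stand-in for real I/O, e.g. a DB call
    return {"id": task_id, "done": False}

async def main():
    task = await fetch_task(1)             # reads like synchronous code
    print(task["id"])

asyncio.run(main())
```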

Conclusion

So, all in all, it's a very exciting journey. Both languages have some pros and cons, but isn't that the same with everything? Different languages exist because there is no one-size-fits-all approach to programming. In fact, their existence gives us tools to help create more robust products. My experience of working with Python and JS simultaneously has given me more exposure to the world of programming languages, and I now look forward to learning more about other, unfamiliar languages.

 

An Introduction to Big Data Analytics | What It Is & How It Works

 


Big data is a term that describes datasets that are too large to be processed with conventional tools; it is also sometimes used to name the field of study that concerns such datasets. In this post, we will talk about the benefits of big data and how businesses can use it to succeed.

The six Vs of big data

Big data is often described with the help of six Vs. They allow us to better understand the nature of big data.

Volume

As follows from the name, big data refers to enormous amounts of information. We are talking not about gigabytes but terabytes (1,099,511,627,776 bytes) and petabytes (1,125,899,906,842,624 bytes) of data.

Velocity

Velocity means that big data should be processed fast, in a stream-like manner, because it just keeps coming. For example, a single jet engine generates more than 10 terabytes of data in 30 minutes of flight time. Now imagine how much data you would have to collect to research one small airline. Data never stops growing, and every new day you have more information to process than the day before. This is why working with big data is so complicated.

Variety

Big data is usually not homogeneous. For example, the data of an enterprise consists of its emails, documentation, support tickets, images and photos, transaction records, etc. In order to derive any insights from this data, you need to classify and organize it first.

Value

The meaning that you extract from data using special tools must bring real value by serving a specific goal, be it improving customer experience or increasing sales. For example, data that can be used to analyze consumer behavior is valuable for your company because you can use the research results to make individualized offers.

Veracity

Veracity describes whether the data can be trusted. Hygiene of data in analytics is important because otherwise, you cannot guarantee the accuracy of your results.

Variability

Variability describes how fast and to what extent data under investigation is changing. This parameter is important because even small deviations in data can affect the results. If the variability is high, you will have to constantly check whether your conclusions are still valid.

Types of big data

Data analysts work with different types of big data:

  • Structured. If your data is structured, it means that it is already organized and convenient to work with. An example is data in Excel or SQL databases that is tagged in a standardized format and can be easily sorted, updated, and extracted.
  • Unstructured. Unstructured data does not have any pre-defined order. Google search results are an example of what unstructured data can look like: articles, e-books, videos, and images.
  • Semi-structured. Semi-structured data has been pre-processed but it doesn’t look like a ‘normal’ SQL database. It can contain some tags, such as data formats. JSON or XML files are examples of semi-structured data. Some tools for data analytics can work with them.
  • Quasi-structured. It is something in between unstructured and semi-structured data. An example is textual content with erratic data formats such as the information about what web pages a user visited and in what order.
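For instance, a JSON record is semi-structured: tagged and self-describing, but not bound to a fixed relational schema:

```python
import json

# Semi-structured record: self-describing tags, no fixed table schema.
record = '{"user": "alice", "tags": ["admin"], "last_login": "2021-01-05"}'

parsed = json.loads(record)
print(parsed["user"], parsed["tags"])
```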
Benefits of big data

Big data analytics allows you to look deeper into things.

Very often, important decisions in politics, production, or management are made based on personal opinions or unconfirmed facts. By analyzing data, you get objective insights into how things really are.

For example, big data analytics is now more and more widely used for rating employees for HR purposes. Imagine you want to make one of the managers a vice-president, but don’t know which to choose. Data analytics algorithms can analyze hundreds of parameters, such as when they start and finish their workday, what apps they use during the day, etc., to help you make this decision.

Big data analytics helps you to optimize your resources, perform better risk management, and be data-driven when setting business goals.

Big data challenges

Understanding big data is challenging. It seems that its possibilities are limitless, and, indeed, we have many great solutions that rely heavily on big data. A few of those are recommender systems on Netflix, YouTube, or Spotify that all of us know and love (or hate?). Often, we may not like their recommendations, but, in many cases, they are valuable.

Now let’s think about AI-systems that predict criminal behavior. They analyze profiles of criminals and regular people and can tell whether a person is likely at some point to commit a crime. These algorithms are reported to be quite effective.

However, their predictions are not reliable enough to be given legal power, mostly because of bias: algorithms are prone to making sexist or racist assumptions if the data is sexist or racist. You have probably heard about the first beauty contest judged by AI. None of the winners were black, probably because the algorithm wasn't trained on photos of black people. A similar failure happened with Google Photos, which tagged two African-Americans as 'gorillas', for the same reason. This demonstrates how important a gender- and race-sensitive perspective is when choosing data for analysis. We should improve not only the technology but also our way of thinking before we can create technologies that effectively 'judge' people.

How to use big data

If you want to benefit from the usage of big data, follow these steps:

Set a big data strategy

First, you need to set up a strategy. That means you need to identify what you want to achieve, for example, provide a better customer experience, improve sales, or improve your marketing strategy by learning more about the behavioral patterns of your clients. Your goal will define the tools and data you will use for your research.

Let’s say you want to study opinion polarity and brand awareness of your company. For that, you will conduct social analytics and process raw unstructured data from various social media and/or review websites like Facebook, Twitter, and Instagram. This type of analytics allows assessing brand awareness, measuring engagement, and seeing how word-of-mouth works for you.

In order to make the most out of your research, it is a good idea to assess the state of your company before analyzing. For example, you can collect the assumptions about your marketing strategy in social media and stats from different tools so that you can compare them with the results of your data-driven research and make conclusions.

Access and analyze the data

Once you have identified your goals and data sources, it is time to collect and analyze data. Very often, you have to preprocess it first so that machine learning algorithms could understand it.

By applying textual analysis, cluster analysis, predictive analytics, and other methods of data mining, you can extract valuable insights from the data.
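As a toy example of textual analysis, counting word frequencies is often the first step; the review snippets here are hypothetical and only the standard library is used:

```python
from collections import Counter
import re

# Hypothetical raw review text gathered from social media.
reviews = [
    "great product, fast delivery",
    "fast support, great price",
]

words = re.findall(r"[a-z]+", " ".join(reviews).lower())
top = Counter(words).most_common(2)  # two most frequent words
print(top)
```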

Make data-driven decisions

Use what you have learned about your business or another area of study in practice. The data-driven approach is already adopted by many countries all around the world. Insights taken from data allow you to not miss important opportunities and manage your resources with maximum efficiency.

Big data use cases

Let us now see how big data is used to benefit real companies.

Product development

When you develop a new product, you can trust your gut or rely on statistics and numbers. P&G chose the second option and spends more than two billion dollars every year on R&D. They utilize big data as a springboard for new ideas. For example, they aggregate and filter external data, such as comments and news mentions, and run Bayesian analysis on P&G's product and brand data in real time to develop new products and improve existing ones.

Predictive maintenance

Even a minor mistake or failure in the oil and gas industry can be lethal and cost millions of dollars. Predictive maintenance with the help of big data includes vibration analysis, oil analysis, and equipment observation. One of the providers of such software is Oracle. Their machine learning algorithms can analyze and optimize the use of high-value machinery that manufactures, transports, generates, or refines products.

Fraud and compliance

The digitalization of financial operations can prevent credit card theft, money laundering, and other such crimes. The US Internal Revenue Service is one of the institutions that rely on processing massive amounts of transactions with the help of big data analytics to uncover fraudulent activities. They use neural network models with more than 600 different variables to detect suspicious activities.

Last but not least

Big data is the technology that will continue to grow and develop. If you want to learn more about big data, machine learning, and artificial intelligence in research and business, follow us on Twitter and Medium and continue reading our blog.

Healthcare solutions with Agile software

Two things define today's startup ecosystem: innovation and speed to market. If you have a unique idea and you can get to the market fast, before everyone else, chances are your product will be a success. I don't imply that these are the only two things that matter, but they play an important role in defining product success.

While some may argue that innovation and speed to market don’t go hand in hand, I heartily disagree. Agile project management is one of the ways that allows innovation without compromising on the delivery timelines.

I have been an agile practitioner for nearly a decade now. I’ve worked in different kinds of projects with different SDLC methodologies. Among them, I find Agile to be one of the best methodologies for project development. Especially in managing those projects where new solutions are required to meet rapidly changing customer needs.


Let me share an example. One of our partners, Phritz, began their journey in December 2019. They started building a personal health record chatbot. At that time, they envisioned the chatbot to behave like a personal healthcare assistant that users can chat with anytime. The chatbot would even help users when they change doctors or health insurance.

However, as the COVID-19 pandemic started spreading, we began to think of ways in which Phritz could offer extended support. There was a lot of hysteria among people regarding the information available about the virus. We started with a feature in the chatbot where users could add their symptoms and the chatbot would offer answers. For instance, if you have a sore throat, the bot would advise you to take the necessary medications. However, if you have a sore throat, a cold, and fever, the bot would suggest you get a COVID test. If your test comes back positive, the bot also offers to inform the people you have met in the past week.

We couldn’t have imagined adding all these new features if we had chosen waterfall as a project development methodology.

Another example is from one of my recent projects. Our partners wanted to go for HIPAA compliance and secure all the protected health information (PHI) in the project. Securing PHI is essential in a healthcare setup, so it's critical to get this step right. This involved creating non-functional stories for the PHI-securing requirements, ensuring that they covered both what had already been built and what would be built in upcoming features.

Since the stakeholders were in full gear with their marketing strategies and were familiarizing the public with the product, it was important for them to make the product HIPAA compliant faster.


With Agile, it was easier to accommodate this new requirement. In the Waterfall way, our stakeholders couldn't have thought about implementing it until reaching the first milestone.

Implementing agile not only helped us in accommodating the PHI requirements but also helped us with process improvements and clear communication with stakeholders.

These examples show that agile development helps in incremental development of the product– one that conforms to the needs of the users and solves their problems.

Agile can be beneficial to implement in healthcare projects under the below scenarios as well–

When you’re not sure about the entire solution

All great products are built on ideas that first appear on a piece of paper. It’s not necessary to flesh out an idea completely before jumping in to develop it. Strategy and execution are important but getting to the market fast is more important.


In such cases, agile development helps in validating the idea. You can start with just a goal in mind. Something that’s specific and measurable. For example: “The claim management software will reduce the claims processing time by 70% and improve efficiency of providers by 90%”

Once you develop a solution to this problem, put it out in the market and get customers to use it. After they start using it, collect feedback from them and improve your solution as per their needs.

When you’re navigating a complex domain

The world of healthcare is constantly shifting and innovating. Merely racing to build the best product will no longer help you win. Instead, you ought to focus on innovating in your services and improving the customer experience of the product.

One of the best examples is Practo. Before the COVID-19 pandemic, Practo was known mainly as an online consultation and medicine delivery platform. When the pandemic struck, they quickly pivoted to a telemedicine solution. Within a short span of four weeks, Practo created an 'Artificial Intelligence' tool that guided patients after collecting their basic information. The tool leveraged WHO protocols to profile high-risk people by asking them to share their travel and contact history.

This is just one of the many examples. In other healthcare products, you might be dealing with other regulatory guidelines like HIPAA. They make healthcare a complex domain. But with agile development, you can tackle them one at a time.

When there are multiple stakeholders/decision-makers

Healthcare product development might involve many stakeholders and decision makers. Each stakeholder might have a different perspective and goals for the product’s adoption in the market. This might cause a lot of feedback cycles that go in loops and a lot of incremental changes in the product’s features.

Agile teams are equipped to take up new changes, prioritize the needs of all stakeholders, and help you stay on track with rapidly changing requirements.

When you want to improve quality and reduce costs

In healthcare products, there is an unwavering focus on doing things quickly and shipping out features for the world to use and give feedback on. Innovation matters the most, along with agility. But funding is limited, and you can't hide behind the garb of innovation. Therefore, features must be rapidly tested. The focus is on failing fast and adapting to the users' feedback.

In this scenario, agile proves to be the best method. The 2-week/4-week sprint works best in shipping out features that can be tested with the real users.

When product’s scope is variable

In the waterfall approach of product development, the scope of the project is fixed while team members and time can be varied. That is, if you’re halfway through a project when you realize you’re going to miss the timelines, then you either add more team members or extend the timelines. This increases the cost of development and causes delays in reaching the market.

One of the best things about agile development is that here time and people (team members) are fixed whereas the scope can vary as per requirements. It means that once the scope is defined, it’s not the dead-end of discovery.


If after the first sprint’s release you get feedback for adding/removing/improvising features, agile accommodates it. It might impact your final deadlines, but it would still be somewhat near to what you had planned.

Another advantage of agile teams is that they are more capable of making day-to-day decisions independently. With a defined and structured process, they can also thrive across different geographical areas.

However, agile processes are not easy to imbibe. Ceremonies like backlog grooming and sprint planning need a lot of discipline to execute. I learnt them on the job with the help of my leaders. If you're a new product manager, I would suggest you read some good books on agile project management. The Lean Startup by Eric Ries and Sprint: How to Solve Big Problems and Test New Ideas in Just Five Days are two of my favourites that can help you get married to the idea of agile development.
